Tarantool 2.6.0 documentation


Welcome to Tarantool! This is the User’s Guide. We recommend reading it first, and consulting Reference materials for more detail afterwards, if needed.

How to read the documentation

To get started, you can install and launch Tarantool using a Docker container, a package manager, or the online Tarantool server at http://try.tarantool.org. Either way, as the first tryout, you can follow the introductory exercises from Chapter 2 “Getting started”. If you want more hands-on experience, proceed to Tutorials after you are through with Chapter 2.

Chapter 3 “Database” is about using Tarantool as a NoSQL DBMS, whereas Chapter 4 “Application server” is about using Tarantool as an application server.

Chapter 5 “Server administration” and Chapter 6 “Replication” are primarily for administrators.

Chapter 7 “Connectors” is strictly for users who are connecting from a different language such as C or Perl or Python — other users will find no immediate need for this chapter.

Chapter 8 “FAQ” gives answers to some frequently asked questions about Tarantool.

For experienced users, there are also Reference materials, a Contributor’s Guide and an extensive set of comments in the source code.

Getting in touch with the Tarantool community

Please report bugs or make feature requests at http://github.com/tarantool/tarantool/issues.

You can contact developers directly in Telegram or in a Tarantool discussion group (English or Russian).

Conventions used in this manual

Square brackets [ and ] enclose optional syntax.

Two dots in a row .. mean the preceding tokens may be repeated.

A vertical bar | means the preceding and following tokens are mutually exclusive alternatives.

Getting started

In this chapter, we show how to work with Tarantool as a DBMS – and how to connect to a Tarantool database from other programming languages.

Creating your first Tarantool database

First, let’s install Tarantool, start it, and create a simple database.

You can install Tarantool and work with it locally or in Docker.

Using a Docker image

For trial and test purposes, we recommend using official Tarantool images for Docker. An official image contains a particular Tarantool version and all popular external modules for Tarantool. Everything is already installed and configured in Linux. These images are the easiest way to install and use Tarantool.


If you’re new to Docker, we recommend going over this tutorial before proceeding with this chapter.

Launching a container

If you don’t have Docker installed, please follow the official installation guide for your OS.

To start a fully functional Tarantool instance, run a container with minimal options:

$ docker run \
  --name mytarantool \
  -d -p 3301:3301 \
  -v /data/dir/on/host:/var/lib/tarantool \
  tarantool/tarantool:2.6.0

This command runs a new container named mytarantool. Docker starts it from an official image named tarantool/tarantool:2.6.0, with Tarantool version 2.6.0 and all external modules already installed.

Tarantool will be accepting incoming connections on localhost:3301. You may start using it as a key-value storage right away.

Tarantool persists data inside the container. To make your test data available after you stop the container, this command also mounts the host’s directory /data/dir/on/host (you need to specify here an absolute path to an existing local directory) in the container’s directory /var/lib/tarantool (by convention, Tarantool in a container uses this directory to persist data). So, all changes made in the mounted directory on the container’s side are applied to the host’s disk.

Tarantool’s database module in the container is already configured and started. You needn’t do it manually, unless you use Tarantool as an application server and run it with an application.


If your container terminates soon after start, follow this page for a possible solution.

Attaching to Tarantool

To attach to Tarantool that runs inside the container, say:

$ docker exec -i -t mytarantool console

This command:

  • Instructs Tarantool to open an interactive console port for incoming connections.
  • Attaches to the Tarantool server inside the container as the admin user via a standard Unix socket.

Tarantool displays a prompt:

tarantool.sock>

Now you can enter requests on the command line.


On production machines, Tarantool’s interactive mode is for system administration only. But we use it for most examples in this manual, because the interactive mode is convenient for learning.

Creating a database

While you’re attached to the console, let’s create a simple test database.

First, create the first space (named tester):

tarantool.sock> s = box.schema.space.create('tester')

Format the created space by specifying field names and types:

tarantool.sock> s:format({
              > {name = 'id', type = 'unsigned'},
              > {name = 'band_name', type = 'string'},
              > {name = 'year', type = 'unsigned'}
              > })

Create the first index (named primary):

tarantool.sock> s:create_index('primary', {
              > type = 'hash',
              > parts = {'id'}
              > })

This is a primary index based on the id field of each tuple.

Insert three tuples (our name for records) into the space:

tarantool.sock> s:insert{1, 'Roxette', 1986}
tarantool.sock> s:insert{2, 'Scorpions', 2015}
tarantool.sock> s:insert{3, 'Ace of Base', 1993}

To select a tuple using the primary index, say:

tarantool.sock> s:select{3}

The terminal screen now looks like this:

tarantool.sock> s = box.schema.space.create('tester')
tarantool.sock> s:format({
              > {name = 'id', type = 'unsigned'},
              > {name = 'band_name', type = 'string'},
              > {name = 'year', type = 'unsigned'}
              > })
tarantool.sock> s:create_index('primary', {
              > type = 'hash',
              > parts = {'id'}
              > })
- unique: true
  parts:
  - type: unsigned
    is_nullable: false
    fieldno: 1
  id: 0
  space_id: 512
  name: primary
  type: HASH
tarantool.sock> s:insert{1, 'Roxette', 1986}
- [1, 'Roxette', 1986]
tarantool.sock> s:insert{2, 'Scorpions', 2015}
- [2, 'Scorpions', 2015]
tarantool.sock> s:insert{3, 'Ace of Base', 1993}
- [3, 'Ace of Base', 1993]
tarantool.sock> s:select{3}
- - [3, 'Ace of Base', 1993]

To add a secondary index based on the band_name field, say:

tarantool.sock> s:create_index('secondary', {
              > type = 'hash',
              > parts = {'band_name'}
              > })

To select tuples using the secondary index, say:

tarantool.sock> s.index.secondary:select{'Scorpions'}
- - [2, 'Scorpions', 2015]

To drop an index, say:

tarantool.sock> s.index.secondary:drop()

Stopping a container

When the testing is over, stop the container politely:

$ docker stop mytarantool

This was a temporary container, and its disk/memory data were flushed when you stopped it. But since you mounted a data directory from the host in the container, Tarantool’s data files were persisted to the host’s disk. Now if you start a new container and mount that data directory in it, Tarantool will recover all data from disk and continue working with the persisted data.

Using a package manager

For production purposes, we recommend installing Tarantool via the official package manager. You can choose one of three versions: LTS, stable, or beta. An automatic build system creates, tests, and publishes packages for every push into a corresponding branch of Tarantool’s GitHub repository.

To download and install the package that’s appropriate for your OS, start a shell (terminal) and enter the command-line instructions provided for your OS at Tarantool’s download page.

Starting Tarantool

To start working with Tarantool, run a terminal and say this:

$ tarantool
$ # by doing this, you create a new Tarantool instance

Tarantool starts in the interactive mode and displays a prompt:

tarantool>

Now you can enter requests on the command line.


On production machines, Tarantool’s interactive mode is for system administration only. But we use it for most examples in this manual, because the interactive mode is convenient for learning.

Creating a database

Here is how to create a simple test database after installation.

  1. To let Tarantool store data in a separate place, create a new directory dedicated for tests:

    $ mkdir ~/tarantool_sandbox
    $ cd ~/tarantool_sandbox

    You can delete the directory when the tests are over.

  2. Check if the default port the database instance will listen to is vacant.

    In versions before 2.4.2, the Tarantool packages for Debian and Ubuntu automatically enable and start a demonstration instance, example.lua, which listens on port 3301 by default. The example.lua file showcases the basic configuration and can be found in the /etc/tarantool/instances.enabled or /etc/tarantool/instances.available directory.

    However, we encourage you to perform the instance startup manually, so you can learn.

    Make sure the default port is vacant:

    1. To check if the demonstrative instance is running, say:

      $ lsof -i :3301
      tarantool 6851 root   12u  IPv4  40827      0t0  TCP *:3301 (LISTEN)
    2. If it is, kill the corresponding process. In this example:

      $ kill 6851
  3. To start Tarantool’s database module and make the instance accept TCP requests on port 3301, say:

    tarantool> box.cfg{listen = 3301}
  4. Create the first space (named tester):

    tarantool> s = box.schema.space.create('tester')
  5. Format the created space by specifying field names and types:

    tarantool> s:format({
             > {name = 'id', type = 'unsigned'},
             > {name = 'band_name', type = 'string'},
             > {name = 'year', type = 'unsigned'}
             > })
  6. Create the first index (named primary):

    tarantool> s:create_index('primary', {
             > type = 'hash',
             > parts = {'id'}
             > })

    This is a primary index based on the id field of each tuple.

  7. Insert three tuples (our name for records) into the space:

    tarantool> s:insert{1, 'Roxette', 1986}
    tarantool> s:insert{2, 'Scorpions', 2015}
    tarantool> s:insert{3, 'Ace of Base', 1993}
  8. To select a tuple using the primary index, say:

    tarantool> s:select{3}

    The terminal screen now looks like this:

    tarantool> s = box.schema.space.create('tester')
    tarantool> s:format({
             > {name = 'id', type = 'unsigned'},
             > {name = 'band_name', type = 'string'},
             > {name = 'year', type = 'unsigned'}
             > })
    tarantool> s:create_index('primary', {
             > type = 'hash',
             > parts = {'id'}
             > })
    - unique: true
      parts:
      - type: unsigned
        is_nullable: false
        fieldno: 1
      id: 0
      space_id: 512
      name: primary
      type: HASH
    tarantool> s:insert{1, 'Roxette', 1986}
    - [1, 'Roxette', 1986]
    tarantool> s:insert{2, 'Scorpions', 2015}
    - [2, 'Scorpions', 2015]
    tarantool> s:insert{3, 'Ace of Base', 1993}
    - [3, 'Ace of Base', 1993]
    tarantool> s:select{3}
    - - [3, 'Ace of Base', 1993]
  9. To add a secondary index based on the band_name field, say:

    tarantool> s:create_index('secondary', {
             > type = 'hash',
             > parts = {'band_name'}
             > })
  10. To select tuples using the secondary index, say:

    tarantool> s.index.secondary:select{'Scorpions'}
    - - [2, 'Scorpions', 2015]
  11. Now, to prepare for the example in the next section, try this:

    tarantool> box.schema.user.grant('guest', 'read,write,execute', 'universe')

Connecting remotely

In the request box.cfg{listen = 3301} that we made earlier, the listen value can be any form of a URI (uniform resource identifier). In this case, it’s just a local port: port 3301. You can send requests to the listen URI via:

  1. telnet,
  2. a connector,
  3. another instance of Tarantool (using the console module), or
  4. tarantoolctl administrative utility.

Let’s try (3).

Switch to another terminal. On Linux, for example, this means starting another instance of a Bash shell. You can switch to any working directory in the new terminal, not necessarily to ~/tarantool_sandbox.

Start another instance of tarantool:

$ tarantool

Use net.box to connect to the Tarantool instance that’s listening on localhost:3301:

tarantool> net_box = require('net.box')
tarantool> conn = net_box.connect(3301)

Try this request:

tarantool> conn.space.tester:select{2}

This means “send a request to that Tarantool instance, and display the result”. It is equivalent to the local request box.space.tester:select{2}. The result in this case is one of the tuples that was inserted earlier. Your terminal screen should now look like this:

$ tarantool

Tarantool 2.6.1-32-g53dbba7c2
type 'help' for interactive help
tarantool> net_box = require('net.box')
tarantool> conn = net_box.connect(3301)
tarantool> conn.space.tester:select{2}
- - [2, 'Scorpions', 2015]

You can repeat box.space...:insert{} and box.space...:select{} (or conn.space...:insert{} and conn.space...:select{}) indefinitely, on either Tarantool instance.

When the testing is over:

  • To drop the space: s:drop()
  • To stop Tarantool: Ctrl+C or Ctrl+D
  • To stop Tarantool (an alternative): the standard Lua function os.exit()
  • To stop Tarantool (from another terminal): sudo pkill -f tarantool
  • To destroy the test directory: rm -r ~/tarantool_sandbox

Connecting from your favorite language

In the previous sections, you have learned how to create a Tarantool database. Now let’s see how to connect to the database from different programming languages, such as Python, PHP, Go, and C++, and execute typical requests for manipulating the data (select, insert, delete, and so on).

Connecting from Python


Before we proceed:

  1. Install the tarantool module. We recommend using python3 and pip3.

  2. Start Tarantool (locally or in Docker) and make sure that you have created and populated a database as we suggested earlier:

    box.cfg{listen = 3301}
    s = box.schema.space.create('tester')
    s:format({
             {name = 'id', type = 'unsigned'},
             {name = 'band_name', type = 'string'},
             {name = 'year', type = 'unsigned'}
             })
    s:create_index('primary', {
             type = 'hash',
             parts = {'id'}
             })
    s:create_index('secondary', {
             type = 'hash',
             parts = {'band_name'}
             })
    s:insert{1, 'Roxette', 1986}
    s:insert{2, 'Scorpions', 2015}
    s:insert{3, 'Ace of Base', 1993}


    Please do not close the terminal window where Tarantool is running – you’ll need it soon.

  3. In order to connect to Tarantool as an administrator, reset the password for the admin user:

    box.schema.user.passwd('pass')

Connecting to Tarantool

To get connected to the Tarantool server, say this:

>>> import tarantool
>>> connection = tarantool.connect("localhost", 3301)

You can also specify the user name and password, if needed:

>>> tarantool.connect("localhost", 3301, user=username, password=password)

The default user is guest.

Manipulating the data

A space is a container for tuples. To access a space as a named object, use connection.space:

>>> tester = connection.space('tester')
Inserting data

To insert a tuple into a space, use insert:

>>> tester.insert((4, 'ABBA', 1972))
[4, 'ABBA', 1972]
Querying data

Let’s start with selecting a tuple by the primary key (in our example, this is the index named primary, based on the id field of each tuple). Use select:

>>> tester.select(4)
[4, 'ABBA', 1972]

Next, select tuples by a secondary key. For this purpose, you need to specify the number or name of the index.

First off, select tuples using the index number:

>>> tester.select('Scorpions', index=1)
[2, 'Scorpions', 2015]

(We say index=1 because index numbers in Tarantool start with 0, and we’re using our second index here.)

Now make a similar query by the index name and make sure that the result is the same:

>>> tester.select('Scorpions', index='secondary')
[2, 'Scorpions', 2015]

Finally, select all the tuples in a space via a select with no arguments:

>>> tester.select()
Updating data

Update a field value using update:

>>> tester.update(4, [('=', 1, 'New group'), ('+', 2, 2)])

This updates the value of field 1 and increases the value of field 2 in the tuple with id = 4. If a tuple with this id doesn’t exist, Tarantool will return an error.

Now use replace to totally replace the tuple that matches the primary key. If a tuple with this primary key doesn’t exist, Tarantool will simply insert the new tuple.

>>> tester.replace((4, 'New band', 2015))

You can also update the data using upsert that works similarly to update, but creates a new tuple if the old one was not found.

>>> tester.upsert((4, 'Another band', 2000), [('+', 2, 5)])

This increases by 5 the value of field 2 in the tuple with id = 4, or inserts the tuple (4, "Another band", 2000) if a tuple with this id doesn’t exist.
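
The update and upsert semantics above can be sketched in plain Python. This is a model of the behavior only, not of Tarantool’s implementation: `apply_ops`, `upsert`, and the dict-based space are hypothetical names used for illustration.

```python
# A plain-Python sketch of Tarantool's update/upsert semantics.
# This models the behavior only; it is not how Tarantool implements it.

def apply_ops(tup, ops):
    """Apply a list of (operator, field_no, argument) operations to a tuple."""
    fields = list(tup)
    for op, field_no, arg in ops:
        if op == '=':        # assign a new value to the field
            fields[field_no] = arg
        elif op == '+':      # add the argument to a numeric field
            fields[field_no] += arg
        else:
            raise ValueError('unsupported operator: ' + op)
    return tuple(fields)

def upsert(space, key, default_tuple, ops):
    """Update the tuple with the given primary key, or insert default_tuple."""
    if key in space:
        space[key] = apply_ops(space[key], ops)
    else:
        space[key] = tuple(default_tuple)

space = {4: (4, 'ABBA', 1972)}

# update: set field 1, add 2 to field 2
space[4] = apply_ops(space[4], [('=', 1, 'New group'), ('+', 2, 2)])
print(space[4])   # -> (4, 'New group', 1974)

# upsert on an existing key applies the operations...
upsert(space, 4, (4, 'Another band', 2000), [('+', 2, 5)])
print(space[4])   # -> (4, 'New group', 1979)

# ...and on a missing key inserts the default tuple as-is
upsert(space, 5, (5, 'Another band', 2000), [('+', 2, 5)])
print(space[5])   # -> (5, 'Another band', 2000)
```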

Deleting data

To delete a tuple, use delete(primary_key):

>>> tester.delete(4)
[4, 'New band', 2020]

To delete all tuples in a space (or to delete an entire space), use call. We’ll focus on this function in more detail in the next section.

To delete all tuples in a space, call space:truncate:

>>> connection.call('box.space.tester:truncate', ())

To delete an entire space, call space:drop. This requires connecting to Tarantool as the admin user:

>>> connection.call('box.space.tester:drop', ())

Executing stored procedures

Switch to the terminal window where Tarantool is running.


If you don’t have a terminal window with a connection to Tarantool, revisit the instructions for starting Tarantool and attaching to it earlier in this chapter.

Define a simple Lua function:

function sum(a, b)
    return a + b
end

Now we have a Lua function defined in Tarantool. To invoke this function from python, use call:

>>> connection.call('sum', (3, 2))

To send bare Lua code for execution, use eval:

>>> connection.eval('return 4 + 5')

Connecting from PHP


Before we proceed:

  1. Install the tarantool/client library.

  2. Start Tarantool (locally or in Docker) and make sure that you have created and populated a database as we suggested earlier:

    box.cfg{listen = 3301}
    s = box.schema.space.create('tester')
    s:format({
             {name = 'id', type = 'unsigned'},
             {name = 'band_name', type = 'string'},
             {name = 'year', type = 'unsigned'}
             })
    s:create_index('primary', {
             type = 'hash',
             parts = {'id'}
             })
    s:create_index('secondary', {
             type = 'hash',
             parts = {'band_name'}
             })
    s:insert{1, 'Roxette', 1986}
    s:insert{2, 'Scorpions', 2015}
    s:insert{3, 'Ace of Base', 1993}


    Please do not close the terminal window where Tarantool is running – you’ll need it soon.

  3. In order to connect to Tarantool as an administrator, reset the password for the admin user:

    box.schema.user.passwd('pass')

Connecting to Tarantool

To configure a connection to the Tarantool server, say this:

use Tarantool\Client\Client;

require __DIR__.'/vendor/autoload.php';
$client = Client::fromDefaults();

The connection itself will be established at the first request. You can also specify the user name and password, if needed:

$client = Client::fromOptions([
    'uri' => 'tcp://127.0.0.1:3301',
    'username' => '<username>',
    'password' => '<password>',
]);
The default user is guest.

Manipulating the data

A space is a container for tuples. To access a space as a named object, use getSpace:

$tester = $client->getSpace('tester');
Inserting data

To insert a tuple into a space, use insert:

$result = $tester->insert([4, 'ABBA', 1972]);
Querying data

Let’s start with selecting a tuple by the primary key (in our example, this is the index named primary, based on the id field of each tuple). Use select:

use Tarantool\Client\Schema\Criteria;

$result = $tester->select(Criteria::key([4]));
[[4, 'ABBA', 1972]]

Next, select tuples by a secondary key. For this purpose, you need to specify the number or name of the index.

First off, select tuples using the index number:

$result = $tester->select(Criteria::index(1)->andKey(['Scorpions']));
[[2, 'Scorpions', 2015]]

(We say index(1) because index numbers in Tarantool start with 0, and we’re using our second index here.)

Now make a similar query by the index name and make sure that the result is the same:

$result = $tester->select(Criteria::index('secondary')->andKey(['Scorpions']));
[[2, 'Scorpions', 2015]]

Finally, select all the tuples in a space via a select:

$result = $tester->select(Criteria::allIterator());
Updating data

Update a field value using update:

use Tarantool\Client\Schema\Operations;

$result = $tester->update([4], Operations::set(1, 'New group')->andAdd(2, 2));

This updates the value of field 1 and increases the value of field 2 in the tuple with id = 4. If a tuple with this id doesn’t exist, Tarantool will return an error.

Now use replace to totally replace the tuple that matches the primary key. If a tuple with this primary key doesn’t exist, Tarantool will simply insert the new tuple.

$result = $tester->replace([4, 'New band', 2015]);

You can also update the data using upsert that works similarly to update, but creates a new tuple if the old one was not found.

use Tarantool\Client\Schema\Operations;

$tester->upsert([4, 'Another band', 2000], Operations::add(2, 5));

This increases by 5 the value of field 2 in the tuple with id = 4, or inserts the tuple (4, "Another band", 2000) if a tuple with this id doesn’t exist.

Deleting data

To delete a tuple, use delete(primary_key):

$result = $tester->delete([4]);

To delete all tuples in a space (or to delete an entire space), use call. We’ll focus on this function in more detail in the next section.

To delete all tuples in a space, call space:truncate:

$result = $client->call('box.space.tester:truncate');

To delete an entire space, call space:drop. This requires connecting to Tarantool as the admin user:

$result = $client->call('box.space.tester:drop');

Executing stored procedures

Switch to the terminal window where Tarantool is running.


If you don’t have a terminal window with a connection to Tarantool, revisit the instructions for starting Tarantool and attaching to it earlier in this chapter.

Define a simple Lua function:

function sum(a, b)
    return a + b
end

Now we have a Lua function defined in Tarantool. To invoke this function from php, use call:

$result = $client->call('sum', 3, 2);

To send bare Lua code for execution, use eval:

$result = $client->evaluate('return 4 + 5');

Connecting from Go


Before we proceed:

  1. Install the go-tarantool library.

  2. Start Tarantool (locally or in Docker) and make sure that you have created and populated a database as we suggested earlier:

    box.cfg{listen = 3301}
    s = box.schema.space.create('tester')
    s:format({
             {name = 'id', type = 'unsigned'},
             {name = 'band_name', type = 'string'},
             {name = 'year', type = 'unsigned'}
             })
    s:create_index('primary', {
             type = 'hash',
             parts = {'id'}
             })
    s:create_index('secondary', {
             type = 'hash',
             parts = {'band_name'}
             })
    s:insert{1, 'Roxette', 1986}
    s:insert{2, 'Scorpions', 2015}
    s:insert{3, 'Ace of Base', 1993}


    Please do not close the terminal window where Tarantool is running – you’ll need it soon.

  3. In order to connect to Tarantool as an administrator, reset the password for the admin user:

    box.schema.user.passwd('pass')

Connecting to Tarantool

To get connected to the Tarantool server, write a simple Go program:

package main

import (
    "log"

    "github.com/tarantool/go-tarantool"
)

func main() {
    conn, err := tarantool.Connect("127.0.0.1:3301", tarantool.Opts{
        User: "admin",
        Pass: "pass",
    })
    if err != nil {
        log.Fatalf("Connection refused")
    }
    defer conn.Close()

    // Your logic for interacting with the database
}
The default user is guest.

Manipulating the data

Inserting data

To insert a tuple into a space, use Insert:

resp, err = conn.Insert("tester", []interface{}{4, "ABBA", 1972})

This inserts the tuple (4, "ABBA", 1972) into a space named tester.

The response code and data are available in the tarantool.Response structure:

code := resp.Code
data := resp.Data
Querying data

To select a tuple from a space, use Select:

resp, err = conn.Select("tester", "primary", 0, 1, tarantool.IterEq, []interface{}{4})

This selects a tuple by the primary key with offset = 0 and limit = 1 from a space named tester (in our example, this is the index named primary, based on the id field of each tuple).
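
The select semantics here (an equality iterator plus offset and limit) can be sketched in plain Python. The helper `select_eq` is hypothetical and models the behavior only, not the connector’s API.

```python
# Model a space as a list of tuples and an index as a field number.
def select_eq(tuples, field_no, key, offset=0, limit=1):
    """Return at most `limit` tuples whose field equals `key`,
    skipping the first `offset` matches (IterEq semantics)."""
    matches = [t for t in tuples if t[field_no] == key]
    return matches[offset:offset + limit]

space = [
    (1, 'Roxette', 1986),
    (2, 'Scorpions', 2015),
    (3, 'Ace of Base', 1993),
]

print(select_eq(space, 0, 2))            # "primary" index (field 0)  -> [(2, 'Scorpions', 2015)]
print(select_eq(space, 1, 'Scorpions'))  # "secondary" index (field 1) -> [(2, 'Scorpions', 2015)]
```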

Next, select tuples by a secondary key.

resp, err = conn.Select("tester", "secondary", 0, 1, tarantool.IterEq, []interface{}{"ABBA"})

Finally, it would be nice to select all the tuples in a space. But there is no one-liner for this in Go; you would need a script like this one.

For more examples, see https://github.com/tarantool/go-tarantool#usage

Updating data

Update a field value using Update:

resp, err = conn.Update("tester", "primary", []interface{}{4}, []interface{}{[]interface{}{"+", 2, 3}})

This increases by 3 the value of field 2 in the tuple with id = 4. If a tuple with this id doesn’t exist, Tarantool will return an error.

Now use Replace to totally replace the tuple that matches the primary key. If a tuple with this primary key doesn’t exist, Tarantool will simply insert the new tuple.

resp, err = conn.Replace("tester", []interface{}{4, "New band", 2011})

You can also update the data using Upsert that works similarly to Update, but creates a new tuple if the old one was not found.

resp, err = conn.Upsert("tester", []interface{}{4, "Another band", 2000}, []interface{}{[]interface{}{"+", 2, 5}})

This increases by 5 the value of the third field in the tuple with id = 4, or inserts the tuple (4, "Another band", 2000) if a tuple with this id doesn’t exist.

Deleting data

To delete a tuple, use Delete:

resp, err = conn.Delete("tester", "primary", []interface{}{4})

To delete all tuples in a space (or to delete an entire space), use Call. We’ll focus on this function in more detail in the next section.

To delete all tuples in a space, call space:truncate:

resp, err = conn.Call("box.space.tester:truncate", []interface{}{})

To delete an entire space, call space:drop. This requires connecting to Tarantool as the admin user:

resp, err = conn.Call("box.space.tester:drop", []interface{}{})

Executing stored procedures

Switch to the terminal window where Tarantool is running.


If you don’t have a terminal window with remote connection to Tarantool, check out these guides:

Define a simple Lua function:

function sum(a, b)
    return a + b

Now we have a Lua function defined in Tarantool. To invoke this function from go, use Call:

resp, err = conn.Call("sum", []interface{}{2, 3})

To send bare Lua code for execution, use Eval:

resp, err = conn.Eval("return 4 + 5", []interface{}{})

Connecting to Tarantool from C++

To help you start working with the Tarantool C++ connector, we will use the example application from the connector repository. We will go step by step through the application code and explain what each part does.

The following main topics are discussed in this manual:


To go through this Getting Started exercise, you need to complete the following prerequisites:


The Tarantool C++ connector is currently supported for Linux only.

The connector itself is a header-only library, so it doesn’t require building or installation as such. All you need to do is clone the connector source code and embed it in your C++ project.

Also, make sure you have other necessary software and Tarantool installed.

  1. Make sure you have the following third-party software. If you miss some of the items, install them:

  2. If you don’t have Tarantool on your OS, install it in one of the ways:

  3. Clone the Tarantool C++ connector repository.

    git clone git@github.com:tarantool/tntcxx.git
Starting Tarantool and creating a database

Start Tarantool locally or in Docker and create a space with the following schema and index:

box.cfg{listen = 3301}
t = box.schema.space.create('t')
t:format({
         {name = 'id', type = 'unsigned'},
         {name = 'a', type = 'string'},
         {name = 'b', type = 'number'}
         })
t:create_index('primary', {
         type = 'hash',
         parts = {'id'}
         })


Do not close the terminal window where Tarantool is running. You will need it later to connect to Tarantool from your C++ application.

Setting up access rights

To be able to execute the necessary operations in Tarantool, you need to grant the guest user read and write rights. The simplest way is to grant the user the super role:

box.schema.user.grant('guest', 'super')

Connecting to Tarantool

There are three main parts of the C++ connector: the IO-zero-copy buffer, the msgpack encoder/decoder, and the client that handles requests.

To set up connection to a Tarantool instance from a C++ application, you need to do the following:

Embedding connector

Embed the connector in your C++ application by including the main header:

#include "../src/Client/Connector.hpp"
Instantiating objects

First, we should create a connector client. It can handle many connections to Tarantool instances asynchronously. To instantiate a client, you should specify the buffer and the network provider implementations as template parameters. The connector’s main class has the following signature:

template<class BUFFER, class NetProvider = DefaultNetProvider<BUFFER>>
class Connector;

The buffer is parametrized by an allocator, which means that users can choose which allocator will be used to provide memory for the buffer’s blocks. Data is organized into a linked list of blocks of fixed size, which is specified as a template parameter of the buffer.
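
The idea of a buffer built from a chain of fixed-size blocks can be sketched in Python. `BlockBuffer` is a hypothetical illustration of the data structure only; the real buffer is a C++ template with a pluggable allocator.

```python
class BlockBuffer:
    """Append-only buffer that stores data in fixed-size blocks,
    so a write never reallocates or moves already-written data."""

    def __init__(self, block_size=16 * 1024):
        self.block_size = block_size
        self.blocks = [bytearray()]   # the chain ("linked list") of blocks

    def write(self, data: bytes):
        for byte in data:
            if len(self.blocks[-1]) == self.block_size:
                self.blocks.append(bytearray())   # allocate a new block
            self.blocks[-1].append(byte)

    def __len__(self):
        return sum(len(b) for b in self.blocks)

buf = BlockBuffer(block_size=4)
buf.write(b'hello world')
print(len(buf), len(buf.blocks))   # -> 11 3  (11 bytes across 3 blocks of size 4)
```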

You can either implement your own buffer or network provider or use the default ones as we do in our example. So, the default connector instantiation looks as follows:

using Buf_t = tnt::Buffer<16 * 1024>;
using Net_t = DefaultNetProvider<Buf_t>;
Connector<Buf_t, Net_t> client;

To use the BUFFER class, the buffer header should also be included:

#include "../src/Buffer/Buffer.hpp"

A client itself is not enough to work with Tarantool instances; we also need to create connection objects. A connection also takes the buffer and the network provider as template parameters. Note that they must be the same as those of the client:

Connection<Buf_t, Net_t> conn(client);

Our Tarantool instance is listening on port 3301 on localhost. Let’s define the corresponding variables, as well as a WAIT_TIMEOUT variable for the connection timeout.

const char *address = "127.0.0.1";
int port = 3301;
int WAIT_TIMEOUT = 1000; //milliseconds

To connect to the Tarantool instance, we should invoke the Connector::connect() method of the client object and pass three arguments: connection instance, address, and port.

int rc = client.connect(conn, address, port);
Error handling

The implementation of the connector is exception-free, so we rely on return codes: on failure, the connect() method returns rc < 0. To get the error message corresponding to the last error that occurred during communication with the instance, we can invoke the Connection::getError() method.

if (rc != 0) {
	std::cerr << conn.getError() << std::endl;
	return -1;
}
To reset the connection after an error, that is, to clean up the error message and the connection status, use the Connection::reset() method.

Working with requests

In this section, we will show how to:

We will also go through the case of having several connections and executing a number of requests from different connections simultaneously.

In our example C++ application, we execute the following types of requests:

  • ping
  • replace
  • select.


Examples on other request types, namely, insert, delete, upsert, and update, will be added to this manual later.

Each request method returns a request ID that is a sort of future. This ID can be used to get the response message when it is ready. Requests are queued in the output buffer of connection until the Connector::wait() method is called.
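
The request-ID-as-future flow can be sketched in plain Python. `MiniConnector` and its method names are hypothetical, and the real network I/O is replaced by an echo stub; only the queue-then-wait pattern matches the connector described here.

```python
class MiniConnector:
    """Requests are queued locally; wait() flushes the queue and records
    responses, which can then be fetched by request ID (the 'future')."""

    def __init__(self):
        self._next_id = 0
        self._queue = []       # encoded requests, not yet sent
        self._responses = {}   # request id -> response

    def request(self, payload):
        rid = self._next_id    # the returned ID acts as a future
        self._next_id += 1
        self._queue.append((rid, payload))
        return rid

    def future_is_ready(self, rid):
        return rid in self._responses

    def wait(self, rid):
        # Stand-in for the real event loop: flush the queue and
        # pretend the server echoed each payload back.
        while self._queue:
            qid, payload = self._queue.pop(0)
            self._responses[qid] = ('ok', payload)
        return self._responses[rid]

conn = MiniConnector()
ping = conn.request('ping')
print(conn.future_is_ready(ping))   # -> False (still queued)
print(conn.wait(ping))              # -> ('ok', 'ping')
print(conn.future_is_ready(ping))   # -> True
```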

Preparing requests

At this step, requests are encoded in the MessagePack format and saved in the output connection buffer. They are ready to be sent but the network communication itself will be done later.

Recall that for data manipulation requests we are dealing with the Tarantool space t created earlier, which has the following format:

         {name = 'id', type = 'unsigned'},
         {name = 'a', type = 'string'},
         {name = 'b', type = 'number'}


First, let’s prepare a ping request:

rid_t ping = conn.ping();


Next, a replace request, which is equivalent to the Lua request <space_name>:replace(pk_value, "111", 1.01):

uint32_t space_id = 512;
int pk_value = 666;
std::tuple data = std::make_tuple(pk_value /* field 1*/, "111" /* field 2*/, 1.01 /* field 3*/);
rid_t replace = conn.space[space_id].replace(data);


And a select request, which is equivalent to the Lua request <space_name>.index[0]:select({pk_value}, {limit = 1}):

uint32_t index_id = 0;
uint32_t limit = 1;
uint32_t offset = 0;
IteratorType iter = IteratorType::EQ;
auto i = conn.space[space_id].index[index_id];
rid_t select = i.select(std::make_tuple(pk_value), limit, offset, iter);
Sending requests

To send requests to the server side, invoke the client.wait() method.

client.wait(conn, ping, WAIT_TIMEOUT);

The wait() method takes the connection to poll, the request ID, and, optionally, the timeout as parameters. Once a response for the specified request is ready, wait() terminates. It returns a negative code in case of system-related failures, for example, a broken or timed-out connection. If wait() returns 0, a response has been received and is ready to be parsed.

Now let’s send our requests to the Tarantool instance. The futureIsReady() function checks availability of a future and returns true or false.

while (! conn.futureIsReady(ping)) {
	/*
	 * wait() is the main function responsible for sending and receiving
	 * requests; it implements an event loop under the hood. It may
	 * fail for several reasons:
	 *  - the connection is timed out;
	 *  - the connection is broken (e.g. closed);
	 *  - epoll fails.
	 */
	if (client.wait(conn, ping, WAIT_TIMEOUT) != 0) {
		std::cerr << conn.getError() << std::endl;
		return -1;
	}
}
Receiving responses

To get the response when it is ready, use the Connection::getResponse() method. It takes the request ID and returns an optional object containing the response. If the response is not ready yet, the method returns std::nullopt. Note that on each future, getResponse() can be called only once: it erases the request ID from the internal map once it is returned to a user.

A response consists of a header and a body (response.header and response.body). Depending on whether the request succeeded on the server side, the body contains either runtime error(s), accessible via response.body.error_stack, or data (tuples), accessible via response.body.data. In turn, data is a vector of tuples. However, the tuples are not decoded: they come in the form of pointers to the start and the end of their MessagePack representation. See the “Decoding and reading the data” section to understand how to decode tuples.

For a single connection, there are two options for receiving responses: we can either wait for one specific future or wait for all of them at once. We’ll try both options in our example. For the ping request, let’s use the first option.

std::optional<Response<Buf_t>> response = conn.getResponse(ping);
/*
 * Since conn.futureIsReady(ping) returned <true>, the response
 * must be ready.
 */
assert(response != std::nullopt);
/*
 * If the request was successfully executed on the server side, the response
 * will contain data (i.e. the tuple being replaced in case of a :replace()
 * request, or the tuples satisfying the search conditions in case of a
 * :select(); responses to pings contain nothing - an empty map).
 * To tell responses containing data from error responses, one can
 * rely on the response code stored in the header, or check the
 * Response->body.data and Response->body.error_stack members.
 */
printResponse<Buf_t>(conn, *response);

For the replace and select requests, let’s examine the option of waiting for both futures at once.

/* Let's wait for both futures at once. */
rid_t futures[2];
futures[0] = replace;
futures[1] = select;
/* No timeout specified means that we poll the futures until they are ready. */
client.waitAll(conn, futures, 2);
for (int i = 0; i < 2; ++i) {
	response = conn.getResponse(futures[i]);
	assert(response != std::nullopt);
	printResponse<Buf_t>(conn, *response);
}
Several connections at once

Now, let’s have a look at the case when we establish two connections to a Tarantool instance simultaneously.

/* Let's create another connection. */
Connection<Buf_t, Net_t> another(client);
if (client.connect(another, address, port) != 0) {
	std::cerr << another.getError() << std::endl;
	return -1;
}
/* Simultaneously execute two requests from different connections. */
rid_t f1 = conn.ping();
rid_t f2 = another.ping();
/*
 * waitAny() returns the first connection that has received a response.
 * All connections registered via the :connect() call participate.
 */
Connection<Buf_t, Net_t> *first = client.waitAny(WAIT_TIMEOUT);
if (first == &conn) {
	assert(conn.futureIsReady(f1));
} else {
	assert(another.futureIsReady(f2));
}
Closing connections

Finally, a user is responsible for closing connections.
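For our example, that means closing both connections through the client (a sketch assuming the connector’s close() method, which takes the connection as an argument):

```cpp
client.close(conn);
client.close(another);
```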


Building and launching C++ application

Now, we are going to build our example C++ application, launch it to connect to the Tarantool instance and execute all the requests defined.

Make sure you are in the root directory of the cloned C++ connector repository. To build the example application:

cd examples
cmake .
make

Make sure the Tarantool session you started earlier is running. Launch the application:


As you can see from the execution log, all the connections to Tarantool defined in our application have been established and all the requests have been executed successfully.

Decoding and reading the data

Responses from a Tarantool instance contain raw data, that is, data encoded as MessagePack tuples. To decode this data, the user has to write their own decoders (readers) based on the database schema and include them in the application:

#include "Reader.hpp"

To show the logic of decoding a response, we will use the reader from our example.

First, we define a structure corresponding to our example space format:

/* Corresponds to tuples stored in the user's space: */
struct UserTuple {
	uint64_t field1;
	std::string field2;
	double field3;
};
Base reader prototype

Prototype of the base reader is given in src/mpp/Dec.hpp:

template <class BUFFER, Type TYPE>
struct SimpleReaderBase : DefaultErrorHandler {
   using BufferIterator_t = typename BUFFER::iterator;
   /* Allowed type of values to be parsed. */
   static constexpr Type VALID_TYPES = TYPE;
   BufferIterator_t* StoreEndIterator() { return nullptr; }
};

Every new reader should inherit from it or directly from the DefaultErrorHandler.

Parsing values

To parse a particular value, we should define the Value() method. The first two arguments of the method are common and, as a rule, unused, while the third one defines the parsed value. In the case of POD (Plain Old Data) structures, it’s enough to provide a byte-by-byte copy. Since there are fields of three different types in our schema, let’s define the corresponding Value() overloads:

struct UserTupleValueReader : mpp::DefaultErrorHandler {
	explicit UserTupleValueReader(UserTuple& t) : tuple(t) {}
	static constexpr mpp::Type VALID_TYPES = mpp::MP_UINT | mpp::MP_STR | mpp::MP_DBL;
	template <class T>
	void Value(const BufIter_t&, mpp::compact::Type, T v)
	{
		using A = UserTuple;
		static constexpr std::tuple map(&A::field1, &A::field3);
		auto ptr = std::get<std::decay_t<T> A::*>(map);
		tuple.*ptr = v;
	}
	void Value(const BufIter_t& itr, mpp::compact::Type, mpp::StrValue v)
	{
		BufIter_t tmp = itr;
		tmp += v.offset;
		std::string &dst = tuple.field2;
		while (v.size) {
			dst.push_back(*tmp);
			++tmp;
			--v.size;
		}
	}
	void WrongType(mpp::Type expected, mpp::Type got)
	{
		std::cout << "expected type is " << expected <<
			     " but got " << got << std::endl;
	}

	BufIter_t* StoreEndIterator() { return nullptr; }
	UserTuple& tuple;
};
Parsing array

It’s also important to understand that a tuple itself is wrapped in an array, so, in fact, we should parse the array first. Let’s define another reader for that purpose.

template <class BUFFER>
struct UserTupleReader : mpp::SimpleReaderBase<BUFFER, mpp::MP_ARR> {

	UserTupleReader(mpp::Dec<BUFFER>& d, UserTuple& t) : dec(d), tuple(t) {}

	void Value(const iterator_t<BUFFER>&, mpp::compact::Type, mpp::ArrValue u)
	{
		assert(u.size == 3);
		(void) u;
		dec.SetReader(false, UserTupleValueReader{tuple});
	}
	mpp::Dec<BUFFER>& dec;
	UserTuple& tuple;
};
Setting reader

The SetReader() method sets the reader that is invoked while each of the array’s entries is parsed. To make two readers defined above work, we should create a decoder, set its iterator to the position of the encoded tuple, and invoke the Read() method (the code block below is from the example application).

template <class BUFFER>
std::vector<UserTuple>
decodeUserTuple(BUFFER &buf, Data<BUFFER> &data)
{
	std::vector<UserTuple> results;
	for (auto const& t : data.tuples) {
		assert(t.begin != std::nullopt);
		assert(t.end != std::nullopt);
		UserTuple tuple;
		mpp::Dec dec(buf);
		dec.SetPosition(*t.begin);
		dec.SetReader(false, UserTupleReader<BUFFER>{dec, tuple});
		mpp::ReadResult_t res = dec.Read();
		assert(res == mpp::READ_SUCCESS);
		results.push_back(tuple);
	}
	return results;
}

Creating your first Tarantool Cartridge application

Here we’ll walk you through developing a simple cluster application.

First, set up the development environment.

Next, create an application named myapp. Say:

$ cartridge create --name myapp

This will create a Tarantool Cartridge application in the ./myapp directory, with a handful of template files and directories inside.

Go inside and make a dry run:

$ cd ./myapp
$ cartridge build
$ cartridge start

This will build the application locally, start 5 instances of Tarantool, and run the application as it is, with no business logic yet.

Why 5 instances? See the instances.yml file in your application directory. It contains the configuration of all instances that you can use in the cluster. By default, it defines configuration for 5 Tarantool instances.

  workdir: ./tmp/db_dev/3301
  advertise_uri: localhost:3301
  http_port: 8081

  workdir: ./tmp/db_dev/3302
  advertise_uri: localhost:3302
  http_port: 8082

  workdir: ./tmp/db_dev/3303
  advertise_uri: localhost:3303
  http_port: 8083

  workdir: ./tmp/db_dev/3304
  advertise_uri: localhost:3304
  http_port: 8084

  workdir: ./tmp/db_dev/3305
  advertise_uri: localhost:3305
  http_port: 8085

You can already see these instances in the cluster management web interface at http://localhost:8081 (here 8081 is the HTTP port of the first instance specified in instances.yml).


Okay, press Ctrl + C to stop the cluster for a while.

Now it’s time to add some business logic to your application. This will be an evergreen “Hello world!” – just to keep things simple.

Rename the template file app/roles/custom.lua to hello-world.lua.

$ mv app/roles/custom.lua app/roles/hello-world.lua

This will be your role. In Tarantool Cartridge, a role is a Lua module that implements some instance-specific functions and/or logic. Further on, we’ll show how to add code to a role, then build, enable, and test it.

There is already some code in the role’s init() function.

 local function init(opts) -- luacheck: no unused args
     -- if opts.is_master then
     -- end

     local httpd = cartridge.service_get('httpd')
     httpd:route({method = 'GET', path = '/hello'}, function()
         return {body = 'Hello world!'}
     end)

     return true
 end

This exports an HTTP endpoint /hello. For example, http://localhost:8081/hello if you address the first instance from the instances.yml file. If you open it in a browser after enabling the role (we’ll do it here a bit later), you’ll see “Hello world!” on the page.

Let’s add some more code there.

 local function init(opts) -- luacheck: no unused args
     -- if opts.is_master then
     -- end

     local httpd = cartridge.service_get('httpd')
     httpd:route({method = 'GET', path = '/hello'}, function()
         return {body = 'Hello world!'}
     end)

     local log = require('log')
     log.info('Hello world!')

     return true
 end

This writes “Hello world!” to the console when the role gets enabled, so you’ll have a chance to spot it. No rocket science.

Next, amend role_name in the “return” section of the hello-world.lua file. This text will be displayed as a label for your role in the cluster management web interface.

 return {
     role_name = 'Hello world!',
     init = init,
     stop = stop,
     validate_config = validate_config,
     apply_config = apply_config,
 }
The final thing to do before you can run the application is to add your role to the list of available cluster roles in the init.lua file.

 local ok, err = cartridge.cfg({
     workdir = 'tmp/db',
     roles = {'app.roles.hello-world'},
     cluster_cookie = 'myapp-cluster-cookie',
 })
Now the cluster will be aware of your role.

Why app.roles.hello-world? By default, the role name here should match the path from the application root (./myapp) to the role file (app/roles/hello-world.lua).

Fine! Your role is ready. Re-build the application and re-start the cluster now:

$ cartridge build
$ cartridge start

Now all instances are up, but idle, waiting for you to enable roles for them.

Instances (replicas) in a Tarantool Cartridge cluster are organized into replica sets. Roles are enabled per replica set, so all instances in a replica set have the same roles enabled.

Let’s create a replica set containing just one instance and enable your role:

  1. Open the cluster management web interface at http://localhost:8081.

  2. Click Configure.

  3. Check the role Hello world! to enable it. Notice that the role name here matches the label text that you specified in the role_name parameter in the hello-world.lua file.

  4. (Optionally) Specify the replica set name, for example “hello-world-replica-set”.

  5. Click Create replica set and see the newly-created replica set in the web interface.


Your custom role got enabled. Find the “Hello world!” message in the console, like this:


Finally, open the HTTP endpoint of this instance at http://localhost:8081/hello and see the reply to your GET request.


Everything is up and running! What’s next?

  • Follow this guide to set up the rest of the cluster and try some cool cluster management features.
  • Get inspired with these examples and implement more sophisticated business logic for your role.
  • Pack your application for easy distribution. Choose what you like: a DEB or RPM package, a TGZ archive, or a Docker image.

User’s Guide


In this chapter, we introduce the basic concepts of working with Tarantool as a database manager.

This chapter contains the following sections:

Data model

This section describes how Tarantool stores values and what operations with data it supports.

If you tried to create a database as suggested in our “Getting started” exercises, then your test database now looks like this:



Tarantool operates on data in the form of tuples.


A tuple is a group of data values in Tarantool’s memory. Think of it as a “database record” or a “row”. The data values in the tuple are called fields.

When Tarantool returns a tuple value in the console, by default it uses YAML format, for example: [3, 'Ace of Base', 1993].

Internally, Tarantool stores tuples as MsgPack arrays.


Fields are distinct data values contained in a tuple. They play the same role as “row columns” or “record fields” in relational databases, with a few improvements:

  • fields can be composite structures, such as arrays or maps,
  • fields don’t need to have names.

A given tuple may have any number of fields, and the fields may be of different types.

The field’s number is the identifier of the field. Numbers are counted from base 1 in Lua and other 1-based languages, or from base 0 in languages like PHP or C/C++. So, 1 or 0 can be used in some contexts to refer to the first field of a tuple.
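For example, in the Lua console the first field of the [3, 'Ace of Base', 1993] tuple from the “Getting started” exercises can be read with a 1-based index (assuming the 'tester' space from those exercises exists):

```lua
tarantool> t = box.space.tester:get{3}
tarantool> t[1]
- 3
```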


Tarantool stores tuples in containers called spaces. In our example there’s a space called 'tester'.


In Tarantool, a space is the primary container that stores data. It is analogous to tables in relational databases. Spaces contain tuples, the Tarantool name for database records. The number of tuples in a space is unlimited.

At least one space is required to store data with Tarantool. Each space has the following attributes:

  • a unique name specified by the user,
  • a unique numeric identifier which can be specified by the user, but usually is assigned automatically by Tarantool,
  • an engine: memtx (default) – in-memory engine, fast but limited in size, or vinyl – on-disk engine for huge data sets.

To be functional, a space also needs to have a primary index. It can also have secondary indexes.
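For instance, a minimal functional space can be set up in the console like this (the space and index names here are arbitrary):

```lua
tarantool> s = box.schema.space.create('example')
tarantool> s:create_index('primary', {parts = {{field = 1, type = 'unsigned'}}})
```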


Read the full information about indexes on page Indexes.

An index is a group of key values and pointers.

As with spaces, you should specify the index name, and let Tarantool come up with a unique numeric identifier (“index id”).

An index always has a type. The default index type is TREE. TREE indexes are provided by all Tarantool engines, can index unique and non-unique values, support partial key searches, comparisons and ordered results. Additionally, memtx engine supports HASH, RTREE and BITSET indexes.

An index may be multi-part, that is, you can declare that an index key value is composed of two or more fields in the tuple, in any order. For example, for an ordinary TREE index, the maximum number of parts is 255.

An index may be unique, that is, you can declare that it would be illegal to have the same key value twice.

The first index defined on a space is called the primary key index, and it must be unique. All other indexes are called secondary indexes, and they may be non-unique.
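For example, with the 'tester' space from the “Getting started” exercises, which already contains the tuple [3, 'Ace of Base', 1993], a second insert with the same primary-key value is rejected:

```lua
tarantool> box.space.tester:insert{3, 'Ace of Base again'}
-- fails with a "Duplicate key exists" error
```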

Data types

Tarantool is both a database manager and an application server. Therefore a developer often deals with two type sets: the types of the programming language (such as Lua) and the types of the Tarantool storage format (MsgPack).

Lua versus MsgPack
Scalar / compound | MsgPack type | Lua type | Example value
scalar   | nil                         | nil                         | nil
scalar   | boolean                     | boolean                     | true
scalar   | string                      | string                      | 'A B C'
scalar   | integer                     | number                      | 12345
scalar   | float 64 (double)           | number                      | 1.2345
scalar   | float 64 (double)           | cdata                       | 1.2345
scalar   | binary                      | cdata                       | [!!binary 3t7e]
scalar   | ext (for Tarantool decimal) | cdata                       | 1.2
scalar   | ext (for Tarantool uuid)    | cdata                       | 12a34b5c-de67-8f90-
compound | map                         | “table” (with string keys)  | {'a': 5, 'b': 6}
compound | array                       | “table” (with integer keys) | [1, 2, 3, 4, 5]
compound | array                       | tuple (“cdata”)             | [12345, 'A B C']


MsgPack values have variable lengths. So, for example, the smallest number requires only one byte, but the largest number requires nine bytes.

Field Type Details

nil. In Lua a nil type has only one possible value, also called nil (which Tarantool displays as null when using the default YAML format). Nils may be compared to values of any types with == (is-equal) or ~= (is-not-equal), but other comparison operations will not work. Nils may not be used in Lua tables; the workaround is to use box.NULL because nil == box.NULL is true. Example: nil.

boolean. A boolean is either true or false. Example: true.

integer. The Tarantool integer type is for integers between -9223372036854775808 and 18446744073709551615, which is about 18 quintillion. This corresponds to number in Lua and to integer in MsgPack. Example: -2^63.

unsigned. The Tarantool unsigned type is for integers between 0 and 18446744073709551615. So it is a subset of integer. Example: 123456.

double. The double field type exists mainly so that there will be an equivalent to Tarantool/SQL’s DOUBLE data type. In msgpuck.h (Tarantool’s interface to MsgPack) the storage type is MP_DOUBLE and the size of the encoded value is always 9 bytes. In Lua, ‘double’ fields can only contain non-integer numeric values and cdata values with double floating-point numbers. Examples: 1.234, -44, 1.447e+44.
To avoid using the wrong kind of values inadvertently, use ffi.cast() when searching or changing ‘double’ fields. For example, instead of space_object:insert{value} say ffi = require('ffi') ... space_object:insert({ffi.cast('double',value)}). Example:

s = box.schema.space.create('s', {format = {{'d', 'double'}}})
ffi = require('ffi')
s:insert({ffi.cast('double', 1)})
s:insert({ffi.cast('double', tonumber('123'))})
s:select({ffi.cast('double', 1)})

Arithmetic with cdata ‘double’ will not work reliably, so for Lua it is better to use the ‘number’ type. This warning does not apply for Tarantool/SQL because Tarantool/SQL does implicit casting.

number. In Lua a number is double-precision floating-point, but a Tarantool ‘number’ field may have both integer and floating-point values. Tarantool will try to store a Lua number as floating-point if the value contains a decimal point or is very large (greater than 100 trillion = 1e14), otherwise Tarantool will store it as an integer. To ensure that even very large numbers are stored as integers, use the tonumber64 function, or the LL (Long Long) suffix, or the ULL (Unsigned Long Long) suffix. Here are examples of numbers using regular notation, exponential notation, the ULL suffix and the tonumber64 function: -55, -2.7e+20, 100000000000000ULL, tonumber64('18446744073709551615').

decimal. The Tarantool decimal type is stored as a MsgPack ext (Extension). Values with the decimal type are not floating-point values although they may contain decimal points. They are exact with up to 38 digits of precision. Example: a value returned by a function in the decimal module.

string. A string is a variable-length sequence of bytes, usually represented with alphanumeric characters inside single quotes. In both Lua and MsgPack, strings are treated as binary data, with no attempts to determine a string’s character set or to perform any string conversion – unless there is an optional collation. So, usually, string sorting and comparison are done byte-by-byte, without any special collation rules applied. (Example: numbers are ordered by their point on the number line, so 2345 is greater than 500; meanwhile, strings are ordered by the encoding of the first byte, then the encoding of the second byte, and so on, so '2345' is less than '500'.) Example: 'A, B, C'.

bin. A bin (binary) value is not directly supported by Lua but there is a Tarantool type varbinary which is encoded as MsgPack binary. For an (advanced) example showing how to insert varbinary into a database, see the Cookbook Recipe for ffi_varbinary_insert. Example: "\65 \66 \67".

uuid. Since version 2.4.1. The Tarantool uuid type is stored as a MsgPack ext (Extension). Values with the uuid type are Universally unique identifiers.
Example: 64d22e4d-ac92-4a23-899a-e5934af5479.

array. An array is represented in Lua with {...} (braces). Examples: as lists of numbers representing points in a geometric figure: {10, 11}, {3, 5, 9, 10}.

table. Lua tables with string keys are stored as MsgPack maps; Lua tables with integer keys starting with 1 are stored as MsgPack arrays. Nils may not be used in Lua tables; the workaround is to use box.NULL. Example: a box.space.tester:select() request will return a Lua table.

tuple. A tuple is a light reference to a MsgPack array stored in the database. It is a special type (cdata) to avoid conversion to a Lua table on retrieval. A few functions may return tables with multiple tuples. For tuple examples, see box.tuple.

scalar. Values in a scalar field can be boolean or integer or unsigned or double or number or decimal or string or varbinary – but not array or map or tuple. Examples: true, 1, 'xxx'.

any. Values in an any field can be boolean or integer or unsigned or double or number or decimal or string or varbinary – or array or map or tuple. Examples: true, 1, 'xxx', {box.NULL, 0}.

Examples of insert requests with different field types:

tarantool> box.space.K:insert{1,nil,true,'A B C',12345,1.2345}
- [1, null, true, 'A B C', 12345, 1.2345]
tarantool> box.space.K:insert{2,{['a']=5,['b']=6}}
- [2, {'a': 5, 'b': 6}]
tarantool> box.space.K:insert{3,{1,2,3,4,5}}
- [3, [1, 2, 3, 4, 5]]
Indexed field types

Indexes impose restrictions on the values that Tarantool may store with MsgPack. This is why, for example, 'unsigned' and 'integer' are different field types, although in MsgPack both are stored as integer values: an 'unsigned' index contains only non-negative integer values, while an 'integer' index contains any integer values.

Here again are the field types described in Field Type Details, and the index types they can fit in. The default field type is 'unsigned' and the default index type is TREE. Although 'nil' is not a legal indexed field type, indexes may contain nil as a non-default option. Full information is in section Details about index field types.

Field type name string | Field type | Index type
'boolean' | boolean | TREE or HASH
'integer' (may also be called ‘int’) | integer, which may include unsigned values | TREE or HASH
'unsigned' (may also be called ‘uint’ or ‘num’, but ‘num’ is deprecated) | unsigned | TREE, BITSET or HASH
'double' | double | TREE or HASH
'number' | number, which may include integer or double values | TREE or HASH
'decimal' | decimal | TREE or HASH
'string' (may also be called ‘str’) | string | TREE, BITSET or HASH
'varbinary' | varbinary | TREE, HASH or BITSET (since version 2.7)
'uuid' | uuid | TREE or HASH
'array' | array | RTREE
'scalar' | scalar, which may include nil or boolean or integer or unsigned or number or decimal or string or varbinary values | TREE or HASH

When a scalar field contains values of different underlying types, the key order is: nils, then booleans, then numbers, then strings, then varbinaries.



By default, when Tarantool compares strings, it uses what we call a “binary” collation. The only consideration here is the numeric value of each byte in the string. Therefore, if the string is encoded with ASCII or UTF-8, then 'A' < 'B' < 'a', because the encoding of 'A' (what used to be called the “ASCII value”) is 65, the encoding of 'B' is 66, and the encoding of 'a' is 98. Binary collation is best if you prefer fast, deterministic, simple maintenance and searching with Tarantool indexes.

But if you want the ordering that you see in phone books and dictionaries, then you need Tarantool’s optional collations, such as unicode and unicode_ci, which allow for 'a' < 'A' < 'B' and 'a' = 'A' < 'B' respectively.

The unicode and unicode_ci optional collations use the ordering according to the Default Unicode Collation Element Table (DUCET) and the rules described in Unicode® Technical Standard #10 Unicode Collation Algorithm (UTS #10 UCA). The only difference between the two collations is about weights:

  • unicode collation observes L1 and L2 and L3 weights (strength = ‘tertiary’),
  • unicode_ci collation observes only L1 weights (strength = ‘primary’), so for example ‘a’ = ‘A’ = ‘á’ = ‘Á’.

As an example, let’s take some Russian words and show the difference in ordering and selecting by index:

  • with unicode collation:

    tarantool> box.space.T:create_index('I', {parts = {{field = 1, type = 'str', collation='unicode'}}})
    tarantool> box.space.T.index.I:select()
    - - ['ЕЛЕ']
      - ['елейный']
      - ['ёлка']
      - ['еловый']
      - ['елозить']
      - ['Ёлочка']
      - ['ёлочный']
      - ['ель']
      - ['ЕЛь']
    tarantool> box.space.T.index.I:select{'ЁлКа'}
    - []
  • with unicode_ci collation:

    tarantool> box.space.S:create_index('I', {parts = {{field = 1, type = 'str', collation='unicode_ci'}}})
    tarantool> box.space.S.index.I:select()
    - - ['ЕЛЕ']
      - ['елейный']
      - ['ёлка']
      - ['еловый']
      - ['елозить']
      - ['Ёлочка']
      - ['ёлочный']
      - ['ЕЛь']
    tarantool> box.space.S.index.I:select{'ЁлКа'}
    - - ['ёлка']

In all, collation involves much more than these simple examples of upper case / lower case and accented / unaccented equivalence in alphabets. We also consider variations of the same character, non-alphabetic writing systems, and special rules that apply for combinations of characters.

For English: use “unicode” and “unicode_ci”. For Russian: use “unicode” and “unicode_ci” (although a few Russians might prefer the Kyrgyz collation, which says the Cyrillic letters ‘Е’ and ‘Ё’ are the same with level-1 weights). For Dutch, German (dictionary), French, Indonesian, Irish, Italian, Lingala, Malay, Portuguese, Southern Sotho, Xhosa, or Zulu: “unicode” and “unicode_ci” will do.

The tailored optional collations: For other languages, Tarantool supplies tailored collations for every modern language that has more than a million native speakers, and for specialized situations such as the difference between dictionary order and telephone book order. To see a complete list say box.space._collation:select(). The tailored collation names have the form unicode_[language code]_[strength] where language code is a standard 2-character or 3-character language abbreviation, and strength is s1 for “primary strength” (level-1 weights), s2 for “secondary”, s3 for “tertiary”. Tarantool uses the same language codes as the ones in the “list of tailorable locales” on man pages of Ubuntu and Fedora. Charts explaining the precise differences from DUCET order are in the Common Language Data Repository.


A sequence is a generator of ordered integer values.

As with spaces and indexes, you should specify the sequence name, and let Tarantool come up with a unique numeric identifier (“sequence id”).

As well, you can specify several options when creating a new sequence. The options determine what value will be generated whenever the sequence is used.

Options for box.schema.sequence.create()
Option name | Type and meaning | Default | Example
start | Integer. The value to generate the first time a sequence is used | 1 | start=0
min | Integer. Values smaller than this cannot be generated | 1 | min=-1000
max | Integer. Values larger than this cannot be generated | 9223372036854775807 | max=0
cycle | Boolean. Whether to start again when values cannot be generated | false | cycle=true
cache | Integer. The number of values to store in a cache | 0 | cache=0
step | Integer. What to add to the previously generated value when generating a new value | 1 | step=-1
if_not_exists | Boolean. If true and a sequence with this name exists already, ignore other options and use the existing values | false | if_not_exists=true

Once a sequence exists, it can be altered, dropped, reset, forced to generate the next value, or associated with an index.

For an initial example, we generate a sequence named ‘S’.

tarantool> box.schema.sequence.create('S',{min=5, start=5})
- step: 1
  id: 5
  min: 5
  cache: 0
  uid: 1
  max: 9223372036854775807
  cycle: false
  name: S
  start: 5

The result shows that the new sequence has all default values, except for the two that were specified, min and start.

Then we get the next value, with the next() function.

tarantool> box.sequence.S:next()
- 5

The result is the same as the start value. If we called next() again, we would get 6 (because the previous value plus the step value is 6), and so on.

Then we create a new table, and say that its primary key may be generated from the sequence.

tarantool> s=box.schema.space.create('T')
tarantool> s:create_index('I',{sequence='S'})
- parts:
  - type: unsigned
    is_nullable: false
    fieldno: 1
  sequence_id: 1
  id: 0
  space_id: 520
  unique: true
  type: TREE
  sequence_fieldno: 1
  name: I

Then we insert a tuple, without specifying a value for the primary key.

tarantool> box.space.T:insert{nil,'other stuff'}
- [6, 'other stuff']

The result is a new tuple where the first field has a value of 6. This arrangement, where the system automatically generates the values for a primary key, is sometimes called “auto-incrementing” or “identity”.

For syntax and implementation details, see the reference for box.schema.sequence.


In Tarantool, updates to the database are recorded in so-called write-ahead log (WAL) files. This ensures data persistence. When a power outage occurs or the Tarantool instance is killed accidentally, the in-memory database is lost. In this situation, WAL files are used to restore the data. Namely, Tarantool reads the WAL files and redoes the requests (this is called the “recovery process”). You can change the timing of the WAL writer, or turn it off, by setting wal_mode.

Tarantool also maintains a set of snapshot files. These files contain an on-disk copy of the entire data set for a given moment. Instead of reading every WAL file since the databases were created, the recovery process can load the latest snapshot file and then read only those WAL files that were produced after the snapshot file was made. After checkpointing, old WAL files can be removed to free up space.

To force immediate creation of a snapshot file, you can use Tarantool’s box.snapshot() request. To enable automatic creation of snapshot files, you can use Tarantool’s checkpoint daemon. The checkpoint daemon sets intervals for forced checkpoints. It makes sure that the states of both memtx and vinyl storage engines are synchronized and saved to disk, and automatically removes old WAL files.
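For example, a snapshot can be taken on demand, and the checkpoint daemon can be configured with box.cfg (the interval and count values below are only illustrative):

tarantool> box.snapshot()
tarantool> box.cfg{checkpoint_interval = 3600, checkpoint_count = 2}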

Snapshot files can be created even if there is no WAL file.


The memtx engine makes only regular checkpoints with the interval set in checkpoint daemon configuration.

The vinyl engine runs checkpointing in the background at all times.

See the Internals section for more details about the WAL writer and the recovery process.


Data operations

The basic data operations supported in Tarantool are:

  • five data-manipulation operations (INSERT, UPDATE, UPSERT, DELETE, REPLACE), and
  • one data-retrieval operation (SELECT).

All of them are implemented as functions in the box.space submodule.


  • INSERT: Add a new tuple to space ‘tester’.

    The first field, field[1], will be 999 (MsgPack type is integer).

    The second field, field[2], will be ‘Taranto’ (MsgPack type is string).

    tarantool> box.space.tester:insert{999, 'Taranto'}
  • UPDATE: Update the tuple, changing field field[2].

    The clause “{999}”, which has the value to look up in the index of the tuple’s primary-key field, is mandatory, because update() requests must always have a clause that specifies a unique key, which in this case is field[1].

    The clause “{{‘=’, 2, ‘Tarantino’}}” specifies that assignment will happen to field[2] with the new value.

    tarantool> box.space.tester:update({999}, {{'=', 2, 'Tarantino'}})
  • UPSERT: Upsert the tuple, changing field field[2] again.

    The syntax of upsert() is similar to the syntax of update(). However, the execution logic of these two requests is different. UPSERT is either UPDATE or INSERT, depending on the database’s state. Also, UPSERT execution is postponed until after transaction commit, so, unlike update(), upsert() doesn’t return data back.

    tarantool> box.space.tester:upsert({999, 'Taranted'}, {{'=', 2, 'Tarantism'}})
  • REPLACE: Replace the tuple, adding a new field.

    This is also possible with the update() request, but the update() request is usually more complicated.

    tarantool> box.space.tester:replace{999, 'Tarantella', 'Tarantula'}
  • SELECT: Retrieve the tuple.

    The clause “{999}” is still mandatory, although it does not have to mention the primary key.

    tarantool> box.space.tester:select{999}
  • DELETE: Delete the tuple.

    In this example, we identify the primary-key field.

    tarantool> box.space.tester:delete{999}

Summarizing the examples:

  • Functions insert and replace accept a tuple (where a primary key comes as part of the tuple).
  • Function upsert accepts a tuple (where a primary key comes as part of the tuple), and also the update operations to execute.
  • Function delete accepts a full key of any unique index (primary or secondary).
  • Function update accepts a full key of any unique index (primary or secondary), and also the operations to execute.
  • Function select accepts any key: primary/secondary, unique/non-unique, full/partial.

See reference on box.space for more details on using data operations.
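To illustrate the last point: with a multi-part index, select() can use a partial key. A sketch, assuming a hypothetical space ‘tester2’ with a TREE index on fields 1 and 2:

tarantool> box.space.tester2:select{999}        -- partial key: all tuples whose field[1] is 999
tarantool> box.space.tester2:select{999, 'x'}   -- full key: at most one tuple if the index is unique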


Besides Lua, you can use Perl, PHP, Python or other programming language connectors. The client-server protocol is open and documented. See this annotated BNF.

Complexity factors

The references for the box.space and box.index submodules include notes about which complexity factors might affect the resource usage of each function.

  • Index size: The number of index keys is the same as the number of tuples in the data set. For a TREE index, if there are more keys, then the lookup time will be greater, although of course the effect is not linear. For a HASH index, if there are more keys, then there is more RAM used, but the number of low-level steps tends to remain constant.
  • Index type: Typically, a HASH index is faster than a TREE index if the number of tuples in the space is greater than one.
  • Number of indexes accessed: Ordinarily, only one index is accessed to retrieve one tuple. But to update the tuple, there must be N accesses if the space has N different indexes.

    Note re storage engine: Vinyl optimizes away such accesses if secondary-index fields are unchanged by the update. So this complexity factor applies only to memtx, since it always makes a full-tuple copy on every update.

  • Number of tuples accessed: A few requests, for example SELECT, can retrieve multiple tuples. This factor is usually less important than the others.
  • WAL settings: The important setting for the write-ahead log is wal_mode. If the setting causes no writing or delayed writing, this factor is unimportant. If the setting causes every data-change request to wait for writing to finish on a slow device, this factor is more important than all the others.
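For example, wal_mode is normally chosen when the instance starts. The ‘none’ value below disables WAL writing entirely, trading durability for speed; the other possible values are ‘write’ (the default) and ‘fsync’:

box.cfg{wal_mode = 'none'}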


Transactions in Tarantool occur in fibers on a single thread, which is why Tarantool can guarantee execution atomicity. That point deserves emphasis.

Threads, fibers and yields

How does Tarantool process a basic operation? As an example, let’s take this query:

tarantool> box.space.tester:update({3}, {{'=', 2, 'size'}, {'=', 3, 0}})

This is equivalent to the following SQL statement for a table that stores primary keys in field[1]:

UPDATE tester SET "field[2]" = 'size', "field[3]" = 0 WHERE "field[1]" = 3

Assuming this query is received by Tarantool via network, it will be processed with three operating system threads:

  1. The network thread on the server side receives the query, parses the statement, checks if it’s correct, and then transforms it into a special structure: a message containing an executable statement and its options.

  2. The network thread ships this message to the instance’s transaction processor thread using a lock-free message bus. Lua programs execute directly in the transaction processor thread, and do not require parsing and preparation.

    The instance’s transaction processor thread uses the primary-key index on field[1] to find the location of the tuple. It determines that the tuple can be updated (not much can go wrong when you’re merely changing an unindexed field value).

  3. The transaction processor thread sends a message to the write-ahead logging (WAL) thread to commit the transaction. When done, the WAL thread replies with a COMMIT or ROLLBACK result to the transaction processor which gives it back to the network thread, and the network thread returns the result to the client.

Notice that there is only one transaction processor thread in Tarantool. Some people are used to the idea that there can be multiple threads operating on the database, with (say) thread #1 reading row #x, while thread #2 writes row #y. With Tarantool, no such thing ever happens. Only the transaction processor thread can access the database, and there is only one transaction processor thread for each Tarantool instance.

Like any other Tarantool thread, the transaction processor thread can handle many fibers. A fiber is a set of computer instructions that may contain “yield” signals. The transaction processor thread will execute all computer instructions until a yield, then switch to execute the instructions of a different fiber. Thus (say) the thread reads row #x for the sake of fiber #1, then writes row #y for the sake of fiber #2.

Yields must happen, otherwise the transaction processor thread would stick permanently on the same fiber. There are two types of yields:

  • implicit yields: every data-change operation or network access causes an implicit yield, and every statement that goes through the Tarantool client causes an implicit yield.
  • explicit yields: in a Lua function, you can (and should) add “yield” statements to prevent hogging. This is called cooperative multitasking.
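For example, a CPU-heavy Lua function can cooperate by yielding every so often. A minimal sketch (the loop body and the yield interval of 1000 iterations are arbitrary):

fiber = require('fiber')
function long_job()
    for i = 1, 1e8 do
        -- ... one unit of work ...
        if i % 1000 == 0 then
            fiber.yield()  -- let other fibers run
        end
    end
end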

Cooperative multitasking

Cooperative multitasking means: unless a running fiber deliberately yields control, it is not preempted by some other fiber. But a running fiber will deliberately yield when it encounters a “yield point”: a transaction commit, an operating system call, or an explicit “yield” request. Any system call which can block will be performed asynchronously, and any running fiber which must wait for a system call will be preempted, so that another ready-to-run fiber takes its place and becomes the new running fiber.

This model makes all programmatic locks unnecessary: cooperative multitasking ensures that there will be no concurrency around a resource, no race conditions, and no memory consistency issues. The way to achieve this is quite simple: in critical sections, don’t use yields, explicit or implicit, and then nothing can interfere with the code’s execution.

When requests are small, for example simple UPDATE or INSERT or DELETE or SELECT, fiber scheduling is fair: it takes only a little time to process the request, schedule a disk write, and yield to a fiber serving the next client.

However, a function might perform complex computations or might be written in such a way that yields do not occur for a long time. This can lead to unfair scheduling, when a single client throttles the rest of the system, or to apparent stalls in request processing. Avoiding this situation is the responsibility of the function’s author.


In the absence of transactions, any function that contains yield points may see changes in the database state caused by fibers that preempt. Multi-statement transactions exist to provide isolation: each transaction sees a consistent database state and commits all its changes atomically. At commit time, a yield happens and all transaction changes are written to the write ahead log in a single batch. Or, if needed, transaction changes can be rolled back – completely or to a specific savepoint.

In Tarantool, the transaction isolation level is serializable, with the caveat “if there is no failure while writing to the WAL”. In case of such a failure, which can happen, for example, when the disk runs out of space, the transaction isolation level becomes read uncommitted.

In vinyl, to implement isolation Tarantool uses a simple optimistic scheduler: the first transaction to commit wins. If a concurrent active transaction has read a value modified by a committed transaction, it is aborted.

The cooperative scheduler ensures that, in absence of yields, a multi-statement transaction is not preempted and hence is never aborted. Therefore, understanding yields is essential to writing abort-free code.

Sometimes while testing the transaction mechanism in Tarantool you may notice that yielding after box.begin() but before any read/write operation does not cause an abort, as it should according to the description. This happens because box.begin() does not actually start a transaction; it is a mark telling Tarantool to start a transaction after some database request that follows.

In memtx, if an instruction that implies a yield, explicit or implicit, is executed during a transaction, the transaction is fully rolled back. In vinyl, a more complex transactional manager is used that allows yields.
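The memtx rollback-on-yield behavior can be demonstrated inside a single function. A sketch, assuming the default configuration (transactional manager off) and the memtx space ‘tester’ from the earlier examples:

function yield_demo()
    box.begin()
    box.space.tester:insert{1001, 'first'}
    require('fiber').yield()                 -- a yield inside a memtx transaction rolls it back
    box.space.tester:insert{1002, 'second'}  -- this raises an error: the transaction is gone
end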


You can’t mix storage engines in a transaction today.

Implicit yields

The only explicit yield requests in Tarantool are fiber.sleep() and fiber.yield(), but many other requests “imply” yields because Tarantool is designed to avoid blocking.

Database requests imply yields if and only if there is disk I/O. For memtx, since all data is in memory, there is no disk I/O during a read request. For vinyl, since some data may not be in memory, there may be disk I/O for a read (to fetch data from disk) or for a write (because a stall may occur while waiting for memory to be free). For both memtx and vinyl, since data-change requests must be recorded in the WAL, there is normally a commit. A commit happens automatically after every request in default “autocommit” mode, or a commit happens at the end of a transaction in “transaction” mode, when a user deliberately commits by calling box.commit(). Therefore for both memtx and vinyl, because there can be disk I/O, some database operations may imply yields.

Many functions in modules fio, net_box, console and socket (the “os” and “network” requests) yield.

That is why executing separate commands such as select(), insert(), update() in the console inside a transaction will cause an abort: an implicit yield happens after each chunk of code is executed in the console.

Example #1

  • Engine = memtx
    The sequence select() insert() has one yield, at the end of insertion, caused by implicit commit; select() has nothing to write to the WAL and so does not yield.
  • Engine = vinyl
    The sequence select() insert() has one to three yields, since select() may yield if the data is not in cache, insert() may yield waiting for available memory, and there is an implicit yield at commit.
  • The sequence begin() insert() insert() commit() yields only at commit if the engine is memtx, and can yield up to 3 times if the engine is vinyl.

Example #2

Assume that in the memtx space ‘tester’ there are tuples in which the third field represents a positive dollar amount. Let’s start a transaction, withdraw from tuple#1, deposit in tuple#2, and end the transaction, making its effects permanent.

tarantool> function txn_example(from, to, amount_of_money)
         >   box.begin()
         >   box.space.tester:update(from, {{'-', 3, amount_of_money}})
         >   box.space.tester:update(to,   {{'+', 3, amount_of_money}})
         >   box.commit()
         >   return "ok"
         > end
tarantool> txn_example({999}, {1000}, 1.00)
- "ok"

If wal_mode = ‘none’, then implicit yielding at commit time does not take place, because there are no writes to the WAL.

If a task is interactive, sending requests to the server and receiving responses, then it involves network I/O, and therefore there is an implicit yield, even if the request that is sent to the server is not itself an implicit yield request. Therefore, a sequence of such requests yields each time a request is sent to the network and the result is awaited. On the server side, the same requests are executed in the usual order, possibly mixed with other requests from the network and local fibers. Something similar happens when using clients that operate via telnet, via one of the connectors, via the MySQL and PostgreSQL rocks, or via the interactive mode when using Tarantool as a client.

After a fiber has yielded and then has regained control, it immediately issues testcancel.

Transactional manager

Since version 2.6.1, Tarantool has another option for transaction behavior that allows yielding inside a memtx transaction. This is controlled by the transactional manager.

The transactional manager is designed for isolation of concurrent transactions and provides serializable transaction isolation level. It consists of two parts:

  • MVCC engine
  • conflict manager.

The MVCC engine provides personal read views for transactions if necessary. The conflict manager tracks transactions’ changes and determines their correctness in serialization order. Of course, once yielded, a transaction could interfere with other transactions and could be aborted due to conflict.

Another important point is that the transactional manager provides a non-classical snapshot isolation level. A transaction can get a consistent snapshot of the database (which is common), but this snapshot is not necessarily bound to the moment the transaction begins (which is not common). The conflict manager decides whether and when each transaction gets which snapshot. This allows avoiding some conflicts compared with the classical snapshot isolation approach.

The transactional manager can be switched on and off by the box.cfg option memtx_use_mvcc_engine.
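For example, at instance startup:

box.cfg{memtx_use_mvcc_engine = true}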

Access control

Understanding security details is primarily an issue for administrators. However, ordinary users should at least skim this section to get an idea of how Tarantool makes it possible for administrators to prevent unauthorized access to the database and to certain functions.


  • There is a method to guarantee with password checks that users really are who they say they are (“authentication”).
  • There is a _user system space, where usernames and password-hashes are stored.
  • There are functions for saying that certain users are allowed to do certain things (“privileges”).
  • There is a _priv system space, where privileges are stored. Whenever a user tries to do an operation, there is a check whether the user has the privilege to do the operation (“access control”).

Details follow.


There is a current user for any program working with Tarantool, local or remote. If a remote connection is using a binary port, the current user, by default, is ‘guest’. If the connection is using an admin-console port, the current user is ‘admin’. When executing a Lua initialization script, the current user is also ‘admin’.

The current user name can be found with box.session.user().

The current user can be changed:

  • For a binary port connection – with the AUTH protocol command, supported by most clients;
  • For an admin-console connection and in a Lua initialization script – with box.session.su();
  • For a binary-port connection invoking a stored function with the CALL command – if the SETUID property is enabled for the function, Tarantool temporarily replaces the current user with the function’s creator, with all the creator’s privileges, during function execution.
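For example, in the console, where the current user starts as ‘admin’:

tarantool> box.session.user()
- admin
tarantool> box.session.su('guest')
tarantool> box.session.user()
- guest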


Each user (except ‘guest’) may have a password. The password is any alphanumeric string.

Tarantool passwords are stored in the _user system space with a cryptographic hash function so that, if the password is ‘x’, the stored hash-password is a long string like ‘lL3OvhkIPOKh+Vn9Avlkx69M/Ck=‘. When a client connects to a Tarantool instance, the instance sends a random salt value which the client must mix with the hashed-password before sending to the instance. Thus the original value ‘x’ is never stored anywhere except in the user’s head, and the hashed value is never passed down a network wire except when mixed with a random salt.


For more details of the password hashing algorithm (e.g. for the purpose of writing a new client application), read the scramble.h header file.

This system prevents malicious onlookers from finding passwords by snooping in the log files or snooping on the wire. It is the same system that MySQL introduced several years ago, which has proved adequate for medium-security installations. Nevertheless, administrators should warn users that no system is foolproof against determined long-term attacks, so passwords should be guarded and changed occasionally. Administrators should also advise users to choose long unobvious passwords, but it is ultimately up to the users to choose or change their own passwords.

There are two functions for managing passwords in Tarantool: box.schema.user.passwd() for changing a user’s password and box.schema.user.password() for getting a hash of a user’s password.
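For example, the hash shown above can be reproduced in the console:

tarantool> box.schema.user.password('x')
- lL3OvhkIPOKh+Vn9Avlkx69M/Ck=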

Owners and privileges

Tarantool has one database. It may be called “box.schema” or “universe”. The database contains database objects, including spaces, indexes, users, roles, sequences, and functions.

The owner of a database object is the user who created it. The owner of the database itself, and the owner of objects that are created initially (the system spaces and the default users) is ‘admin’.

Owners automatically have privileges for what they create. They can share these privileges with other users or with roles, using box.schema.user.grant() requests. The following privileges can be granted:

  • ‘read’, e.g. allow select from a space
  • ‘write’, e.g. allow update on a space
  • ‘execute’, e.g. allow call of a function, or (less commonly) allow use of a role
  • ‘create’, e.g. allow box.schema.space.create (access to certain system spaces is also necessary)
  • ‘alter’, e.g. allow box.space.x.index.y:alter (access to certain system spaces is also necessary)
  • ‘drop’, e.g. allow box.sequence.x:drop (access to certain system spaces is also necessary)
  • ‘usage’, e.g. whether any action is allowable regardless of other privileges (sometimes revoking ‘usage’ is a convenient way to block a user temporarily without dropping the user)
  • ‘session’, e.g. whether the user can ‘connect’.

To create objects, users need the ‘create’ privilege and at least ‘read’ and ‘write’ privileges on the system space with a similar name (for example, on the _space if the user needs to create spaces).

To access objects, users need an appropriate privilege on the object (for example, the ‘execute’ privilege on function F if the users need to execute function F). See below some examples for granting specific privileges that a grantor – that is, ‘admin’ or the object creator – can make.

To drop an object, a user must be an ‘admin’ or have the ‘super’ role. Some objects may also be dropped by their creators. As the owner of the entire database, any ‘admin’ can drop any object, including other users.

To grant privileges to a user, the object owner says box.schema.user.grant(). To revoke privileges from a user, the object owner says box.schema.user.revoke(). In either case, there are up to five parameters:

(user-name, privilege, object-type [, object-name [, options]])
  • user-name is the user (or role) that will receive or lose the privilege;

  • privilege is any of ‘read’, ‘write’, ‘execute’, ‘create’, ‘alter’, ‘drop’, ‘usage’, or ‘session’ (or a comma-separated list);

  • object-type is any of ‘space’, ‘index’, ‘sequence’, ‘function’, ‘user’, ‘role’, or ‘universe’;

  • object-name is what the privilege is for (omitted if object-type is ‘universe’) (may be omitted or nil if the intent is to grant for all objects of the same type);

  • options is a list inside braces, for example {if_not_exists=true|false} (usually omitted because the default is acceptable).

    All updates of user privileges are reflected immediately in the existing sessions and objects, e.g. functions.

Example for granting many privileges at once

In this example an ‘admin’ user grants many privileges on many objects to user ‘U’, using a single request.

box.schema.user.grant('U', 'read,write,execute', 'universe')
Examples for granting privileges for specific operations

In these examples an administrator grants strictly the minimal privileges necessary for particular operations, to user ‘U’.

-- So that 'U' can create spaces:
  box.schema.user.grant('U','write', 'space', '_schema')
  box.schema.user.grant('U','write', 'space', '_space')
-- So that 'U' can create indexes on space T
  box.schema.user.grant('U','write', 'space', '_index')
-- So that 'U' can alter indexes on space T (assuming 'U' did not create the index)
  box.schema.user.grant('U', 'alter', 'space', 'T')
  box.schema.user.grant('U', 'write', 'space', '_index')
-- So that 'U' can alter indexes on space T (assuming 'U' created the index)
  box.schema.user.grant('U', 'write', 'space', '_index')
-- So that 'U' can create users:
  box.schema.user.grant('U', 'read,write', 'space', '_user')
  box.schema.user.grant('U', 'write', 'space', '_priv')
-- So that 'U' can create roles:
  box.schema.user.grant('U', 'read,write', 'space', '_user')
  box.schema.user.grant('U', 'write', 'space', '_priv')
-- So that 'U' can create sequence generators:
  box.schema.user.grant('U', 'read,write', 'space', '_sequence')
-- So that 'U' can create functions:
  box.schema.user.grant('U', 'read,write', 'space', '_func')
-- So that 'U' can create any object of any type
  box.schema.user.grant('U', 'read,write,create', 'universe')
-- So that 'U' can grant access on objects that 'U' created
  box.schema.user.grant('U', 'read,write', 'space', '_priv')
-- So that 'U' can select or get from a space named 'T'
  box.schema.user.grant('U', 'read', 'space', 'T')
-- So that 'U' can update or insert or delete or truncate a space named 'T'
  box.schema.user.grant('U', 'write', 'space', 'T')
-- So that 'U' can execute a function named 'F'
  box.schema.user.grant('U', 'execute', 'function', 'F')
-- So that 'U' can use the "S:next()" function with a sequence named S
  box.schema.user.grant('U', 'read,write', 'sequence', 'S')
-- So that 'U' can use the "S:set()" or "S:reset()" function with a sequence named S
  box.schema.user.grant('U', 'write', 'sequence', 'S')
-- So that 'U' can drop a sequence (assuming 'U' did not create it)
  box.schema.user.grant('U', 'drop', 'sequence', 'S')
  box.schema.user.grant('U', 'write', 'space', '_sequence')
-- So that 'U' can drop a function (assuming 'U' did not create it)
  box.schema.user.grant('U', 'drop', 'function', 'F')
  box.schema.user.grant('U', 'write', 'space', '_func')
-- So that 'U' can drop a space that has some associated objects
  box.schema.user.grant('U', 'drop', 'space', 'T')
  box.schema.user.grant('U', 'write', 'space', '_schema')
  box.schema.user.grant('U', 'write', 'space', '_space')
  box.schema.user.grant('U', 'read,write', 'space', '_index')
-- So that 'U' can drop any space (ignore if the privilege exists already)
  box.schema.user.grant('U', 'drop', 'space', nil, {if_not_exists=true})

Example for creating users and objects then granting privileges

Here a Lua function is created that will be executed under the user ID of its creator, even if called by another user.

First, the two spaces (‘u’ and ‘i’) are created, and a no-password user (‘internal’) is granted full access to them. Then a function (‘read_and_modify’) is defined and the no-password user becomes this function’s creator. Finally, another user (‘public_user’) is granted access to execute Lua functions created by the no-password user.



box.schema.user.grant('internal', 'read,write', 'space', 'u')
box.schema.user.grant('internal', 'read,write', 'space', 'i')
box.schema.user.grant('internal', 'create', 'universe')
box.schema.user.grant('internal', 'read,write', 'space', '_func')

function read_and_modify(key)
  local u = box.space.u
  local i = box.space.i
  local fiber = require('fiber')
  local t = u:get{key}
  if t ~= nil then
    u:put{key, box.session.uid()}
    i:put{key, fiber.time()}
  end
end

box.schema.func.create('read_and_modify', {setuid= true})
box.schema.user.create('public_user', {password = 'secret'})
box.schema.user.grant('public_user', 'execute', 'function', 'read_and_modify')


A role is a container for privileges which can be granted to regular users. Instead of granting or revoking individual privileges, you can put all the privileges in a role and then grant or revoke the role.

Role information is stored in the _user space, but the third field in the tuple – the type field – is ‘role’ rather than ‘user’.

An important feature in role management is that roles can be nested. For example, role R1 can be granted the privileges of role R2, so users with role R1 will subsequently get all privileges from both roles R1 and R2. In other words, a user gets all the privileges granted to the user’s roles, directly or indirectly.

There are actually two ways to grant or revoke a role: box.schema.user.grant-or-revoke(user-name-or-role-name,'execute', 'role',role-name...) or box.schema.user.grant-or-revoke(user-name-or-role-name,role-name...). The second way is preferable.

The ‘usage’ and ‘session’ privileges cannot be granted to roles.


-- This example will work for a user with many privileges, such as 'admin'
-- or a user with the pre-defined 'super' role
-- Create space T with a primary index
box.schema.space.create('T')
box.space.T:create_index('primary', {})
-- Create the user U1 so that later the current user can be changed to U1
box.schema.user.create('U1')
-- Create two roles, R1 and R2
box.schema.role.create('R1')
box.schema.role.create('R2')
-- Grant role R2 to role R1 and role R1 to user U1 (order doesn't matter)
-- There are two ways to grant a role; here the shorter way is used
box.schema.role.grant('R1', 'R2')
box.schema.user.grant('U1', 'R1')
-- Grant read/write privileges for space T to role R2
-- (but not to role R1, and not to user U1)
box.schema.role.grant('R2', 'read,write', 'space', 'T')
-- Change the current user to user U1
box.session.su('U1')
-- An insertion to space T will now succeed because (due to nested roles)
-- user U1 has write privilege on space T
box.space.T:insert{1}

More details are to be found in box.schema.user.grant() and box.schema.role.grant() in the built-in modules reference.

Sessions and security

A session is the state of a connection to Tarantool. It contains:

  • An integer ID identifying the connection,
  • the current user associated with the connection,
  • text description of the connected peer, and
  • session local state, such as Lua variables and functions.

In Tarantool, a single session can execute multiple concurrent transactions. Each transaction is identified by a unique integer ID, which can be queried at start of the transaction using box.session.sync().


To track all connects and disconnects, you can use connection and authentication triggers.


Triggers, also known as callbacks, are functions which the server executes when certain events happen.

To associate an event with a callback, one should pass the callback to the corresponding on_event function, for example:

box.session.on_connect(f)

Then the server will store the callback function and call it when the corresponding event happens.

All triggers have the following characteristics:

  • Triggers are defined only by the ‘admin’ user.
  • Triggers are stored in the Tarantool instance’s memory, not in the database. Therefore triggers disappear when the instance is shut down. To make them permanent, put function definitions and trigger settings into Tarantool’s initialization script.
  • Triggers have low overhead. If a trigger is not defined, then the overhead is minimal: merely a pointer dereference and check. If a trigger is defined, then its overhead is equivalent to the overhead of calling a function.
  • There can be multiple triggers for one event. In this case, triggers are executed in the reverse order that they were defined in. (Exception: member triggers are executed in the order that they appear in the member list.)
  • Triggers must work within the event context. However, effects are undefined if a function contains requests which normally could not occur immediately after the event, but only before the return from the event. For example, putting os.exit() or box.rollback() in a trigger function would bring in requests outside the event context.
  • Triggers are replaceable. The request to “redefine a trigger” implies passing a new trigger function and an old trigger function to one of the on_event functions.
  • The on_event functions all have parameters which are function pointers, and they all return function pointers. Remember that a Lua function definition such as function f() x = x + 1 end is the same as f = function () x = x + 1 end - in both cases f gets a function pointer. And trigger = box.session.on_connect(f) is the same as trigger = box.session.on_connect(function () x = x + 1 end) - in both cases trigger gets the function pointer which was passed.
  • You can call any on_event function with no arguments to get a list of its triggers. For example, use box.session.on_connect() to return a table of all connect-trigger functions.
  • Triggers can be useful in solving problems with replication. See details in Resolving replication conflicts.


Here we log connect and disconnect events into Tarantool server log.

log = require('log')

function on_connect_impl()
  log.info("connected "..box.session.peer()..", sid "..box.session.id())
end

function on_disconnect_impl()
  log.info("disconnected, sid "..box.session.id())
end

function on_auth_impl(user)
  log.info("authenticated sid "..box.session.id().." as "..user)
end

function on_connect() pcall(on_connect_impl) end
function on_disconnect() pcall(on_disconnect_impl) end
function on_auth(user) pcall(on_auth_impl, user) end

box.session.on_connect(on_connect)
box.session.on_disconnect(on_disconnect)
box.session.on_auth(on_auth)



Number of parts in an index

For TREE or HASH indexes, the maximum is 255 (box.schema.INDEX_PART_MAX). For RTREE indexes, the maximum is 1 but the field is an ARRAY of up to 20 dimensions. For BITSET indexes, the maximum is 1.

Number of indexes in a space

128 (box.schema.INDEX_MAX).

Number of fields in a tuple

The theoretical maximum is 2,147,483,647 (box.schema.FIELD_MAX). The practical maximum is whatever is specified by the space’s field_count member, or the maximal tuple length.

Number of bytes in a tuple

The maximal number of bytes in a tuple is roughly equal to memtx_max_tuple_size or vinyl_max_tuple_size (with a metadata overhead of about 20 bytes per tuple, which is added on top of useful bytes). By default, the value of either memtx_max_tuple_size or vinyl_max_tuple_size is 1,048,576. To increase it, specify a larger value when starting the Tarantool instance. For example, box.cfg{memtx_max_tuple_size=2*1048576}.

Number of bytes in an index key

If a field in a tuple can contain a million bytes, then the index key can contain a million bytes, so the maximum is determined by factors such as Number of bytes in a tuple, not by the index support.

Number of spaces

The theoretical maximum is 2,147,483,647 (box.schema.SPACE_MAX) but the practical maximum is around 65,000.

Number of connections

The practical limit is the number of file descriptors that one can set with the operating system.

Space size

The total maximum size for all spaces is in effect set by memtx_memory, which in turn is limited by the total available memory.

Update operations count

The maximum number of operations per tuple that can be in a single update is 4000 (BOX_UPDATE_OP_CNT_MAX).

Number of users and roles


Length of an index name or space name or user name

65000 (box.schema.NAME_MAX).

Number of replicas in a replica set

32 (vclock.VCLOCK_MAX).

Storage engines

A storage engine is a set of very-low-level routines which actually store and retrieve tuple values. Tarantool offers a choice of two storage engines:

  • memtx (the in-memory storage engine) is the default and was the first to arrive.

  • vinyl (the on-disk storage engine) is a working key-value engine and will especially appeal to users who like to see data go directly to disk, so that recovery time might be shorter and database size might be larger.

    On the other hand, vinyl lacks some functions and options that are available with memtx. Where that is the case, the relevant description in this manual contains a note beginning with the words “Note re storage engine”.

Further in this section we discuss the details of storing data using the vinyl storage engine.

To specify that the engine should be vinyl, add the clause engine = 'vinyl' when creating a space, for example:

space = box.schema.space.create('name', {engine='vinyl'})

Differences between memtx and vinyl storage engines

The primary difference between memtx and vinyl is that memtx is an “in-memory” engine while vinyl is an “on-disk” engine. An in-memory storage engine is generally faster (each query is usually run under 1 ms), and the memtx engine is justifiably the default for Tarantool, but on-disk engine such as vinyl is preferable when the database is larger than the available memory and adding more memory is not a realistic option.

Differences between the memtx and vinyl engines, option by option:

  • Supported index types: memtx supports TREE, HASH, RTREE, and BITSET; vinyl supports TREE only.
  • Temporary spaces: supported by memtx; not supported by vinyl.
  • random() function: supported by memtx; not supported by vinyl.
  • alter() function: supported by memtx; supported by vinyl starting from the 1.10.2 release (the primary index cannot be modified).
  • len() function: in memtx, returns the number of tuples in the space; in vinyl, returns the maximum approximate number of tuples in the space.
  • count() function: in memtx, takes a constant amount of time; in vinyl, takes a variable amount of time depending on the state of the DB.
  • delete() function: in memtx, returns the deleted tuple, if any; in vinyl, always returns nil.
  • yield: memtx does not yield on select requests unless the transaction is committed to WAL; vinyl yields on select requests and on their equivalents: get() and pairs().

Storing data with vinyl

Tarantool is a transactional and persistent DBMS that maintains 100% of its data in RAM. The greatest advantages of in-memory databases are their speed and ease of use: they demonstrate consistently high performance and rarely need tuning.

A few years ago we decided to extend the product by implementing a classical storage engine similar to those used by regular DBMSs: it uses RAM for caching, while the bulk of its data is stored on disk. We decided to make it possible to set a storage engine independently for each table in the database, which is the same way that MySQL approaches it, but we also wanted to support transactions from the very beginning.

The first question we needed to answer was whether to create our own storage engine or use an existing library. The open-source community offered a few viable solutions. The RocksDB library was the fastest growing open-source library and is currently one of the most prominent out there. There were also several lesser-known libraries to consider, such as WiredTiger, ForestDB, NestDB, and LMDB.

Nevertheless, after studying the source code of existing libraries and considering the pros and cons, we opted for our own storage engine. One reason is that the existing third-party libraries expected requests to come from multiple operating system threads and thus contained complex synchronization primitives for controlling parallel data access. If we had decided to embed one of these in Tarantool, we would have made our users bear the overhead of a multithreaded application without getting anything in return. The thing is, Tarantool has an actor-based architecture. The way it processes transactions in a dedicated thread allows it to do away with the unnecessary locks, interprocess communication, and other overhead that accounts for up to 80% of processor time in multithreaded DBMSs.


The Tarantool process consists of a fixed number of “actor” threads

If you design a database engine with cooperative multitasking in mind right from the start, it not only significantly speeds up the development process, but also allows the implementation of certain optimization tricks that would be too complex for multithreaded engines. In short, using a third-party solution wouldn’t have yielded the best result.


Once the idea of using an existing library was off the table, we needed to pick an architecture to build upon. There are two competing approaches to on-disk data storage: the older one relies on B-trees and their variations; the newer one advocates the use of log-structured merge-trees, or “LSM” trees. MySQL, PostgreSQL, and Oracle use B-trees, while Cassandra, MongoDB, and CockroachDB have adopted LSM trees.

B-trees are considered better suited for reads, and LSM trees for writes. However, with SSDs becoming more widespread, and given that SSD read throughput is several times greater than write throughput, the advantages of LSM trees in most scenarios became obvious to us.

Before dissecting LSM trees in Tarantool, let’s take a look at how they work. To do that, we’ll begin by analyzing a regular B-tree and the issues it faces. A B-tree is a balanced tree made up of blocks, which contain sorted lists of key- value pairs. (Topics such as filling and balancing a B-tree or splitting and merging blocks are outside of the scope of this article and can easily be found on Wikipedia). As a result, we get a container sorted by key, where the smallest element is stored in the leftmost node and the largest one in the rightmost node. Let’s have a look at how insertions and searches in a B-tree happen.


Classical B-tree

If you need to find an element or check its membership, the search starts at the root, as usual. If the key is found in the root block, the search stops; otherwise, the search visits the rightmost block holding the largest element that’s not larger than the key being searched (recall that elements at each level are sorted). If the first level yields no results, the search proceeds to the next level. Finally, the search ends up in one of the leaves and probably locates the needed key. Blocks are stored and read into RAM one by one, meaning the algorithm reads log_{B}(N) blocks in a single search, where N is the number of elements in the B-tree. In the simplest case, writes are done similarly: the algorithm finds the block that holds the necessary element and updates (inserts) its value.

To better understand the data structure, let’s consider a practical example: say we have a B-tree with 100,000,000 nodes, a block size of 4096 bytes, and an element size of 100 bytes. Thus each block will hold up to 40 elements (all overhead considered), and the B-tree will consist of around 2,570,000 blocks and 5 levels: the first four will have a size of 256 Mb, while the last one will grow up to 10 Gb. Obviously, any modern computer will be able to store all of the levels except the last one in filesystem cache, so read requests will require just a single I/O operation.
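The arithmetic above can be verified with a short sketch (the constants mirror the example; the variable names are illustrative, not part of any Tarantool API):

```python
import math

ELEMENT_SIZE = 100        # bytes per element
BLOCK_SIZE = 4096         # bytes per B-tree block
ELEMENTS = 100_000_000

per_block = BLOCK_SIZE // ELEMENT_SIZE              # ~40 elements per block, overhead ignored
leaf_blocks = math.ceil(ELEMENTS / per_block)       # ~2,500,000 leaf blocks
levels = math.ceil(math.log(ELEMENTS, per_block))   # log_40(100,000,000) ≈ 4.99 → 5 levels

print(per_block, leaf_blocks, levels)  # 40 2500000 5
```

With all but the last level in filesystem cache, a point read touches one uncached block, which is where the "single I/O operation" claim comes from.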

But if we change our perspective, B-trees don’t look so good anymore. Suppose we need to update a single element. Since working with B-trees involves reading and writing whole blocks, we would have to read in one whole block, change our 100 bytes out of 4096, and then write the whole updated block to disk. In other words, we were forced to write 40 times more data than we actually modified!

If you take into account the fact that an SSD block has a size of 64 Kb+ and not every modification changes a whole element, the extra disk workload can be greater still.

Authors of specialized literature and blogs dedicated to on-disk data storage have coined two terms for these phenomena: extra reads are referred to as “read amplification” and extra writes as “write amplification”.

The amplification factor (multiplication coefficient) is calculated as the ratio of the size of actual read (or written) data to the size of data needed (or actually changed). In our B-tree example, the amplification factor would be around 40 for both reads and writes.

The huge number of extra I/O operations associated with updating data is one of the main issues addressed by LSM trees. Let’s see how they work.

The key difference between LSM trees and regular B-trees is that LSM trees don’t just store data (keys and values), but also data operations: insertions and deletions.


LSM tree:

  • Stores statements, not values:
    • REPLACE
    • DELETE
    • UPSERT
  • Every statement is marked by an LSN
  • Append-only files; garbage is collected after a checkpoint
  • Transactional log of all filesystem changes: vylog

For example, an element corresponding to an insertion operation has, apart from a key and a value, an extra byte with an operation code (“REPLACE” in the image above). An element representing the deletion operation contains a key (since storing a value is unnecessary) and the corresponding operation code—”DELETE”. Also, each LSM tree element has a log sequence number (LSN), which is the value of a monotonically increasing sequence that uniquely identifies each operation. The whole tree is first ordered by key in ascending order, and then, within a single key scope, by LSN in descending order.
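This ordering rule (key ascending, then LSN descending) can be illustrated with a short sketch; the tuples here are a toy model, not vinyl’s internal representation:

```python
# Each LSM statement is modeled as (key, lsn, op).
# Tree order: key ascending, then LSN descending, so during a scan
# the newest statement for every key is encountered first.
def lsm_order(stmt):
    key, lsn, op = stmt
    return (key, -lsn)

stmts = [(10, 5, 'REPLACE'), (10, 7, 'DELETE'), (3, 2, 'REPLACE')]
ordered = sorted(stmts, key=lsm_order)
print(ordered)
# [(3, 2, 'REPLACE'), (10, 7, 'DELETE'), (10, 5, 'REPLACE')]
```

Note how for key 10 the DELETE (LSN 7) precedes the older REPLACE (LSN 5), so a lookup sees the deletion first and can stop immediately.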


A single level of an LSM tree

Filling an LSM tree

Unlike a B-tree, which is stored completely on disk and can be partly cached in RAM, when using an LSM tree, memory is explicitly separated from disk right from the start. The issue of volatile memory and data persistence is beyond the scope of the storage algorithm and can be solved in various ways—for example, by logging changes.

The part of an LSM tree that’s stored in RAM is called L0 (level zero). The size of RAM is limited, so L0 is allocated a fixed amount of memory. For example, in Tarantool, the L0 size is controlled by the vinyl_memory parameter. Initially, when an LSM tree is empty, operations are written to L0. Recall that all elements are ordered by key in ascending order, and then within a single key scope, by LSN in descending order, so when a new value associated with a given key gets inserted, it’s easy to locate the older value and delete it. L0 can be structured as any container capable of storing a sorted sequence of elements. For example, in Tarantool, L0 is implemented as a B+*-tree. Lookups and insertions are standard operations for the data structure underlying L0, so I won’t dwell on those.

Sooner or later the number of elements in an LSM tree exceeds the L0 size and that’s when L0 gets written to a file on disk (called a “run”) and then cleared for storing new elements. This operation is called a “dump”.


Dumps on disk form a sequence ordered by LSN: LSN ranges in different runs don’t overlap, and the leftmost runs (at the head of the sequence) hold newer operations. Think of these runs as a pyramid, with the newest ones closer to the top. As runs keep getting dumped, the pyramid grows higher. Note that newer runs may contain deletions or replacements for existing keys. To remove older data, it’s necessary to perform garbage collection (this process is sometimes called “merge” or “compaction”) by combining several older runs into a new one. If two versions of the same key are encountered during a compaction, only the newer one is retained; however, if a key insertion is followed by a deletion, then both operations can be discarded.


The key choices determining an LSM tree’s efficiency are which runs to compact and when to compact them. Suppose an LSM tree stores a monotonically increasing sequence of keys (1, 2, 3, …,) with no deletions. In this case, compacting runs would be useless: all of the elements are sorted, the tree doesn’t have any garbage, and the location of any key can unequivocally be determined. On the other hand, if an LSM tree contains many deletions, doing a compaction would free up some disk space. However, even if there are no deletions, but key ranges in different runs overlap a lot, compacting such runs could speed up lookups as there would be fewer runs to scan. In this case, it might make sense to compact runs after each dump. But keep in mind that a compaction causes all data stored on disk to be overwritten, so with few reads it’s recommended to perform it less often.

To ensure it’s optimally configurable for any of the scenarios above, an LSM tree organizes all runs into a pyramid: the newer the data operations, the higher up the pyramid they are located. During a compaction, the algorithm picks two or more neighboring runs of approximately equal size, if possible.


  • Multi-level compaction can span any number of levels
  • A level can contain multiple runs

All of the neighboring runs of approximately equal size constitute an LSM tree level on disk. The ratio of run sizes at different levels determines the pyramid’s proportions, which allows optimizing the tree for write-intensive or read-intensive scenarios.

Suppose the L0 size is 100 Mb, the ratio of run sizes at each level (the vinyl_run_size_ratio parameter) is 5, and there can be no more than 2 runs per level (the vinyl_run_count_per_level parameter). After the first 3 dumps, the disk will contain 3 runs of 100 Mb each—which constitute L1 (level one). Since 3 > 2, the runs will be compacted into a single 300 Mb run, with the older ones being deleted. After 2 more dumps, there will be another compaction, this time of 2 runs of 100 Mb each and the 300 Mb run, which will produce one 500 Mb run. It will be moved to L2 (recall that the run size ratio is 5), leaving L1 empty. The next 10 dumps will result in L2 having 3 runs of 500 Mb each, which will be compacted into a single 1500 Mb run. Over the course of 10 more dumps, the following will happen: 3 runs of 100 Mb each will be compacted twice, as will two 100 Mb runs and one 300 Mb run, which will yield 2 new 500 Mb runs in L2. Since L2 now has 3 runs, they will also be compacted: two 500 Mb runs and one 1500 Mb run will produce a 2500 Mb run that will be moved to L3, given its size.
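The cascade of dumps and compactions described above can be reproduced with a toy simulation (an illustrative model, not vinyl’s actual code); the run-placement rule assumed here is that a run lands on the smallest level whose capacity, l0 × ratio^level, it still fits under:

```python
def dump_and_compact(num_dumps, l0=100, ratio=5, runs_per_level=2):
    """Toy model of vinyl-style dumps and compactions (run sizes in Mb)."""
    levels = {}  # level number -> list of run sizes

    def target_level(size):
        # A run belongs to the smallest level whose capacity it fits under.
        k = 1
        while size >= l0 * ratio ** k:
            k += 1
        return k

    for _ in range(num_dumps):
        levels.setdefault(1, []).append(l0)  # dump L0 to disk as a new run
        compacted = True
        while compacted:  # compact any level holding too many runs
            compacted = False
            for lvl in sorted(levels):
                if len(levels[lvl]) > runs_per_level:
                    merged = sum(levels[lvl])  # compaction merges neighboring runs
                    levels[lvl] = []
                    levels.setdefault(target_level(merged), []).append(merged)
                    compacted = True
                    break
    return {lvl: runs for lvl, runs in levels.items() if runs}

print(dump_and_compact(25))  # {3: [2500]}
```

After 25 dumps the model ends up with the single 2500 Mb run in L3, matching the walk-through in the text.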

This can go on infinitely, but if an LSM tree contains lots of deletions, the resulting compacted run can be moved not only down, but also up the pyramid due to its size being smaller than the sizes of the original runs that were compacted. In other words, it’s enough to logically track which level a certain run belongs to, based on the run size and the smallest and greatest LSN among all of its operations.

Controlling the form of an LSM tree

If it’s necessary to reduce the number of runs for lookups, then the run size ratio can be increased, thus bringing the number of levels down. If, on the other hand, you need to minimize the compaction-related overhead, then the run size ratio can be decreased: the pyramid will grow higher, and even though runs will be compacted more often, they will be smaller, which will reduce the total amount of work done. In general, write amplification in an LSM tree is described by this formula: x \times \log_{x}(\frac {N} {L0}) or, alternatively, x \times \frac {\ln (\frac {N} {L0})} {\ln(x)}, where N is the total size of all tree elements, L0 is the level zero size, and x is the level size ratio (the vinyl_run_size_ratio parameter). At \frac {N} {L0} = 40 (the disk-to- memory ratio), the plot would look something like this:
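The formula can be evaluated numerically; this sketch (function name illustrative) computes x × ln(N/L0)/ln(x) for a disk-to-memory ratio of 40 and shows that the minimum lies near x = e:

```python
import math

def write_amp(x, n_over_l0):
    # write amplification ≈ x * ln(N/L0) / ln(x), i.e. x * log_x(N/L0)
    return x * math.log(n_over_l0) / math.log(x)

for x in (2, math.e, 5, 10):
    print(f"x = {x:>5.2f}: write amplification ≈ {write_amp(x, 40):.1f}")
```

Increasing x trades higher write amplification for fewer levels (and thus cheaper reads), which is exactly the tuning knob described above.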


As for read amplification, it’s proportional to the number of levels. The lookup cost at each level is no greater than that for a B-tree. Getting back to the example of a tree with 100,000,000 elements: given 256 Mb of RAM and the default values of vinyl_run_size_ratio and vinyl_run_count_per_level, write amplification would come out to about 13, while read amplification could be as high as 150. Let’s try to figure out why this happens.

Range searching

In the case of a single-key search, the algorithm stops after encountering the first match. However, when searching within a certain key range (for example, looking for all the users with the last name “Ivanov”), it’s necessary to scan all tree levels.


Searching within a range of [24,30)

The required range is formed the same way as when compacting several runs: the algorithm picks the key with the largest LSN out of all the sources, ignoring the other associated operations, then moves on to the next key and repeats the procedure.
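A minimal sketch of this merge procedure (an illustrative model, not vinyl’s implementation): for every key in the range, the statement with the largest LSN across all sources wins, and a winning DELETE hides the key.

```python
def range_scan(runs, lo, hi):
    """Merge statements from several runs for keys in [lo, hi).

    Each run is a list of (key, lsn, op, value) statements. Only the
    statement with the largest LSN per key survives; a winning DELETE
    removes the key from the result.
    """
    best = {}
    for run in runs:
        for key, lsn, op, value in run:
            if lo <= key < hi and (key not in best or lsn > best[key][0]):
                best[key] = (lsn, op, value)
    return [(k, val) for k, (lsn, op, val) in sorted(best.items()) if op != 'DELETE']

l0   = [(24, 9, 'DELETE', None), (27, 8, 'REPLACE', 'c')]
run1 = [(24, 5, 'REPLACE', 'a'), (25, 6, 'REPLACE', 'b')]
print(range_scan([l0, run1], 24, 30))  # [(25, 'b'), (27, 'c')]
```

Key 24 is hidden by the newer tombstone in L0 even though an older REPLACE for it exists in the run below.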


Why would one store deletions? And why doesn’t it lead to a tree overflow in the case of for i=1,10000000 put(i) delete(i) end?

With regards to lookups, deletions signal the absence of a value being searched; with compactions, they clear the tree of “garbage” records with older LSNs.

While the data is in RAM only, there’s no need to store deletions. Deletions also don’t need to be kept after a compaction that involves the lowest tree level, which contains the oldest dump: indeed, if a value can’t be found at the lowest level, then it doesn’t exist in the tree.

  • We can’t delete from append-only files
  • Tombstones (delete markers) are inserted into L0 instead

Deletion, step 1: a tombstone is inserted into L0


Deletion, step 2: the tombstone passes through intermediate levels


Deletion, step 3: in the case of a major compaction, the tombstone is removed from the tree

If a deletion is known to come right after the insertion of a unique value, which is often the case when modifying a value in a secondary index, then the deletion can safely be filtered out while compacting intermediate tree levels. This optimization is implemented in vinyl.

Advantages of an LSM tree

Apart from decreasing write amplification, the approach that involves periodically dumping level L0 and compacting levels L1-Lk has a few advantages over the approach to writes adopted by B-trees:

  • Dumps and compactions write relatively large files: typically, the L0 size is 50-100 Mb, which is thousands of times larger than the size of a B-tree block.
  • This large size allows efficiently compressing data before writing it. Tarantool compresses data automatically, which further decreases write amplification.
  • There is no fragmentation overhead, since there’s no padding/empty space between the elements inside a run.
  • All operations create new runs instead of modifying older data in place. This allows avoiding those nasty locks that everyone hates so much. Several operations can run in parallel without causing any conflicts. This also simplifies making backups and moving data to replicas.
  • Storing older versions of data allows for the efficient implementation of transaction support by using multiversion concurrency control.
Disadvantages of an LSM tree and how to deal with them

One of the key advantages of the B-tree as a search data structure is its predictability: all operations take no longer than log_{B}(N) to run. Conversely, in a classical LSM tree, both read and write speeds can differ by a factor of hundreds (best case scenario) or even thousands (worst case scenario). For example, adding just one element to L0 can cause it to overflow, which can trigger a chain reaction in levels L1, L2, and so on. Lookups may find the needed element in L0 or may need to scan all of the tree levels. It’s also necessary to optimize reads within a single level to achieve speeds comparable to those of a B-tree. Fortunately, most disadvantages can be mitigated or even eliminated with additional algorithms and data structures. Let’s take a closer look at these disadvantages and how they’re dealt with in Tarantool.

Unpredictable write speed

In an LSM tree, insertions almost always affect L0 only. How do you avoid idle time when the memory area allocated for L0 is full?

Clearing L0 involves two lengthy operations: writing to disk and memory deallocation. To avoid idle time while L0 is being dumped, Tarantool uses writeaheads. Suppose the L0 size is 256 Mb and the disk write speed is 10 Mb per second; then it would take about 26 seconds to dump L0. If the insertion speed is 10,000 RPS, with each key having a size of 100 bytes, then while L0 is being dumped it’s necessary to reserve about 26 Mb of RAM for incoming insertions, effectively slicing the usable L0 size down to 230 Mb.
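The reservation arithmetic, as a small illustrative sketch (the constants come from the example above; the names are not configuration parameters):

```python
L0_SIZE_MB = 256
DISK_WRITE_MB_PER_S = 10   # sequential write speed assumed in the example
INSERT_RPS = 10_000
KEY_SIZE_BYTES = 100

dump_seconds = L0_SIZE_MB / DISK_WRITE_MB_PER_S                 # ~26 s to dump L0
reserve_mb = INSERT_RPS * KEY_SIZE_BYTES * dump_seconds / 1e6   # ~26 Mb reserved for inserts
usable_l0_mb = L0_SIZE_MB - reserve_mb                          # ~230 Mb left

print(dump_seconds, reserve_mb, usable_l0_mb)
```

In reality, Tarantool derives these figures from a rolling average of the workload and a histogram of disk speed rather than fixed constants.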

Tarantool does all of these calculations automatically, constantly updating the rolling average of the DBMS workload and the histogram of the disk speed. This allows using L0 as efficiently as possible and it prevents write requests from timing out. But in the case of workload surges, some wait time is still possible. That’s why we also introduced an insertion timeout (the vinyl_timeout parameter), which is set to 60 seconds by default. The write operation itself is executed in dedicated threads. The number of these threads (2 by default) is controlled by the vinyl_write_threads parameter. The default value of 2 allows doing dumps and compactions in parallel, which is also necessary for ensuring system predictability.

In Tarantool, compactions are always performed independently of dumps, in a separate execution thread. This is made possible by the append-only nature of an LSM tree: after a dump, runs are never changed, and compactions simply create new runs.

Delays can also be caused by L0 rotation and the deallocation of memory dumped to disk: during a dump, L0 memory is owned by two operating system threads, a transaction processing thread and a write thread. Even though no elements are being added to the rotated L0, it can still be used for lookups. To avoid read locks when doing lookups, the write thread doesn’t deallocate the dumped memory, instead delegating this task to the transaction processor thread. Following a dump, memory deallocation itself happens instantaneously: to achieve this, L0 uses a special allocator that deallocates all of the memory with a single operation.

  • anticipatory dump
  • throttling

The dump is performed from the so-called “shadow” L0 without blocking new insertions and lookups

Unpredictable read speed

Optimizing reads is the most difficult optimization task with regards to LSM trees. The main complexity factor here is the number of levels: not only does it slow down lookups, but the optimizations that work around it also tend to require significantly larger RAM resources. Fortunately, the append-only nature of LSM trees allows us to address these problems in ways that would be nontrivial for traditional data structures.

  • page index
  • bloom filters
  • tuple range cache
  • multi-level compaction
Compression and page index

In B-trees, data compression is either the hardest problem to crack or a great marketing tool—rather than something really useful. In LSM trees, compression works as follows:

During a dump or compaction all of the data within a single run is split into pages. The page size (in bytes) is controlled by the vinyl_page_size parameter and can be set separately for each index. A page doesn’t have to be exactly of vinyl_page_size size—depending on the data it holds, it can be a little bit smaller or larger. Because of this, pages never have any empty space inside.

Data is compressed by Facebook’s streaming algorithm called “zstd”. The first key of each page, along with the page offset, is added to a “page index”, which is a separate file that allows the quick retrieval of any page. After a dump or compaction, the page index of the created run is also written to disk.
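A lookup through a page index amounts to a binary search over the first keys of the pages; a minimal sketch of the idea (toy data, not the actual .index file format):

```python
import bisect

# A toy page index: (first key of page, byte offset of the page in the .run file).
page_index = [('a', 0), ('f', 4096), ('m', 8192), ('t', 12288)]
first_keys = [k for k, _ in page_index]

def page_offset(key):
    # The page that may contain `key` is the last one whose first key <= key.
    i = bisect.bisect_right(first_keys, key) - 1
    return page_index[max(i, 0)][1]

print(page_offset('h'))  # 4096: 'h' falls into the page starting at 'f'
```

Because the whole index is cached in RAM, locating the right page costs no I/O; the single disk read fetches just that page, which is then decompressed and binary-searched.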

All .index files are cached in RAM, which allows finding the necessary page with a single lookup in a .run file (in vinyl, this is the extension of files resulting from a dump or compaction). Since data within a page is sorted, after it’s read and decompressed, the needed key can be found using a regular binary search. Decompression and reads are handled by separate threads, and are controlled by the vinyl_read_threads parameter.

Tarantool uses a universal file format: for example, the format of a .run file is no different from that of an .xlog file (log file). This simplifies backup and recovery as well as the usage of external tools.

Bloom filters

Even though using a page index enables scanning fewer pages per run when doing a lookup, it’s still necessary to traverse all of the tree levels. There’s one special case where scanning all of the tree levels is unavoidable: checking whether particular data is absent, which is exactly what happens on insertions into a unique index. If the data being inserted already exists, then inserting the same data into a unique index should lead to an error. The only way to throw an error in an LSM tree before a transaction is committed is to do a search before inserting the data. Such reads form a class of their own in the DBMS world and are called “hidden” or “parasitic” reads.

Another operation leading to hidden reads is updating a value in a field on which a secondary index is defined. Secondary keys are regular LSM trees that store differently ordered data. In most cases, in order not to have to store all of the data in all of the indexes, a value associated with a given key is kept in whole only in the primary index (any index that stores both a key and a value is called “covering” or “clustered”), whereas the secondary index only stores the fields on which a secondary index is defined, and the values of the fields that are part of the primary index. Thus, each time a change is made to a value in a field on which a secondary index is defined, it’s necessary to first remove the old key from the secondary index—and only then can the new key be inserted. At update time, the old value is unknown, and it is this value that needs to be read in from the primary key “under the hood”.

For example:

update t1 set city='Moscow' where id=1

To minimize the number of disk reads, especially for nonexistent data, nearly all LSM trees use probabilistic data structures, and Tarantool is no exception. A classical Bloom filter is made up of several (usually 3-to-5) bit arrays. When data is written, several hash functions are calculated for each key in order to get corresponding array positions. The bits at these positions are then set to 1. Due to possible hash collisions, some bits might be set to 1 twice. We’re most interested in the bits that remain 0 after all keys have been added. When looking for an element within a run, the same hash functions are applied to produce bit positions in the arrays. If any of the bits at these positions is 0, then the element is definitely not in the run. The probability of a false positive in a Bloom filter is calculated using Bayes’ theorem: each hash function is an independent random variable, so the probability of a collision simultaneously occurring in all of the bit arrays is infinitesimal.
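A toy model of such a multi-array Bloom filter (illustrative only; vinyl derives its actual filter parameters from vinyl_bloom_fpr):

```python
import hashlib

class PartitionedBloom:
    """Toy Bloom filter with one bit array per hash function, as described above."""
    def __init__(self, bits_per_array=1024, num_hashes=4):
        self.m = bits_per_array
        self.arrays = [0] * num_hashes  # each int serves as a bit array

    def _positions(self, key):
        # Derive an independent position in each array from a salted hash.
        for i in range(len(self.arrays)):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield i, int.from_bytes(digest[:8], 'big') % self.m

    def add(self, key):
        for i, pos in self._positions(key):
            self.arrays[i] |= 1 << pos

    def might_contain(self, key):
        # False means "definitely not in the run"; True may be a false positive.
        return all(self.arrays[i] >> pos & 1 for i, pos in self._positions(key))

bf = PartitionedBloom()
bf.add('ivanov')
print(bf.might_contain('ivanov'))  # True: a Bloom filter never gives false negatives
```

A negative answer lets a lookup skip the run entirely, which is what eliminates most hidden reads for nonexistent keys.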

The key advantage of Bloom filters in Tarantool is that they’re easily configurable. The only parameter that can be specified separately for each index is called vinyl_bloom_fpr (FPR stands for “false positive ratio”) and it has the default value of 0.05, which translates to a 5% FPR. Based on this parameter, Tarantool automatically creates Bloom filters of the optimal size for partial- key and full-key searches. The Bloom filters are stored in the .index file, along with the page index, and are cached in RAM.


Tuple range cache

A lot of people think that caching is a silver bullet that can help with any performance issue. “When in doubt, add more cache”. In vinyl, caching is viewed rather as a means of reducing the overall workload and consequently, of getting a more stable response time for those requests that don’t hit the cache. vinyl boasts a unique type of cache among transactional systems called a “range tuple cache”. Unlike, say, RocksDB or MySQL, this cache doesn’t store pages, but rather ranges of index values obtained from disk, after having performed a compaction spanning all tree levels. This allows the use of caching for both single-key and key-range searches. Since this method of caching stores only hot data and not, say, pages (you may need only some data from a page), RAM is used in the most efficient way possible. The cache size is controlled by the vinyl_cache parameter.

Garbage collection control

Chances are that by now you’ve started losing focus and need a well-deserved dopamine reward. Feel free to take a break, since working through the rest of the article is going to take some serious mental effort.

An LSM tree in vinyl is just a small piece of the puzzle. Even with a single table (or so-called “space”), vinyl creates and maintains several LSM trees, one for each index. But even a single index can consist of dozens of LSM trees. Let’s try to understand why this might be necessary.

Recall our example with a tree containing 100,000,000 records, 100 bytes each. As time passes, the lowest LSM level may end up holding a 10 Gb run. During compaction, a temporary run of approximately the same size will be created. Data at intermediate levels takes up some space as well, since the tree may store several operations associated with a single key. In total, storing 10 Gb of actual data may require up to 30 Gb of free space: 10 Gb for the last tree level, 10 Gb for a temporary run, and 10 Gb for the remaining data. But what if the data size is not 10 Gb, but 1 Tb? Requiring that the available disk space always be several times greater than the actual data size is financially unpractical, not to mention that it may take dozens of hours to create a 1 Tb run. And in the case of an emergency shutdown or system restart, the process would have to be started from scratch.

Here’s another scenario. Suppose the primary key is a monotonically increasing sequence—for example, a time series. In this case, most insertions will fall into the right part of the key range, so it wouldn’t make much sense to do a compaction just to append a few million more records to an already huge run.

But what if writes predominantly occur in a particular region of the key range, whereas most reads take place in a different region? How do you optimize the form of the LSM tree in this case? If it’s too high, read performance is impacted; if it’s too low—write speed is reduced.

Tarantool “factorizes” this problem by creating multiple LSM trees for each index. The approximate size of each subtree may be controlled by the vinyl_range_size configuration parameter. We call such subtrees “ranges”.


Factorizing large LSM trees via ranging

  • Ranges reflect a static layout of sorted runs
  • Slices connect a sorted run into a range

Initially, when the index has few elements, it consists of a single range. As more elements are added, its total size may exceed the maximum range size. In that case a special operation called “split” divides the tree into two equal parts. The tree is split at the middle element in the range of keys stored in the tree. For example, if the tree initially stores the full range of -inf…+inf, then after splitting it at the middle key X, we get two subtrees: one that stores the range of -inf…X, and the other storing the range of X…+inf. With this approach, we always know which subtree to use for writes and which one for reads. If the tree contains many deletions and neighboring ranges shrink as a result, the opposite operation, called “coalesce”, combines two neighboring trees into one.

Split and coalesce don’t entail a compaction, the creation of new runs, or other resource-intensive operations. An LSM tree is just a collection of runs. vinyl has a special metadata log that helps keep track of which run belongs to which subtree(s). This log has the .vylog extension, and its format is compatible with that of an .xlog file. Similarly to an .xlog file, the metadata log gets rotated at each checkpoint. To avoid the creation of extra runs with split and coalesce, we have also introduced an auxiliary entity called “slice”. It’s a reference to a run containing a key range, and it’s stored only in the metadata log. Once the reference counter drops to zero, the corresponding file gets removed. When it’s necessary to perform a split or to coalesce, Tarantool creates slice objects for each new tree, removes older slices, and writes these operations to the metadata log, which literally stores records that look like this: <tree id, slice id> or <slice id, run id, min, max>.

This way all of the heavy lifting associated with splitting a tree into two subtrees is postponed until a compaction and then is performed automatically. A huge advantage of dividing all of the keys into ranges is the ability to independently control the L0 size as well as the dump and compaction processes for each subtree, which makes these processes manageable and predictable. Having a separate metadata log also simplifies the implementation of both “truncate” and “drop”. In vinyl, they’re processed instantly, since they only work with the metadata log, while garbage collection is done in the background.

Advanced features of vinyl

In the previous sections, we mentioned only two operations stored by an LSM tree: deletion and replacement. Let’s take a look at how all of the other operations can be represented. An insertion can be represented via a replacement—you just need to make sure there are no other elements with the specified key. To perform an update, it’s necessary to read the older value from the tree, so it’s easier to represent this operation as a replacement as well—this speeds up future read requests by the key. Besides, an update must return the new value, so there’s no avoiding hidden reads.

In B-trees, the cost of hidden reads is negligible: to update a block, it first needs to be read from disk anyway. Creating a special update operation for an LSM tree that doesn’t cause any hidden reads is really tempting.

Such an operation must contain not only a default value to be inserted if a key has no value yet, but also a list of update operations to perform if a value does exist.

At transaction execution time, Tarantool just saves the operation in an LSM tree, then “executes” it later, during a compaction.

The upsert operation:

space:upsert(tuple, {{operator, field, value}, ... })
  • Non-reading update or insert
  • Delayed execution
  • Background upsert squashing prevents upserts from piling up

Unfortunately, postponing the operation execution until a compaction doesn’t leave much leeway in terms of error handling. That’s why Tarantool tries to validate upserts as fully as possible before writing them to an LSM tree. However, some checks are only possible with older data on hand, for example when the update operation is trying to add a number to a string or to remove a field that doesn’t exist.
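
For example (a sketch; the space and field names are hypothetical), an upsert that adds a number to a string field cannot be rejected at execution time, because the older value is never read:

```lua
-- Suppose field 2 of the stored tuple currently holds a string.
-- The upsert is accepted and written to the LSM tree anyway...
box.space.visits:upsert({'key', 0}, {{'+', 2, 1}})
-- ...and the invalid '+' operation is only detected later, when the
-- older value is finally read (e.g. during a compaction), at which
-- point the operation is skipped and an error is logged.
```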

A semantically similar operation exists in many products including PostgreSQL and MongoDB. But anywhere you look, it’s just syntactic sugar that combines the update and replace operations without avoiding hidden reads. Most probably, the reason is that LSM trees as data storage structures are relatively new.

Even though an upsert is a very important optimization and implementing it cost us a lot of blood, sweat, and tears, we must admit that it has limited applicability. If a table contains secondary keys or triggers, hidden reads can’t be avoided. But if you have a scenario where secondary keys are not required and the update following the transaction completion will certainly not cause any errors, then the operation is for you.

I’d like to tell you a short story about an upsert. It takes place back when vinyl was only beginning to “mature” and we were using an upsert in production for the first time. We had what seemed like an ideal environment for it: we had tons of keys, the current time was being used as values; update operations were inserting keys or modifying the current time; and we had few reads. Load tests yielded great results.

Nevertheless, after a couple of days, the Tarantool process started eating up 100% of our CPU, and the system performance dropped close to zero.

We started digging into the issue and found out that the distribution of requests across keys was significantly different from what we had seen in the test environment. It was…well, quite nonuniform. Most keys were updated once or twice a day, so the database was idle for the most part, but there were much hotter keys with tens of thousands of updates per day. Tarantool handled those just fine. But in the case of lookups by key with tens of thousands of upserts, things quickly went downhill. To return the most recent value, Tarantool had to read and “replay” the whole history consisting of all of the upserts. When designing upserts, we had hoped this would happen automatically during a compaction, but the process never even got to that stage: the L0 size was more than enough, so there were no dumps.

We solved the problem by adding a background process that performed readaheads on any keys that had more than a few dozen upserts piled up, so all those upserts were squashed and substituted with the read value.

Secondary keys

Update is not the only operation where optimizing hidden reads is critical. Even the replace operation, given secondary keys, has to read the older value: it needs to be independently deleted from the secondary indexes, and inserting a new element might not do this, leaving some garbage behind.


If secondary indexes are not unique, then collecting “garbage” from them can be put off until a compaction, which is what we do in Tarantool.

The append-only nature of LSM trees allowed us to implement full-blown serializable transactions in vinyl. Read-only requests use older versions of data without blocking any writes. The transaction manager itself is fairly simple for now: in classical terms, it implements the MVTO (multiversion timestamp ordering) class, whereby the winning transaction is the one that finished earlier. There are no locks and associated deadlocks. Strange as it may seem, this is a drawback rather than an advantage: with parallel execution, you can increase the number of successful transactions by simply holding some of them on lock when necessary. We’re planning to improve the transaction manager soon. In the current release, we focused on making the algorithm behave 100% correctly and predictably. For example, our transaction manager is one of the few on the NoSQL market that supports so-called “gap locks”.
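
A sketch of such a transaction from the console (the space name and field numbers are hypothetical); under MVTO, a conflicting concurrent transaction causes the commit to abort with an error rather than block:

```lua
box.begin()
-- Reads use an older data version and don't block concurrent writers:
local account = box.space.accounts:get{1}
box.space.accounts:update({1}, {{'+', 2, 100}})
-- Raises a "Transaction has been aborted by conflict" error
-- if a concurrent transaction finished earlier and won:
box.commit()
```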

Tarantool Cartridge

Cluster management in Tarantool is powered by the Tarantool Cartridge framework.

Here we explain how you can benefit from Tarantool Cartridge, a framework for developing, deploying, and managing applications based on Tarantool.

This documentation contains the following sections:

Tarantool Cartridge

A framework for distributed applications development.

About Tarantool Cartridge

Tarantool Cartridge allows you to easily develop Tarantool-based applications and run them on one or more Tarantool instances organized into a cluster.

This is the recommended alternative to the old-school practices of application development for Tarantool.

As a software development kit (SDK), Tarantool Cartridge provides you with utilities and an application template to help:

  • easily set up a development environment for your applications;
  • plug the necessary Lua modules.

The resulting package can be installed and started on one or multiple servers as one or multiple instantiated services – independent or organized into a cluster.

A Tarantool cluster is a collection of Tarantool instances acting in concert. While a single Tarantool instance can leverage the performance of a single server and is vulnerable to failure, the cluster spans multiple servers, utilizes their cumulative CPU power, and is fault-tolerant.

To fully utilize the capabilities of a Tarantool cluster, you need to develop applications keeping in mind they are to run in a cluster environment.

As a cluster management tool, Tarantool Cartridge provides your cluster-aware applications with the following key benefits:

  • horizontal scalability and load balancing via built-in automatic sharding;
  • asynchronous replication;
  • automatic failover;
  • centralized cluster control via GUI or API;
  • automatic configuration synchronization;
  • instance functionality segregation.

A Tarantool Cartridge cluster can segregate functionality between instances via built-in and custom (user-defined) cluster roles. You can toggle roles on and off on the fly during cluster operation. This allows you to put different types of workloads (e.g., compute- and transaction-intensive ones) on different physical servers with dedicated hardware.

Tarantool Cartridge has an external utility called cartridge-cli which provides you with utilities and an application template to help:

  • easily set up a development environment for your applications;
  • plug the necessary Lua modules;
  • pack the applications in an environment-independent way: together with module binaries and Tarantool executables.

Getting started


To get a template application that uses Tarantool Cartridge and run it, you need to install several packages:

  • tarantool and tarantool-dev (see these instructions);
  • cartridge-cli (see these instructions);
  • git, gcc, cmake and make.
Create your first application

Long story short, copy-paste this into the console:

cartridge create --name myapp
cd myapp
cartridge build
cartridge start -d
cartridge replicasets setup --bootstrap-vshard

That’s all! Now you can visit http://localhost:8081 and see your application’s Admin Web UI:



The most essential contribution is your feedback; don’t hesitate to open an issue. If you’d like to propose some changes in code, see the contribution guide.

Developer’s guide

For a quick start, skip the details below and jump right away to the Cartridge getting started guide.

For a deep dive into what you can develop with Tarantool Cartridge, go on with the Cartridge developer’s guide.


To develop and start an application, in short, you need to go through the following steps:

  1. Install Tarantool Cartridge and other components of the development environment.
  2. Create a project.
  3. Develop the application. In case it is a cluster-aware application, implement its logic in a custom (user-defined) cluster role to initialize the database in a cluster environment.
  4. Deploy the application to target server(s). This includes configuring and starting the instance(s).
  5. In case it is a cluster-aware application, deploy the cluster.

The following sections provide details for each of these steps.

Installing Tarantool Cartridge

  1. Install cartridge-cli, a command-line tool for developing, deploying, and managing Tarantool applications.
  2. Install git, a version control system.
  3. Install npm, a package manager for node.js.
  4. Install the unzip utility.

Creating a project

To set up your development environment, create a project using the Tarantool Cartridge project template. In any directory, say:

$ cartridge create --name <app_name> /path/to/

This will automatically set up a Git repository in a new /path/to/<app_name>/ directory, tag it with version 0.1.0, and put the necessary files into it.

In this Git repository, you can develop the application (by simply editing the default files provided by the template), plug the necessary modules, and then easily pack everything to deploy on your server(s).

The project template creates the <app_name>/ directory with the following contents:

  • <app_name>-scm-1.rockspec file where you can specify the application dependencies.
  • deps.sh script that resolves dependencies from the .rockspec file.
  • init.lua file which is the entry point for your application.
  • .git file necessary for a Git repository.
  • .gitignore file to ignore the unnecessary files.
  • env.lua file that sets common rock paths so that the application can be started from any directory.
  • custom-role.lua file that is a placeholder for a custom (user-defined) cluster role.

The entry point file (init.lua), among other things, loads the cartridge module and calls its initialization function:

local cartridge = require('cartridge')
cartridge.cfg({
-- cartridge options example
  workdir = '/var/lib/tarantool/app',
  advertise_uri = 'localhost:3301',
  cluster_cookie = 'super-cluster-cookie',
}, {
-- box options example
  memtx_memory = 1000000000,
  ... })

The cartridge.cfg() call renders the instance operable via the administrative console but does not call box.cfg() to configure instances.


Calling the box.cfg() function is forbidden.

The cluster itself will do it for you when it is time to:

  • bootstrap the current instance once you:
    • run cartridge.bootstrap() via the administrative console, or
    • click Create in the web interface;
  • join the instance to an existing cluster once you:
    • run cartridge.join_server({uri = 'other_instance_uri'}) via the console, or
    • click Join (an existing replica set) or Create (a new replica set) in the web interface.

Notice that you can specify a cookie for the cluster (cluster_cookie parameter) if you need to run several clusters in the same network. The cookie can be any string value.

Now you can develop an application that will run on a single or multiple independent Tarantool instances (e.g. acting as a proxy to third-party databases) – or will run in a cluster.

If you plan to develop a cluster-aware application, first familiarize yourself with the notion of cluster roles.

Cluster roles

Cluster roles are Lua modules that implement some specific functions and/or logic. In other words, a Tarantool Cartridge cluster segregates instance functionality in a role-based way.

Since all instances running cluster applications use the same source code and are aware of all the defined roles (and plugged modules), you can dynamically enable and disable multiple different roles without restarts, even during cluster operation.

Note that every instance in a replica set performs the same roles and you cannot enable/disable roles individually on some instances. In other words, configuration of enabled roles is set up per replica set. See a step-by-step configuration example in this guide.

Built-in roles

The cartridge module comes with two built-in roles that implement automatic sharding:

  • vshard-router that handles the vshard’s compute-intensive workload: routes requests to storage nodes.

  • vshard-storage that handles the vshard’s transaction-intensive workload: stores and manages a subset of a dataset.


    For more information on sharding, see the vshard module documentation.

With the built-in and custom roles, you can develop applications with separated compute and transaction handling – and enable relevant workload-specific roles on different instances running on physical servers with workload-dedicated hardware.
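
As a sketch, enabling both built-in roles in the cartridge.cfg() call might look like this (the workdir and advertise_uri values are hypothetical):

```lua
local cartridge = require('cartridge')
cartridge.cfg({
    workdir = '/var/lib/tarantool/app',
    advertise_uri = 'localhost:3301',
    roles = {
        'cartridge.roles.vshard-router',
        'cartridge.roles.vshard-storage',
    },
})
```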

Custom roles

You can implement custom roles for any purposes, for example:

  • define stored procedures;
  • implement extra features on top of vshard;
  • go without vshard at all;
  • implement one or multiple supplementary services such as e-mail notifier, replicator, etc.

To implement a custom cluster role, do the following:

  1. Take the app/roles/custom.lua file in your project as a sample. Rename this file as you wish, e.g. app/roles/custom-role.lua, and implement the role’s logic. For example:

    -- Implement a custom role in app/roles/custom-role.lua
    #!/usr/bin/env tarantool
    local role_name = 'custom-role'

    local function init()
        -- role initialization logic goes here
    end

    local function stop()
        -- role termination logic goes here
    end

    return {
        role_name = role_name,
        init = init,
        stop = stop,
    }

    Here the role_name value may differ from the module name passed to the cartridge.cfg() function. If the role_name variable is not specified, the module name is used by default.


    Role names must be unique as it is impossible to register multiple roles with the same name.

  2. Register the new role in the cluster by modifying the cartridge.cfg() call in the init.lua entry point file:

    -- Register a custom role in init.lua
    local cartridge = require('cartridge')
    cartridge.cfg({
      workdir = ...,
      advertise_uri = ...,
      roles = {'custom-role'},
    })
    where custom-role is the name of the Lua module to be loaded.

The role module does not have required functions, but the cluster may execute the following ones during the role’s life cycle:

  • init() is the role’s initialization function.

    Inside the function’s body you can call any box functions: create spaces, indexes, grant permissions, etc. Here is what the initialization function may look like:

    local function init(opts)
        -- The cluster passes an 'opts' Lua table containing an 'is_master' flag.
        if opts.is_master then
            local customer = box.schema.space.create('customer',
                { if_not_exists = true }
            )
            customer:format({
                {'customer_id', 'unsigned'},
                {'bucket_id', 'unsigned'},
                {'name', 'string'},
            })
            customer:create_index('customer_id', {
                parts = {'customer_id'},
                if_not_exists = true,
            })
        end
    end

    • Neither vshard-router nor vshard-storage manage spaces, indexes, or formats. You should do it within a custom role: add a box.schema.space.create() call to your first cluster role, as shown in the example above.
    • The function’s body is wrapped in a conditional statement that lets you call box functions on masters only. This protects against replication collisions as data propagates to replicas automatically.
  • stop() is the role’s termination function. Implement it if initialization starts a fiber that has to be stopped or does any job that needs to be undone on termination.

  • validate_config() and apply_config() are functions that validate and apply the role’s configuration. Implement them if some configuration data needs to be stored cluster-wide.

Next, get a grip on the role’s life cycle to implement the functions you need.

Defining role dependencies

You can instruct the cluster to apply some other roles if your custom role is enabled.

For example:

-- Role dependencies defined in app/roles/custom-role.lua
local role_name = 'custom-role'

return {
    role_name = role_name,
    dependencies = {'cartridge.roles.vshard-router'},
}
Here the vshard-router role will be initialized automatically for every instance with custom-role enabled.

Using multiple vshard storage groups

Replica sets with vshard-storage roles can belong to different groups. For example, hot or cold groups meant to independently process hot and cold data.

Groups are specified in the cluster’s configuration:

-- Specify groups in init.lua
cartridge.cfg({
    vshard_groups = {'hot', 'cold'},
    ...
})

If no groups are specified, the cluster assumes that all replica sets belong to the default group.

With multiple groups enabled, every replica set with a vshard-storage role enabled must be assigned to a particular group. The assignment can never be changed.

Another limitation is that you cannot add groups dynamically (this will become available in the future).

Finally, mind the syntax for router access. Every instance with a vshard-router role enabled initializes multiple routers. All of them are accessible through the role:

local router_role = cartridge.service_get('vshard-router')

If you have no groups specified, you can access a static router as before (when Tarantool Cartridge was unaware of groups):

local vshard = require('vshard')

However, when using the current group-aware API, you must call a static router with a colon:

local router_role = cartridge.service_get('vshard-router')
local default_router = router_role.get() -- or router_role.get('default')
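
With multiple groups, each group’s router is retrieved by name (a sketch; 'hot' matches the group defined above, and the bucket id and function name are hypothetical):

```lua
local cartridge = require('cartridge')

local router_role = cartridge.service_get('vshard-router')
local hot_router = router_role.get('hot')

-- Route a read-only request to the storage owning the bucket:
local bucket_id = 1 -- hypothetical; normally computed from the sharding key
local result, err = hot_router:callro(bucket_id, 'get_customer', {1})
```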
Role’s life cycle (and the order of function execution)

The cluster displays the names of all custom roles along with the built-in vshard-* roles in the web interface. Cluster administrators can enable and disable them for particular instances – either via the web interface or via the cluster public API. For example:

cartridge.admin.edit_replicaset('replicaset-uuid', {roles = {'vshard-router', 'custom-role'}})

If you enable multiple roles on an instance at the same time, the cluster first initializes the built-in roles (if any) and then the custom ones (if any) in the order the latter were listed in cartridge.cfg().

If a custom role has dependent roles, the dependencies are registered and validated first, prior to the role itself.

The cluster calls the role’s functions in the following circumstances:

  • The init() function, typically, once: either when the role is enabled by the administrator or at the instance restart. Enabling a role once is normally enough.
  • The stop() function – only when the administrator disables the role, not on instance termination.
  • The validate_config() function, first, before the automatic box.cfg() call (database initialization), then – upon every configuration update.
  • The apply_config() function upon every configuration update.

As a tryout, let’s task the cluster with some actions and see the order of executing the role’s functions:

  • Join an instance or create a replica set, both with an enabled role:
    1. validate_config()
    2. init()
    3. apply_config()
  • Restart an instance with an enabled role:
    1. validate_config()
    2. init()
    3. apply_config()
  • Disable role: stop().
  • Upon the cartridge.confapplier.patch_clusterwide() call:
    1. validate_config()
    2. apply_config()
  • Upon a triggered failover:
    1. validate_config()
    2. apply_config()

Considering the described behavior:

  • The init() function may:
    • Call box functions.
    • Start a fiber and, in this case, the stop() function should take care of the fiber’s termination.
    • Configure the built-in HTTP server.
    • Execute any code related to the role’s initialization.
  • The stop() function must undo any job that needs to be undone on the role’s termination.
  • The validate_config() function must validate any configuration change.
  • The apply_config() function may execute any code related to a configuration change, e.g., take care of an expirationd fiber.

The validation and application functions together allow you to change the cluster-wide configuration as described in the next section.

Configuring custom roles

You can:

  • Store configurations for your custom roles as sections in cluster-wide configuration, for example:

    # in YAML configuration file
    my_role:
      notify_url: "https://localhost:8080"

    -- in init.lua file
    local notify_url = 'http://localhost'
    function my_role.apply_config(conf, opts)
      local conf = conf['my_role'] or {}
      notify_url = conf.notify_url or 'default'
    end
  • Download and upload cluster-wide configuration using the web interface or API (via GET/PUT queries to admin/config endpoint like curl localhost:8081/admin/config and curl -X PUT -d "{'my_parameter': 'value'}" localhost:8081/admin/config).

  • Utilize it in your role’s apply_config() function.

Every instance in the cluster stores a copy of the configuration file in its working directory (configured by cartridge.cfg({workdir = ...})):

  • /var/lib/tarantool/<instance_name>/config.yml for instances deployed from RPM packages and managed by systemd.
  • /home/<username>/tarantool_state/var/lib/tarantool/config.yml for instances deployed from tar+gz archives.

The cluster’s configuration is a Lua table, downloaded and uploaded as YAML. If some application-specific configuration data, e.g. a database schema as defined by DDL (data definition language), needs to be stored on every instance in the cluster, you can implement your own API by adding a custom section to the table. The cluster will help you spread it safely across all instances.

Such a section goes in the same file with the topology-specific and vshard-specific sections that the cluster generates automatically. Unlike the generated sections, the custom section’s modification, validation, and application logic has to be defined by the application.

The common way is to define two functions:

  • validate_config(conf_new, conf_old) to validate changes made in the new configuration (conf_new) versus the old configuration (conf_old).
  • apply_config(conf, opts) to execute any code related to a configuration change. As input, this function takes the configuration to apply (conf, which is actually the new configuration that you validated earlier with validate_config()) and options (the opts argument that includes is_master, a Boolean flag described later).


The validate_config() function must detect all configuration problems that may lead to apply_config() errors. For more information, see the next section.

When implementing validation and application functions that call box ones for some reason, mind the following precautions:

  • Due to the role’s life cycle, the cluster does not guarantee an automatic box.cfg() call prior to calling validate_config().

    If the validation function calls any box functions (e.g., to check a format), make sure the calls are wrapped in a protective conditional statement that checks if box.cfg() has already happened:

    -- Inside the validate_config() function:
    if type(box.cfg) == 'table' then
        -- Here you can call box functions
    end
  • Unlike the validation function, apply_config() can call box functions freely as the cluster applies custom configuration after the automatic box.cfg() call.

    However, creating spaces, users, etc., can cause replication collisions when performed on both master and replica instances simultaneously. The appropriate way is to call such box functions on masters only and let the changes propagate to replicas automatically.

    Upon the apply_config(conf, opts) execution, the cluster passes an is_master flag in the opts table which you can use to wrap collision-inducing box functions in a protective conditional statement:

    -- Inside the apply_config() function:
    if opts.is_master then
        -- Here you can call box functions
    end
Custom configuration example

Consider the following code as part of the role’s module (custom-role.lua) implementation:

#!/usr/bin/env tarantool
-- Custom role implementation

local cartridge = require('cartridge')

local role_name = 'custom-role'

-- Modify the config by implementing some setter (an alternative to HTTP PUT)
local function set_secret(secret)
    local custom_role_cfg = cartridge.confapplier.get_deepcopy(role_name) or {}
    custom_role_cfg.secret = secret
    cartridge.confapplier.patch_clusterwide({
        [role_name] = custom_role_cfg,
    })
end

-- Validate
local function validate_config(cfg)
    local custom_role_cfg = cfg[role_name] or {}
    if custom_role_cfg.secret ~= nil then
        assert(type(custom_role_cfg.secret) == 'string', 'custom-role.secret must be a string')
    end
    return true
end

-- Apply
local function apply_config(cfg)
    local custom_role_cfg = cfg[role_name] or {}
    local secret = custom_role_cfg.secret or 'default-secret'
    -- Make use of it
end

return {
    role_name = role_name,
    set_secret = set_secret,
    validate_config = validate_config,
    apply_config = apply_config,
}

Once the configuration is customized, apply it as described in the next section.

Applying custom role’s configuration

With the implementation shown in the example, you can call the set_secret() function to apply the new configuration via the administrative console – or an HTTP endpoint if the role exports one.

The set_secret() function calls cartridge.confapplier.patch_clusterwide() which performs a two-phase commit:

  1. It patches the active configuration in memory: copies the table and replaces the "custom-role" section in the copy with the one given by the set_secret() function.
  2. The cluster checks if the new configuration can be applied on all instances except disabled and expelled. All instances subject to update must be healthy and alive according to the membership module.
  3. (Preparation phase) The cluster propagates the patched configuration. Every instance validates it with the validate_config() function of every registered role. Depending on the validation’s result:
    • If successful (i.e., returns true), the instance saves the new configuration to a temporary file named config.prepare.yml within the working directory.
    • (Abort phase) Otherwise, the instance reports an error and all the other instances roll back the update: remove the file they may have already prepared.
  4. (Commit phase) Upon successful preparation of all instances, the cluster commits the changes. Every instance:
    1. Creates the active configuration’s hard-link.
    2. Atomically replaces the active configuration file with the prepared one. The atomic replacement is indivisible – it can either succeed or fail entirely, never partially.
    3. Calls the apply_config() function of every registered role.

If any of these steps fail, an error pops up in the web interface next to the corresponding instance. The cluster does not handle such errors automatically, they require manual repair.

You will avoid the repair if the validate_config() function can detect all configuration problems that may lead to apply_config() errors.
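
For illustration, the same kind of update can also be issued directly from the administrative console (the section name and value here are hypothetical):

```lua
local cartridge = require('cartridge')

-- Patch only the 'custom-role' section of the cluster-wide configuration;
-- all other sections remain unchanged:
cartridge.confapplier.patch_clusterwide({
    ['custom-role'] = { secret = 's3cr3t' },
})
```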

Using the built-in HTTP server

The cluster launches an httpd server instance during initialization (cartridge.cfg()). You can bind a port to the instance via an environmental variable:

-- Get the port from an environmental variable or the default one:
local http_port = os.getenv('HTTP_PORT') or '8080'

local ok, err = cartridge.cfg({
   -- Pass the port to the cluster:
   http_port = http_port,
   ...
})

To make use of the httpd instance, access it and configure routes inside the init() function of some role, e.g. a role that exposes API over HTTP:

local function init(opts)

    ...

    -- Get the httpd instance:
    local httpd = cartridge.service_get('httpd')
    if httpd ~= nil then
        -- Configure a route to, for example, metrics:
        httpd:route({
                method = 'GET',
                path = '/metrics',
                public = true,
            },
            function(req)
                return req:render({json = stat.stat()})
            end
        )
    end
end

For more information on using Tarantool’s HTTP server, see its documentation.

Implementing authorization in the web interface

To implement authorization in the web interface of every instance in a Tarantool cluster:

  1. Implement a new, say, auth module with a check_password function. It should check the credentials of any user trying to log in to the web interface.

    The check_password function accepts a username and password and returns an authentication success or failure.

    -- auth.lua
    -- Add a function to check the credentials
    local function check_password(username, password)
        -- Check the credentials any way you like
        -- Return an authentication success or failure
        if not ok then
            return false
        end
        return true
    end
  2. Pass the implemented auth module name as a parameter to cartridge.cfg(), so the cluster can use it:

    -- init.lua
    local ok, err = cartridge.cfg({
        auth_backend_name = 'auth',
        -- The cluster will automatically call 'require()' on the 'auth' module.
        ...
    })

    This adds a Log in button to the upper right corner of the web interface but still lets users who are not signed in interact with the interface. This is convenient for testing.


    Also, to authorize requests to cluster API, you can use the HTTP basic authorization header.

  3. To require the authorization of every user in the web interface even before the cluster bootstrap, add the following line:

    -- init.lua
    local ok, err = cartridge.cfg({
        auth_backend_name = 'auth',
        auth_enabled = true,
    })

    With the authentication enabled and the auth module implemented, the user will not be able even to bootstrap the cluster without logging in. After a successful login and bootstrap, authentication can be enabled and disabled cluster-wide in the web interface, and the auth_enabled parameter is ignored.
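For instance, a request to the administrative GraphQL endpoint with the HTTP basic authorization header mentioned in step 2 might look like this (host, credentials, and the query are illustrative):

```shell
# Basic-auth credentials are passed with -u; /admin/api is the
# administrative GraphQL endpoint of a Cartridge instance.
curl -u admin:secret \
     -H 'Content-Type: application/json' \
     -d '{"query": "{ cluster { self { uuid } } }"}' \
     http://localhost:8081/admin/api
```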

Application versioning

Tarantool Cartridge understands semantic versioning as described at semver.org. When developing an application, create new Git branches and tag them appropriately. These tags are used to calculate version increments for subsequent packing.

For example, if your application has version 1.2.1, tag your current branch with 1.2.1 (annotated or not).

To retrieve the current version from Git, say:

$ git describe --long --tags
1.2.1-12-g74864f2

This output shows that we are 12 commits after the version 1.2.1. If we are to package the application at this point, it will have a full version of 1.2.1-12 and its package will be named <app_name>-1.2.1-12.rpm.

Non-semantic tags are prohibited. You will not be able to create a package from a branch with the latest tag being non-semantic.

Once you package your application, the version is saved in a VERSION file in the package root.
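The version calculation described above can be sketched as a small Lua helper (package_version is a hypothetical function for illustration, not part of Cartridge):

```lua
-- Hypothetical helper: derive the package version suffix from the
-- output of `git describe --long --tags`.
local function package_version(describe_output)
    -- e.g. '1.2.1-12-g74864f2' -> '1.2.1-12'
    local tag, n_commits =
        describe_output:match('^(%d+%.%d+%.%d+)%-(%d+)%-g%x+$')
    if tag == nil then
        return nil, 'non-semantic tag'
    end
    return ('%s-%s'):format(tag, n_commits)
end

print(package_version('1.2.1-12-g74864f2')) -- 1.2.1-12
```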

Using .cartridge.ignore files

You can add a .cartridge.ignore file to your application repository to exclude particular files and/or directories from package builds.

For the most part, the logic is similar to that of .gitignore files. The major difference is that in .cartridge.ignore files the order of exceptions relative to the rest of the patterns does not matter, while in .gitignore files the order does matter.

.cartridge.ignore entry   ignores…
target/                   folder (due to the trailing /) named target, recursively
target                    file or folder named target, recursively
/target                   file or folder named target in the top-most directory (due to the leading /)
/target/                  folder named target in the top-most directory (leading and trailing /)
*.class                   every file or folder ending with .class, recursively
#comment                  nothing, this is a comment (the first character is a #)
\#comment                 every file or folder with name #comment (\ for escaping)
target/logs/              every folder named logs which is a subdirectory of a folder named target
target/*/logs/            every folder named logs two levels under a folder named target (* doesn’t include /)
target/**/logs/           every folder named logs somewhere under a folder named target (** includes /)
*.py[co]                  every file or folder ending in .pyc or .pyo; however, it doesn’t match .py!
*.py[!co]                 every file or folder ending in anything other than c or o
*.file[0-9]               every file or folder ending in a digit
*.file[!0-9]              every file or folder ending in anything other than a digit
*                         everything, recursively
/*                        everything in the top-most directory (due to the leading /)
**/*.tar.gz               every *.tar.gz file or folder which is one or more levels under the starting folder
!file                     nothing; a file or folder named file will not be ignored even if it matches other patterns
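For instance, a small .cartridge.ignore combining several of these patterns (the names here are purely illustrative) could look like:

```
# exclude local artifacts from the package build
tmp/
*.log
/local.yml
```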

Failover architecture

An important concept in cluster topology is appointing a leader. A leader is an instance which is responsible for performing key operations. To keep things simple, you can think of a leader as the only writable master. Every replica set has its own leader, and there’s usually no more than one.

Which instance will become a leader depends on topology settings and failover configuration.

An important topology parameter is the failover priority within a replica set. This is an ordered list of instances. By default, the first instance in the list becomes a leader, but with the failover enabled it may be changed automatically if the first one is malfunctioning.

Instance configuration upon a leader change

When Cartridge configures roles, it takes into account the leadership map (consolidated in the failover.lua module). The leadership map is composed when the instance enters the ConfiguringRoles state for the first time. Later the map is updated according to the failover mode.

Every change in the leadership map is accompanied by instance re-configuration. When the map changes, Cartridge updates the read_only setting and calls the apply_config callback for every role. It also specifies the is_master flag (which actually means is_leader, but hasn’t been renamed yet due to historical reasons).

It’s important to say that we discuss a distributed system where every instance has its own opinion. Even if all opinions coincide, there still may be races between instances, and you (as an application developer) should take them into account when designing roles and their interaction.

Leader appointment rules

The logic behind leader election depends on the failover mode: disabled, eventual, or stateful.

Disabled mode

This is the simplest case. The leader is always the first instance in the failover priority. No automatic switching is performed. When it’s dead, it’s dead.

Eventual failover

In the eventual mode, the leader isn’t elected consistently. Instead, every instance in the cluster thinks that the leader is the first healthy instance in the failover priority list, while instance health is determined according to the membership status (the SWIM protocol).

The member is considered healthy if both are true:

  1. It reports either ConfiguringRoles or RolesConfigured state;
  2. Its SWIM status is either alive or suspect.

A suspect member becomes dead after the failover_timeout expires.

Leader election is done as follows. Suppose there are two replica sets in the cluster:

  • a single router “R”,
  • two storages, “S1” and “S2”.

Then we can say: all three instances (R, S1, S2) agree that S1 is the leader.

The SWIM protocol guarantees that eventually all instances will find a common ground, but it’s not guaranteed for every intermediate moment of time. So we may get a conflict.

For example, soon after S1 goes down, R is already informed and thinks that S2 is the leader, but S2 hasn’t received the gossip yet and still thinks it’s not. This is a conflict.

Similarly, when S1 recovers and takes the leadership, S2 may be unaware of that yet. So, both S1 and S2 consider themselves as leaders.

Moreover, the SWIM protocol isn’t perfect and can produce false-negative gossips (announcing an instance dead when it’s not).

Stateful failover

Similarly to the eventual mode, every instance composes its own leadership map, but now the map is fetched from an external state provider (that’s why this failover mode is called “stateful”). Two state providers are currently supported: etcd and the stateboard (a standalone Tarantool instance). The state provider serves as a domain-specific key-value storage (simply replicaset_uuid -> leader_uuid) and a locking mechanism.

Changes in the leadership map are obtained from the state provider with the long polling technique.

All decisions are made by the coordinator – the one that holds the lock. The coordinator is implemented as a built-in Cartridge role. There may be many instances with the coordinator role enabled, but only one of them can acquire the lock at a time. We call this coordinator the “active” one.

The lock is released automatically when the TCP connection is closed, or it may expire if the coordinator becomes unresponsive (in stateboard it’s set by the stateboard’s --lock_delay option, for etcd it’s a part of clusterwide configuration), so the coordinator renews the lock from time to time in order to be considered alive.

The coordinator makes a decision based on the SWIM data, but the decision algorithm is slightly different from that in case of eventual failover:

  • Right after acquiring the lock from the state provider, the coordinator fetches the leadership map.
  • If there is no leader appointed for the replica set, the coordinator appoints the first leader according to the failover priority, regardless of the SWIM status.
  • If a leader becomes dead, the coordinator makes a decision. A new leader is the first healthy instance from the failover priority list. If an old leader recovers, no leader change is made until the current leader goes down. Changing the failover priority doesn’t affect this.
  • Every appointment (self-made or fetched) is immune for a while (controlled by the IMMUNITY_TIMEOUT option).
The case: external provider outage

In this case instances do nothing: the leader remains a leader, read-only instances remain read-only. If any instance restarts during an external state provider outage, it composes an empty leadership map: it doesn’t know who actually is a leader and thinks there is none.

The case: coordinator outage

An active coordinator may be absent in a cluster either because of a failure or due to disabling the role everywhere. Just like in the previous case, instances do nothing about it: they keep fetching the leadership map from the state provider. But it will remain the same until a coordinator appears.

Manual leader promotion

It differs a lot depending on the failover mode.

In the disabled and eventual modes, you can only promote a leader by changing the failover priority (and applying a new clusterwide configuration).

In the stateful mode, the failover priority doesn’t make much sense (except for the first appointment). Instead, you should use the promotion API (the Lua cartridge.failover_promote or the GraphQL mutation {cluster{failover_promote()}}) which pushes manual appointments to the state provider.
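For example, a manual promotion via the Lua API might look like this (the UUIDs are illustrative; the argument is a replicaset_uuid -> instance_uuid map, and the exact behavior should be checked against the cartridge.failover_promote reference):

```lua
-- Run on any instance of the cluster; UUIDs here are illustrative.
local cartridge = require('cartridge')

local ok, err = cartridge.failover_promote({
    -- replicaset_uuid = new_leader_instance_uuid
    ['aaaaaaaa-aaaa-0000-0000-000000000000'] =
        'bbbbbbbb-bbbb-0000-0000-000000000001',
})
if not ok then
    print('Promotion failed: ' .. tostring(err))
end
```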

The stateful failover mode implies consistent promotion: before becoming writable, each instance performs the wait_lsn operation to sync up with the previous one.

Information about the previous leader (we call it a vclockkeeper) is also stored on the external storage. Even when the old leader is demoted, it remains the vclockkeeper until the new leader successfully awaits and persists its vclock on the external storage.

If replication is stuck and consistent promotion isn’t possible, a user has two options: to revert the promotion (re-promote the old leader) or to force it inconsistently (every kind of failover_promote API has a force_inconsistency flag).

Consistent promotion doesn’t work for replicasets with all_rw flag enabled and for single-instance replicasets. In these two cases an instance doesn’t even try to query vclockkeeper and to perform wait_lsn. But the coordinator still appoints a new leader if the current one dies.


Fencing

Neither the eventual nor the stateful failover mode protects a replicaset from the presence of multiple leaders when the network is partitioned. But fencing does: it enforces the at-most-one-leader policy in a replicaset.

Fencing operates as a fiber that occasionally checks connectivity with the state provider and with replicas. The fencing fiber runs on vclockkeepers; it starts right after consistent promotion succeeds. Replicasets which don’t need consistency (single-instance and all_rw) are not protected by fencing, though.

The condition for fencing actuation is the loss of both the state provider quorum and at least one replica. Otherwise, if either state provider is healthy or all replicas are alive, the fencing fiber waits and doesn’t intervene.

When fencing is actuated, it generates a fake appointment locally and sets the leader to nil. Consequently, the instance becomes read-only. Subsequent recovery is only possible when the state provider quorum is reestablished; reconnecting replicas isn’t a must for recovery. Recovery is performed according to the rules of consistent switchover unless some other instance has already been promoted to a new leader.

Failover configuration

These are clusterwide parameters:

  • mode: “disabled” / “eventual” / “stateful”.
  • state_provider: “tarantool” / “etcd”.
  • failover_timeout – time (in seconds) to mark suspect members as dead and trigger failover (default: 20).
  • tarantool_params: {uri = "...", password = "..."}.
  • etcd2_params: {endpoints = {...}, prefix = "/", lock_delay = 10, username = "", password = ""}.
  • fencing_enabled: true / false (default: false).
  • fencing_timeout – time to actuate fencing after the check fails (default: 10).
  • fencing_pause – the period of performing the check (default: 2).

It’s required that failover_timeout > fencing_timeout >= fencing_pause.


Use your favorite GraphQL client (e.g. Altair) for request introspection:

  • query {cluster{failover_params{}}},
  • mutation {cluster{failover_params(){}}},
  • mutation {cluster{failover_promote()}}.
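For example, a sketch of enabling the stateful mode through the failover_params mutation (argument and field names are assumptions to be checked against your cluster’s actual GraphQL schema):

```graphql
mutation {
  cluster {
    failover_params(mode: "stateful", fencing_enabled: true) {
      mode
    }
  }
}
```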
Stateboard configuration

Like other Cartridge instances, the stateboard supports cartridge.argparse options:

  • listen
  • workdir
  • password
  • lock_delay

Similarly to other argparse options, they can be passed via command-line arguments or via environment variables, e.g.:

.rocks/bin/stateboard --workdir ./dev/stateboard --listen 4401 --password qwerty
Fine-tuning failover behavior

Besides failover priority and mode, there are some other private options that influence failover operation:

  • LONGPOLL_TIMEOUT (failover) – the long polling timeout (in seconds) to fetch new appointments (default: 30);
  • NETBOX_CALL_TIMEOUT (failover/coordinator) – stateboard client’s connection timeout (in seconds) applied to all communications (default: 1);
  • RECONNECT_PERIOD (coordinator) – time (in seconds) to reconnect to the state provider if it’s unreachable (default: 5);
  • IMMUNITY_TIMEOUT (coordinator) – minimal amount of time (in seconds) to wait before overriding an appointment (default: 15).

Configuring instances

Cartridge orchestrates a distributed system of Tarantool instances – a cluster. One of the core concepts is clusterwide configuration. Every instance in a cluster stores a copy of it.

Clusterwide configuration contains options that must be identical on every cluster node, such as the topology of the cluster, failover and vshard configuration, authentication parameters and ACLs, and user-defined configuration.

Clusterwide configuration doesn’t provide instance-specific parameters: ports, workdirs, memory settings, etc.

Configuration basics

Instance configuration includes two sets of parameters:

  • cartridge.cfg() parameters;
  • box.cfg() parameters.

You can set any of these parameters in:

  1. Command line arguments.
  2. Environment variables.
  3. YAML configuration file.
  4. init.lua file.

The order here indicates the priority: command-line arguments override environment variables, and so forth.
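For instance, the same option can be supplied on several levels at once; assuming cartridge.argparse naming conventions (the --http-port flag and the TARANTOOL_HTTP_PORT variable are assumptions here), the command-line value wins:

```shell
# The flag (highest priority) overrides the environment variable,
# so the instance opens its web interface on port 8081.
TARANTOOL_HTTP_PORT=8082 tarantool init.lua --http-port 8081
```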

No matter how you start the instances, you need to set the following cartridge.cfg() parameters for each instance:

  • advertise_uri – either <HOST>:<PORT>, or <HOST>:, or <PORT>. Used by other instances to connect to the current one. DO NOT specify 0.0.0.0 – this must be an external IP address, not a socket bind address.
  • http_port – port to open administrative web interface and API on. Defaults to 8081. To disable it, specify "http_enabled": False.
  • workdir – a directory where all data will be stored: snapshots, wal logs, and the cartridge configuration file. Defaults to . (the current directory).
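Put together, a minimal init.lua could look like this (all values are illustrative):

```lua
-- init.lua: a minimal sketch with the mandatory cartridge.cfg() parameters.
local cartridge = require('cartridge')

local ok, err = cartridge.cfg({
    workdir = './tmp/db',              -- where snapshots and WALs go
    advertise_uri = 'localhost:3301',  -- external address, not 0.0.0.0
    http_port = 8081,                  -- administrative web interface
})
assert(ok, tostring(err))
```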

If you start instances using cartridge CLI or systemctl, save the configuration as a YAML file, for example:

my_app.router: {"advertise_uri": "localhost:3301", "http_port": 8080}
my_app.storage_A: {"advertise_uri": "localhost:3302", "http_enabled": False}
my_app.storage_B: {"advertise_uri": "localhost:3303", "http_enabled": False}

With cartridge CLI, you can pass the path to this file as the --cfg command-line argument to the cartridge start command – or specify the path in cartridge CLI configuration (in ./.cartridge.yml or ~/.cartridge.yml):

cfg: cartridge.yml
run_dir: tmp/run
apps_path: /usr/local/share/tarantool

With systemctl, save the YAML file to /etc/tarantool/conf.d/ (the default systemd path) or to a location set in the TARANTOOL_CFG environment variable.

If you start instances with tarantool init.lua, you need to pass other configuration options as command-line parameters and environment variables, for example:

$ tarantool init.lua --alias router --memtx-memory 100 --workdir "~/db/3301" --advertise_uri "localhost:3301" --http_port "8080"
Internal representation of clusterwide configuration

In the file system, clusterwide configuration is represented by a file tree. Inside the workdir of any configured instance you can find the following files:

├── auth.yml
├── topology.yml
└── vshard_groups.yml

This is the clusterwide configuration with three default config sections: auth, topology, and vshard_groups.

Due to historical reasons clusterwide configuration has two appearances:

  • old-style single-file config.yml with all sections combined, and
  • modern multi-file representation mentioned above.

Before Cartridge v2.0 it used to look as follows, and this representation is still used in the HTTP API and luatest helpers.

# config.yml
auth: {...}
topology: {...}
vshard_groups: {...}

Beyond these essential sections, clusterwide configuration may be used for storing some other role-specific data. Clusterwide configuration supports YAML as well as plain text sections. It can also be organized in nested subdirectories.

In Lua it’s represented by the ClusterwideConfig object (a table with metamethods). Refer to the cartridge.clusterwide-config module documentation for more details.

Two-phase commit

Cartridge keeps the clusterwide configuration identical on every instance by means of the two-phase commit algorithm implemented in the cartridge.twophase module. A change to the clusterwide configuration implies applying it on every instance in the cluster.

Almost every change in cluster parameters triggers a two-phase commit: joining/expelling a server, editing replica set roles, managing users, setting failover and vshard configuration.

Two-phase commit requires all instances to be alive and healthy, otherwise it returns an error.

For more details, please, refer to the cartridge.config_patch_clusterwide API reference.

Managing role-specific data

Besides system sections, clusterwide configuration may be used for storing some other role-specific data. It supports YAML as well as plain text sections. It can also be organized in nested subdirectories.

Role-specific sections are used by some third-party roles, e.g. sharded-queue and cartridge-extensions.

A user can influence clusterwide configuration in various ways. You can alter configuration using Lua, HTTP or GraphQL API. Also there are luatest helpers available.


The HTTP API works with the old-style single-file representation only. It’s useful when only a few sections are needed.


cat > config.yml << CONFIG
custom_section: {}
CONFIG

Upload new config:

curl -v "localhost:8081/admin/config" -X PUT --data-binary @config.yml

Download it:

curl -v "localhost:8081/admin/config" -o config.yml

It’s suitable for role-specific sections only. System sections (topology, auth, vshard_groups, users_acl) can be neither uploaded nor downloaded.

If authorization is enabled, use the curl option --user username:password.


GraphQL API, by contrast, is only suitable for managing plain-text sections in the modern multi-file appearance. It is mostly used by WebUI, but sometimes it’s also helpful in tests:

g.cluster.main_server:graphql({query = [[
    mutation($sections: [ConfigSectionInput!]) {
        cluster {
            config(sections: $sections) {
                filename
                content
            }
        }
    }]],
    variables = {sections = {{
        filename = 'custom_section.yml',
        content = '---\n{}\n...',
    }}},
})

Unlike HTTP API, GraphQL affects only the sections mentioned in the query. All the other sections remain unchanged.

Similarly to HTTP API, GraphQL cluster {config} query isn’t suitable for managing system sections.


The Lua API is not the most convenient way to configure a third-party role, but it may be useful for role development. Please refer to the corresponding API reference:

  • cartridge.config_patch_clusterwide
  • cartridge.config_get_deepcopy
  • cartridge.config_get_readonly

Example (from sharded-queue, simplified):

function create_tube(tube_name, tube_opts)
    local tubes = cartridge.config_get_deepcopy('tubes') or {}
    tubes[tube_name] = tube_opts or {}

    return cartridge.config_patch_clusterwide({tubes = tubes})
end

local function validate_config(conf)
    local tubes = conf.tubes or {}
    for tube_name, tube_opts in pairs(tubes) do
        -- validate tube_opts
    end
    return true
end

local function apply_config(conf, opts)
    if opts.is_master then
        local tubes = conf.tubes or {}
        -- create tubes according to the configuration
    end
    return true
end
Luatest helpers

Cartridge test helpers provide methods for configuration management:

  • cartridge.test-helpers.cluster:upload_config,
  • cartridge.test-helpers.cluster:download_config.

Internally they wrap the HTTP API.


    g.cluster = helpers.Cluster.new(...)
    g.cluster:upload_config({some_section = 'some_value'})
    -- 't' is the luatest module
    t.assert_covers(
        g.cluster:download_config(),
        {some_section = 'some_value'}
    )

Deploying an application

After you’ve developed your application locally, you can deploy it to a test or production environment.

“Deploy” includes packing the application into a specific distribution format, installing to the target system, and running the application.

You have four options to deploy a Tarantool Cartridge application:

  • as an rpm package (for production);
  • as a deb package (for production);
  • as a tar+gz archive (for testing, or as a workaround for production if root access is unavailable).
  • from sources (for local testing only).
Deploying as an rpm or deb package

The choice between DEB and RPM depends on the package manager of the target OS. For example, DEB is native for Debian Linux, and RPM – for CentOS.

  1. Pack the application into a distributable:

    $ cartridge pack rpm APP_NAME
    # -- OR --
    $ cartridge pack deb APP_NAME

    This will create an RPM package (e.g. ./my_app-0.1.0-1.rpm) or a DEB package (e.g. ./my_app-0.1.0-1.deb).

  2. Upload the package to target servers where systemctl is supported.

  3. Install:

    $ yum install APP_NAME-VERSION.rpm
    # -- OR --
    $ dpkg -i APP_NAME-VERSION.deb
  4. Configure the instance(s). Create a file called /etc/tarantool/conf.d/instances.yml. For example:

      my_app:
        cluster_cookie: secret-cookie

      my_app.router:
        http_port: 8081
        advertise_uri: localhost:3301

      my_app.storage_A:
        http_port: 8082
        advertise_uri: localhost:3302

    See details here.

  5. Start Tarantool instances with the corresponding services. You can do it using systemctl, for example:

    # starts a single instance
    $ systemctl start my_app
    # starts multiple instances
    $ systemctl start my_app@router
    $ systemctl start my_app@storage_A
    $ systemctl start my_app@storage_B
  6. In case it is a cluster-aware application, proceed to deploying the cluster.


    If you’re migrating your application from local test environment to production, you can re-use your test configuration at this step:

    1. In the cluster web interface of the test environment, click Configuration files > Download to save the test configuration.
    2. In the cluster web interface of the production environment, click Configuration files > Upload to upload the saved configuration.
Deploying as a tar+gz archive
  1. Pack the application into a distributable:

    $ cartridge pack tgz APP_NAME

    This will create a tar+gz archive (e.g. ./my_app-0.1.0-1.tgz).

  2. Upload the archive to target servers, with tarantool and (optionally) cartridge-cli installed.

  3. Extract the archive:

    $ tar -xzvf APP_NAME-VERSION.tgz
  4. Configure the instance(s). Create a file called /etc/tarantool/conf.d/instances.yml. For example:

      my_app:
        cluster_cookie: secret-cookie

      my_app.router:
        http_port: 8081
        advertise_uri: localhost:3301

      my_app.storage_A:
        http_port: 8082
        advertise_uri: localhost:3302

    See details here.

  5. Start Tarantool instance(s). You can do it using:

    • tarantool, for example:

      $ tarantool init.lua # starts a single instance
    • or cartridge, for example:

      # in application directory
      $ cartridge start # starts all instances
      $ cartridge start .router_1 # starts a single instance
      # in multi-application environment
      $ cartridge start my_app # starts all instances of my_app
      $ cartridge start my_app.router # starts a single instance
  6. In case it is a cluster-aware application, proceed to deploying the cluster.


    If you’re migrating your application from local test environment to production, you can re-use your test configuration at this step:

    1. In the cluster web interface of the test environment, click Configuration files > Download to save the test configuration.
    2. In the cluster web interface of the production environment, click Configuration files > Upload to upload the saved configuration.
Deploying from sources

This deployment method is intended for local testing only.

  1. Pull all dependencies to the .rocks directory:

    $ tarantoolctl rocks make

  2. Configure the instance(s). Create a file called /etc/tarantool/conf.d/instances.yml. For example:

      my_app:
        cluster_cookie: secret-cookie

      my_app.router:
        http_port: 8081
        advertise_uri: localhost:3301

      my_app.storage_A:
        http_port: 8082
        advertise_uri: localhost:3302

    See details here.

  3. Start Tarantool instance(s). You can do it using:

    • tarantool, for example:

      $ tarantool init.lua # starts a single instance
    • or cartridge, for example:

      # in application directory
      cartridge start # starts all instances
      cartridge start .router_1 # starts a single instance
      # in multi-application environment
      cartridge start my_app # starts all instances of my_app
      cartridge start my_app.router # starts a single instance
  4. In case it is a cluster-aware application, proceed to deploying the cluster.


    If you’re migrating your application from local test environment to production, you can re-use your test configuration at this step:

    1. In the cluster web interface of the test environment, click Configuration files > Download to save the test configuration.
    2. In the cluster web interface of the production environment, click Configuration files > Upload to upload the saved configuration.

Starting/stopping instances

Depending on your deployment method, you can start/stop the instances using tarantool, cartridge CLI, or systemctl.

Start/stop using tarantool

With tarantool, you can start only a single instance:

$ tarantool init.lua # the simplest command

You can also specify more options on the command line or in environment variables.

To stop the instance, use Ctrl+C.

Start/stop using cartridge CLI

With cartridge CLI, you can start one or multiple instances:

$ cartridge start [APP_NAME[.INSTANCE_NAME]] [options]

The options are:

--script FILE

Application’s entry point. Defaults to:

  • ./init.lua when running from the app’s directory, or
  • :apps_path/:app_name/init.lua in a multi-app environment.
--apps_path PATH
Path to apps directory when running in a multi-app environment. Defaults to /usr/share/tarantool.
--run_dir DIR
Directory with pid and sock files. Defaults to TARANTOOL_RUN_DIR or /var/run/tarantool.
--cfg FILE
Cartridge instances YAML configuration file. Defaults to TARANTOOL_CFG or ./instances.yml. The instances.yml file contains cartridge.cfg() parameters described in the configuration section of this guide.
--foreground
Do not daemonize.

For example:

cartridge start my_app --cfg demo.yml --run_dir ./tmp/run --foreground

It starts all Tarantool instances specified in the cfg file, in the foreground, with enforced environment variables.

When APP_NAME is not provided, cartridge parses it from ./*.rockspec filename.

When INSTANCE_NAME is not provided, cartridge reads cfg file and starts all defined instances:

# in application directory
cartridge start # starts all instances
cartridge start .router_1 # start single instance

# in multi-application environment
cartridge start my_app # starts all instances of my_app
cartridge start my_app.router # start a single instance

To stop the instances, say:

$ cartridge stop [APP_NAME[.INSTANCE_NAME]] [options]

These options from the cartridge start command are supported:

  • --run_dir DIR
  • --cfg FILE
Start/stop using systemctl
  • To run a single instance:

    $ systemctl start APP_NAME

    This will start a systemd service that will listen to the port specified in instance configuration (http_port parameter).

  • To run multiple instances on one or multiple servers:

    $ systemctl start APP_NAME@INSTANCE_1
    $ systemctl start APP_NAME@INSTANCE_2
    $ systemctl start APP_NAME@INSTANCE_N

    where APP_NAME@INSTANCE_N is the instantiated service name for systemd with an incremental N – a number, unique for every instance, added to the port the instance will listen to (e.g., 3301, 3302, etc.)

  • To stop all services on a server, use the systemctl stop command and specify instance names one by one. For example:

    $ systemctl stop APP_NAME@INSTANCE_1 APP_NAME@INSTANCE_2 ... APP_NAME@INSTANCE_N

When running instances with systemctl, keep these practices in mind:

  • You can specify instance configuration in a YAML file.

    This file can contain these options (see an example here).

    Save this file to /etc/tarantool/conf.d/ (the default systemd path) or to a location set in the TARANTOOL_CFG environment variable (if you’ve edited the application’s systemd unit file). The file name doesn’t matter: it can be instances.yml or anything else you like.

    Here’s what systemd is doing further:

    • obtains app_name (and instance_name, if specified) from the name of the application’s systemd unit file (e.g. APP_NAME@default or APP_NAME@INSTANCE_1);
    • sets the default console socket (e.g. /var/run/tarantool/APP_NAME@INSTANCE_1.control), PID file (e.g. /var/run/tarantool/APP_NAME@INSTANCE_1.pid) and workdir (e.g. /var/lib/tarantool/<APP_NAME>.<INSTANCE_NAME>, via Environment=TARANTOOL_WORKDIR=${workdir}.%i in the unit file).

    Finally, cartridge looks across all YAML files in /etc/tarantool/conf.d for a section with the appropriate name (e.g. app_name, which contains common configuration for all instances, and app_name.instance_1, which contains instance-specific configuration). As a result, the Cartridge options workdir, console_sock, and pid_file in the YAML file become useless, because systemd overrides them.

  • The default tool for querying logs is journalctl. For example:

    # show log messages for a systemd unit named APP_NAME@INSTANCE_1
    $ journalctl -u APP_NAME@INSTANCE_1
    # show only the most recent messages and continuously print new ones
    $ journalctl -f -u APP_NAME@INSTANCE_1

    If really needed, you can change logging-related box.cfg options in the YAML configuration file: see log and other related options.

Error handling guidelines

Almost all errors in Cartridge follow the return nil, err style, where err is an error object produced by Tarantool’s errors module. Cartridge doesn’t raise errors except for bugs and function contract mismatches. Newly developed roles should follow these guidelines as well.

Error objects in Lua

Error classes help to locate the problem’s source. For this purpose, an error object contains its class, stack traceback, and a message.

local errors = require('errors')
local DangerousError = errors.new_class("DangerousError")

local function some_fancy_function()

    local something_bad_happens = true

    if something_bad_happens then
        return nil, DangerousError:new("Oh boy")
    end

    return "success" -- not reachable due to the error
end

print(some_fancy_function())

nil DangerousError: Oh boy
stack traceback:
    test.lua:9: in function 'some_fancy_function'
    test.lua:15: in main chunk

For uniform error handling, errors provides the :pcall API:

local ret, err = DangerousError:pcall(some_fancy_function)
print(ret, err)
nil DangerousError: Oh boy
stack traceback:
    test.lua:9: in function <test.lua:4>
    [C]: in function 'xpcall'
    .rocks/share/tarantool/errors.lua:139: in function 'pcall'
    test.lua:15: in main chunk

print(DangerousError:pcall(error, 'what could possibly go wrong?'))

nil DangerousError: what could possibly go wrong?
stack traceback:
    [C]: in function 'xpcall'
    .rocks/share/tarantool/errors.lua:139: in function 'pcall'
    test.lua:15: in main chunk

For errors.pcall there is no difference between the return nil, err and error() approaches.

Note that the errors.pcall API differs from vanilla Lua pcall. Instead of true, the former returns the values returned by the call. If there is an error, it returns nil instead of false, plus an error message.
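For comparison, this is how vanilla pcall behaves in plain Lua:

```lua
-- Vanilla Lua pcall: the first return value is a boolean status.
local ok, res = pcall(function() return 'value' end)
assert(ok == true and res == 'value')

-- On error it returns false plus the error message,
-- whereas errors.pcall returns nil plus an error object.
local ok2, msg = pcall(function() error('boom') end)
assert(ok2 == false)
assert(type(msg) == 'string' and msg:match('boom') ~= nil)
```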

Remote net.box calls keep no stack trace from the remote host. In that case, errors.netbox_eval comes to the rescue: it collects stack traces from both the local and remote hosts and restores metatables.

> conn = require('net.box').connect('localhost:3301')
> print( errors.netbox_eval(conn, 'return nil, DoSomethingError:new("oops")') )
nil     DoSomethingError: oops
stack traceback:
        eval:1: in main chunk
during net.box eval on localhost:3301
stack traceback:
        [string "return print( errors.netbox_eval("]:1: in main chunk
        [C]: in function 'pcall'

However, the vshard module doesn’t use the errors module; instead, it uses its own error format. Keep this in mind when working with vshard functions.

Data included in an error object (class name, message, traceback) can easily be converted to a string using the tostring() function.
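A generic sketch (not the actual errors-module implementation) of how such an object can support tostring() through the __tostring metamethod:

```lua
-- Hypothetical error object: a plain table whose __tostring metamethod
-- assembles the class name, message, and traceback into one string.
local err = setmetatable({
    class_name = 'DangerousError',
    err = 'Oh boy',
    traceback = 'stack traceback: ...',
}, {
    __tostring = function(self)
        return ('%s: %s\n%s'):format(self.class_name, self.err, self.traceback)
    end,
})

assert(tostring(err):match('^DangerousError: Oh boy') ~= nil)
```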


GraphQL implementation in Cartridge wraps the errors module, so a typical error response looks as follows:

        "message":"what could possibly go wrong?",
            "io.tarantool.errors.stack":"stack traceback: ...",

Read more about errors in the GraphQL specification.

If you’re going to implement a GraphQL handler, you can add your own extension like this:

local err = DangerousError:new('I have extension')
err.graphql_extensions = {code = 403}

It will lead to the following response:

        "message":"I have extension",
            "io.tarantool.errors.stack":"stack traceback: ...",

In a nutshell, an error object is a table. This means that it can be swiftly represented in JSON. This approach is used by Cartridge to handle errors over HTTP:

local json = require('json')

local err = DangerousError:new('Who would have thought?')

local resp = req:render({
    status = 500,
    headers = {
        ['content-type'] = "application/json; charset=utf-8"
    },
    body = json.encode(err),
})

The response body will look like:

{
    "err":"Who would have thought?",
    "stack":"stack traceback:..."
}

Administrator’s guide

This guide explains how to deploy and manage a Tarantool cluster with Tarantool Cartridge.


For more information on managing Tarantool instances, see the server administration section of the Tarantool manual.

Before deploying the cluster, familiarize yourself with the notion of cluster roles and deploy Tarantool instances according to the desired cluster topology.

Deploying the cluster

To deploy the cluster, first, configure your Tarantool instances according to the desired cluster topology, for example:

my_app.router: {"advertise_uri": "localhost:3301", "http_port": 8080, "workdir": "./tmp/router"}
my_app.storage_A_master: {"advertise_uri": "localhost:3302", "http_enabled": false, "workdir": "./tmp/storage-a-master"}
my_app.storage_A_replica: {"advertise_uri": "localhost:3303", "http_enabled": false, "workdir": "./tmp/storage-a-replica"}
my_app.storage_B_master: {"advertise_uri": "localhost:3304", "http_enabled": false, "workdir": "./tmp/storage-b-master"}
my_app.storage_B_replica: {"advertise_uri": "localhost:3305", "http_enabled": false, "workdir": "./tmp/storage-b-replica"}

Then start the instances, for example using cartridge CLI:

cartridge start my_app --cfg demo.yml --run_dir ./tmp/run --foreground

And bootstrap the cluster. You can do this via the Web interface which is available at http://<instance_hostname>:<instance_http_port> (in this example, http://localhost:8080).

In the web interface, do the following:

  1. Depending on the authentication state:

    • If enabled (in production), enter your credentials and click Login:



    • If disabled (for easier testing), simply proceed to configuring the cluster.

  2. Click Configure next to the first unconfigured server to create the first replica set – solely for the router (intended for compute-intensive workloads).



    In the pop-up window, check the vshard-router role – or any custom role that has vshard-router as a dependent role (in this example, this is a custom role named app.roles.api).

    (Optional) Specify a display name for the replica set, for example router.




    As described in the built-in roles section, it is a good practice to enable workload-specific cluster roles on instances running on physical servers with workload-specific hardware.

    Click Create replica set and see the newly-created replica set in the web interface:




    Be careful: after an instance joins a replica set, you CAN NOT revert this or make the instance join any other replica set.

  3. Create another replica set – for a master storage node (intended for transaction-intensive workloads).

    Check the vshard-storage role – or any custom role that has vshard-storage as a dependent role (in this example, this is a custom role named app.roles.storage).

    (Optional) Check a specific group, for example hot. Replica sets with vshard-storage roles can belong to different groups. In our example, these are hot or cold groups meant to process hot and cold data independently. These groups are specified in the cluster’s configuration file; by default, a cluster has no groups.

    (Optional) Specify a display name for the replica set, for example hot-storage.

    Click Create replica set.



  4. (Optional) If required by topology, populate the second replica set with more storage nodes:

    1. Click Configure next to another unconfigured server dedicated for transaction-intensive workloads.

    2. Click Join Replica Set tab.

    3. Select the second replica set, and click Join replica set to add the server to it.



  5. Depending on cluster topology:

    • add more instances to the first or second replica sets, or
    • create more replica sets and populate them with instances meant to handle a specific type of workload (compute or transactions).

    For example:



  6. (Optional) By default, all new vshard-storage replica sets get a weight of 1 before the vshard bootstrap in the next step.


    In case you add a new replica set after vshard bootstrap, as described in the topology change section, it will get a weight of 0 by default.

    To make different replica sets store different numbers of buckets, click Edit next to a replica set, change its default weight, and click Save:



    For more information on buckets and replica set’s weights, see the vshard module documentation.

  7. Bootstrap vshard by clicking the corresponding button, or by calling cartridge.admin.bootstrap_vshard() in the administrative console.

    This command creates virtual buckets and distributes them among storages.

    From now on, all cluster configuration can be done via the web interface.

Updating the configuration

Cluster configuration is specified in a YAML configuration file. This file includes cluster topology and role descriptions.

All instances in a Tarantool cluster have the same configuration. To this end, every instance stores a copy of the configuration file, and the cluster keeps these copies in sync: as you submit updated configuration in the Web interface, the cluster validates it (and rejects inappropriate changes), then automatically distributes it across the cluster.

To update the configuration:

  1. Click Configuration files tab.

  2. (Optional) Click Downloaded to get hold of the current configuration file.

  3. Update the configuration file.

    You can add/change/remove any sections except system ones: topology, vshard, and vshard_groups.

    To remove a section, simply remove it from the configuration file.

  4. Compress the configuration file into a .zip archive and click the Upload configuration button to upload it.

    You will see a message in the lower part of the screen saying whether configuration was uploaded successfully, and an error description if the new configuration was not applied.
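As an illustration, a custom (non-system) section added to the configuration file might look like this; the section name and options below are purely hypothetical:

```yaml
# Hypothetical custom section: any section other than the system ones
# (topology, vshard, vshard_groups) may be added, changed, or removed.
my_custom_section:
  max_retries: 3
  endpoints:
    - localhost:8081
```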

Managing the cluster

This chapter explains how to:

  • change the cluster topology,
  • enable automatic failover,
  • switch the replica set’s master manually,
  • deactivate replica sets, and
  • expel instances.
Changing the cluster topology

Upon adding a newly deployed instance to a new or existing replica set:

  1. The cluster validates the configuration update by checking if the new instance is available using the membership module.


    The membership module works over the UDP protocol and can operate before the box.cfg function is called.

    All the nodes in the cluster must be healthy for the validation to succeed.

  2. The new instance waits until another instance in the cluster receives the configuration update and discovers it, again, using the membership module. On this step, the new instance does not have a UUID yet.

  3. Once the instance realizes its presence is known to the cluster, it calls the box.cfg function and starts living its life.

An optimal strategy for connecting new nodes to the cluster is to deploy a new zero-weight replica set instance by instance, and then increase the weight. Once the weight is updated and all cluster nodes are notified of the configuration change, buckets start migrating to new nodes.
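The weight increase can also be performed programmatically with the cartridge.admin_edit_topology() Lua API (the same operation the web interface's Edit dialog performs). This is a sketch only: it requires a running Cartridge instance, and the replica set UUID below is hypothetical.

```lua
-- Sketch: raise the weight of a newly added zero-weight replica set.
-- Requires a running Cartridge instance; the UUID is hypothetical.
local cartridge = require('cartridge')

local topology, err = cartridge.admin_edit_topology({
    replicasets = {
        {uuid = 'bbbbbbbb-0000-4000-b000-000000000001', weight = 1},
    },
})
assert(topology ~= nil, tostring(err))
```

Once all cluster nodes are notified of the change, buckets start migrating to the new replica set.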

To populate the cluster with more nodes, do the following:

  1. Deploy new Tarantool instances as described in the deployment section.

    If new nodes do not appear in the Web interface, click Probe server and specify their URIs manually.



    If a node is accessible, it will appear in the list.

  2. In the Web interface:

    • Create a new replica set with one of the new instances: click Configure next to an unconfigured server, check the necessary roles, and click Create replica set:


      In case you are adding a new vshard-storage instance, remember that all such instances get a 0 weight by default after the vshard bootstrap which happened during the initial cluster deployment.



    • Or add the instances to existing replica sets: click Configure next to an unconfigured server, click Join replica set tab, select a replica set, and click Join replica set.

    If necessary, repeat this for more instances to reach the desired redundancy level.

  3. In case you are deploying a new vshard-storage replica set, populate it with data when you are ready: click Edit next to the replica set in question, increase its weight, and click Save to start data rebalancing.

As an alternative to the web interface, you can view and change cluster topology via GraphQL. The cluster’s endpoint for serving GraphQL queries is /admin/api. You can use any third-party GraphQL client like GraphiQL or Altair.


  • listing all servers in the cluster:

    query {
        servers { alias uri uuid }
    }
  • listing all replica sets with their servers:

    query {
        replicasets {
            servers { uri uuid }
        }
    }
  • joining a server to a new replica set with a storage role enabled:

    mutation {
        join_server(
            uri: "localhost:33003"
            roles: ["vshard-storage"]
        )
    }
Data rebalancing

Rebalancing (resharding) is initiated periodically and upon adding a new replica set with a non-zero weight to the cluster. For more information, see the rebalancing process section of the vshard module documentation.

The most convenient way to trace through the process of rebalancing is to monitor the number of active buckets on storage nodes. Initially, a newly added replica set has 0 active buckets. After a few minutes, the background rebalancing process begins to transfer buckets from other replica sets to the new one. Rebalancing continues until the data is distributed evenly among all replica sets.

To monitor the current number of buckets, connect to any Tarantool instance over the administrative console, and say:

tarantool> vshard.storage.info().bucket
- receiving: 0
  active: 1000
  total: 1000
  garbage: 0
  sending: 0

The number of buckets may be increasing or decreasing depending on whether the rebalancer is migrating buckets to or from the storage node.

For more information on the monitoring parameters, see the monitoring storages section.

Deactivating replica sets

To deactivate an entire replica set (e.g., to perform maintenance on it) means to move all of its buckets to other sets.

To deactivate a set, do the following:

  1. Click Edit next to the set in question.

  2. Set its weight to 0 and click Save:



  3. Wait for the rebalancing process to finish migrating all the set’s buckets away. You can monitor the current bucket number as described in the data rebalancing section.

Expelling instances

Once an instance is expelled, it can never participate in the cluster again as every instance will reject it.

To expel an instance, click next to it, then click Expel server and Expel:




There are two restrictions:

  • You can’t expel a leader if it has a replica. Switch leadership first.
  • You can’t expel a vshard-storage if it has buckets. Set the weight to zero and wait until rebalancing is completed.
Enabling automatic failover

In a master-replica cluster configuration with automatic failover enabled, if the user-specified master of any replica set fails, the cluster automatically chooses the next replica from the priority list and grants it the active master role (read/write). When the failed master comes back online, its role is restored and the active master, again, becomes a replica (read-only). This works for any roles.

To set the priority in a replica set:

  1. Click Edit next to the replica set in question.

  2. Scroll to the bottom of the Edit replica set box to see the list of servers.

  3. Drag replicas to their place in the priority list, and click Save:



The failover is disabled by default. To enable it:

  1. Click Failover:



  2. In the Failover control box, click Enable:



The failover status will change to enabled:



For more information, see the replication section of the Tarantool manual.

Switching the replica set’s master

To manually switch the master in a replica set:

  1. Click the Edit button next to the replica set in question:



  2. Scroll to the bottom of the Edit replica set box to see the list of servers. The server on the top is the master.



  3. Drag a required server to the top position and click Save.

The new master will automatically enter the read/write mode, while the ex-master will become read-only. This works for any roles.

Managing users

On the Users tab, you can enable/disable authentication as well as add, remove, edit, and view existing users who can access the web interface.



Notice that the Users tab is available only if authorization in the web interface is implemented.

Also, some features (like deleting users) can be disabled in the cluster configuration; this is regulated by the auth_backend_name option passed to cartridge.cfg().

Resolving conflicts

Tarantool has an embedded mechanism for asynchronous replication. As a consequence, records are distributed among the replicas with a delay, so conflicts can arise.

To prevent conflicts, the special trigger space.before_replace is used. It is executed every time before making changes to the table for which it was configured. The trigger function is implemented in the Lua programming language. This function takes the original and new values of the tuple to be modified as its arguments. The returned value of the function is used to change the result of the operation: this will be the new value of the modified tuple.

For insert operations, the old value is absent, so nil is passed as the first argument.

For delete operations, the new value is absent, so nil is passed as the second argument. The trigger function can also return nil, thus turning this operation into delete.

This example shows how to use the space.before_replace trigger to prevent replication conflicts. Suppose we have a box.space.test table that is modified in multiple replicas at the same time. We store one payload field in this table. To ensure consistency, we also store the last modification time in each tuple of this table and set the space.before_replace trigger, which gives preference to newer tuples. Below is the code in Lua:

fiber = require('fiber')

-- define a function that will modify the tuple
function test_replace(tuple)
        -- add a timestamp to each tuple in the space
        tuple = box.tuple.new(tuple):update{{'!', 2, fiber.time()}}
        return tuple
end

box.cfg{ } -- restore from the local directory

-- set the trigger to avoid conflicts
box.space.test:before_replace(function(old, new)
        if old ~= nil and new ~= nil and new[2] < old[2] then
                return old -- ignore the request
        end
        -- otherwise apply as is
end)

box.cfg{ replication = {...} } -- subscribe
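The comparison performed by the trigger can be exercised in plain Lua; choose_tuple below is an illustrative stand-in for the trigger body, with the timestamp stored in field 2:

```lua
-- Stand-in for the before_replace trigger body: prefer the tuple with
-- the newer timestamp in field 2 (plain Lua, illustrative only).
local function choose_tuple(old, new)
    if old ~= nil and new ~= nil and new[2] < old[2] then
        return old   -- incoming tuple is older: keep the stored one
    end
    return new       -- otherwise apply the change (nil turns it into delete)
end

local older = {'key', 100}
local newer = {'key', 200}
assert(choose_tuple(older, newer) == newer)  -- newer change is applied
assert(choose_tuple(newer, older) == newer)  -- stale change is ignored
assert(choose_tuple(nil, newer) == newer)    -- insert: old value is nil
```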

Monitoring a cluster via CLI

This section describes parameters you can monitor over the administrative console.

Connecting to nodes via CLI

Each Tarantool node (router/storage) provides an administrative console (Command Line Interface) for debugging, monitoring, and troubleshooting. The console acts as a Lua interpreter and displays the result in the human-readable YAML format. To connect to a Tarantool instance via the console, say:

$ tarantoolctl connect <instance_hostname>:<port>

where the <instance_hostname>:<port> is the instance’s URI.

Monitoring storages

Use vshard.storage.info() to obtain information on storage nodes.

Output example
tarantool> vshard.storage.info()
- replicasets:
    <replicaset_2>:
      uuid: <replicaset_2>
      master:
        uri: storage:storage@
    <replicaset_1>:
      uuid: <replicaset_1>
      master:
        uri: storage:storage@
  bucket: <!-- buckets status
    receiving: 0 <!-- buckets in the RECEIVING state
    active: 2 <!-- buckets in the ACTIVE state
    garbage: 0 <!-- buckets in the GARBAGE state (are to be deleted)
    total: 2 <!-- total number of buckets
    sending: 0 <!-- buckets in the SENDING state
  status: 1 <!-- the status of the replica set
  replication:
    status: disconnected <!-- the status of the replication
    idle: <idle>
  alerts:
  - ['MASTER_IS_UNREACHABLE', 'Master is unreachable: disconnected']
Status list
Code  Critical level  Description
0     Green           A replica set works in a regular way.
1     Yellow          There are some issues, but they don’t affect a replica set efficiency (worth noticing, but don’t require immediate intervention).
2     Orange          A replica set is in a degraded state.
3     Red             A replica set is disabled.
Potential issues
  • MISSING_MASTER — No master node in the replica set configuration.

    Critical level: Orange.

    Cluster condition: Service is degraded for data-change requests to the replica set.

    Solution: Set the master node for the replica set in the configuration using API.

  • UNREACHABLE_MASTER — No connection between the master and the replica.

    Critical level:

    • If idle value doesn’t exceed T1 threshold (1 s.) — Yellow,
    • If idle value doesn’t exceed T2 threshold (5 s.) — Orange,
    • If idle value exceeds T3 threshold (10 s.) — Red.

    Cluster condition: For read requests to replica, the data may be obsolete compared with the data on master.

    Solution: Reconnect to the master: fix the network issues, reset the current master, switch to another master.

  • LOW_REDUNDANCY — Master has access to a single replica only.

    Critical level: Yellow.

    Cluster condition: The data storage redundancy factor is equal to 2. It is lower than the minimal recommended value for production usage.

    Solution: Check cluster configuration:

    • If only one master and one replica are specified in the configuration, it is recommended to add at least one more replica to reach the redundancy factor of 3.
    • If three or more replicas are specified in the configuration, consider checking the replicas’ states and network connection among the replicas.
  • INVALID_REBALANCING — Rebalancing invariant was violated. During migration, a storage node can either send or receive buckets. So it shouldn’t be the case that a replica set sends buckets to one replica set and receives buckets from another replica set at the same time.

    Critical level: Yellow.

    Cluster condition: Rebalancing is on hold.

    Solution: There are two possible reasons for invariant violation:

    • The rebalancer has crashed.
    • Bucket states were changed manually.

    Either way, please contact Tarantool support.

  • HIGH_REPLICATION_LAG — Replica’s lag exceeds T1 threshold (1 sec.).

    Critical level:

    • If the lag doesn’t exceed T1 threshold (1 sec.) — Yellow;
    • If the lag exceeds T2 threshold (5 sec.) — Orange.

    Cluster condition: For read-only requests to the replica, the data may be obsolete compared with the data on the master.

    Solution: Check the replication status of the replica. Further instructions are given in the Tarantool troubleshooting guide.

  • OUT_OF_SYNC — Mal-synchronization occurred. The lag exceeds the T3 threshold (10 sec.).

    Critical level: Red.

    Cluster condition: For read-only requests to the replica, the data may be obsolete compared with the data on the master.

    Solution: Check the replication status of the replica. Further instructions are given in the Tarantool troubleshooting guide.

  • UNREACHABLE_REPLICA — One or multiple replicas are unreachable.

    Critical level: Yellow.

    Cluster condition: Data storage redundancy factor for the given replica set is less than the configured factor. If the replica is next in the queue for rebalancing (in accordance with the weight configuration), the requests are forwarded to the replica that is still next in the queue.

    Solution: Check the error message and find out which replica is unreachable. If a replica is disabled, enable it. If this doesn’t help, consider checking the network.

  • UNREACHABLE_REPLICASET — All replicas except for the current one are unreachable.

    Critical level: Red.

    Cluster condition: The replica stores obsolete data.

    Solution: Check if the other replicas are enabled. If all replicas are enabled, consider checking network issues on the master. If the replicas are disabled, check them first: the master might be working properly.

Monitoring routers

Use vshard.router.info() to obtain information on the router.

Output example
tarantool> vshard.router.info()
- replicasets:
    <replica set UUID>:
      master:
        status: <available / unreachable / missing>
        uri: <!-- URI of master
        uuid: <!-- UUID of instance
      replica:
        status: <available / unreachable / missing>
        uri: <!-- URI of replica used for slave requests
        uuid: <!-- UUID of instance
      uuid: <!-- UUID of replica set
    <replica set UUID>: ...
  status: <!-- status of router
  bucket:
    known: <!-- number of buckets with the known destination
    unknown: <!-- number of other buckets
  alerts: [<alert code>, <alert description>], ...
Status list
Code  Critical level  Description
0     Green           The router works in a regular way.
1     Yellow          Some replicas are unreachable (affects the speed of executing read requests).
2     Orange          Service is degraded for changing data.
3     Red             Service is degraded for reading data.
Potential issues


Depending on the nature of the issue, use either the UUID of a replica, or the UUID of a replica set.

  • MISSING_MASTER — The master in one or multiple replica sets is not specified in the configuration.

    Critical level: Orange.

    Cluster condition: Partial degrade for data-change requests.

    Solution: Specify the master in the configuration.

  • UNREACHABLE_MASTER — The router lost connection with the master of one or multiple replica sets.

    Critical level: Orange.

    Cluster condition: Partial degrade for data-change requests.

    Solution: Restore connection with the master. First, check if the master is enabled. If it is, consider checking the network.

  • SUBOPTIMAL_REPLICA — There is a replica for read-only requests, but this replica is not optimal according to the configured weights. This means that the optimal replica is unreachable.

    Critical level: Yellow.

    Cluster condition: Read-only requests are forwarded to a backup replica.

    Solution: Check the status of the optimal replica and its network connection.

  • UNREACHABLE_REPLICASET — A replica set is unreachable for both read-only and data-change requests.

    Critical Level: Red.

    Cluster condition: Partial degrade for read-only and data-change requests.

    Solution: The replica set has an unreachable master and replica. Check the error message to detect this replica set. Then fix the issue in the same way as for UNREACHABLE_REPLICA.

Upgrading schema

When upgrading Tarantool to a newer version, please don’t forget to:

  1. Stop the cluster
  2. Make sure that upgrade_schema option is enabled
  3. Start the cluster again

This will automatically apply box.schema.upgrade() on the leader, according to the failover priority in the topology configuration.
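A minimal sketch of step 2, assuming the option is passed to cartridge.cfg() in the application's entry point (e.g. init.lua); all other options are elided and must match your application:

```lua
-- Sketch: enabling schema auto-upgrade on startup.
-- Requires a Cartridge application; other cfg options are elided.
local cartridge = require('cartridge')

local ok, err = cartridge.cfg({
    upgrade_schema = true, -- apply box.schema.upgrade() on the leader
    -- ... the rest of your cartridge options ...
})
assert(ok, tostring(err))
```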

Disaster recovery

Please see the disaster recovery section in the Tarantool manual.

Backups

Please see the backups section in the Tarantool manual.

Troubleshooting

First of all, see the similar troubleshooting guide in the Tarantool manual. Below are some Cartridge-specific problems and their solutions.

Editing clusterwide configuration in WebUI returns an error

Such errors may look as follows:

  • NetboxConnectError: "localhost:3302": Connection refused;
  • Prepare2pcError: Instance state is OperationError, can't apply config in this state.

The root problem: all cluster instances are equal, and all of them store a copy of clusterwide configuration, which must be the same. If an instance degrades (can’t accept new configuration) – the quorum is lost. This prevents further configuration modifications to avoid inconsistency.

But sometimes inconsistency is needed to repair the system, at least partially and temporarily. It can be achieved by disabling degraded instances.


  1. Connect to the console of the alive instance.

    tarantoolctl connect unix/:/var/run/tarantool/<app-name>.<instance-name>.control
  2. Inspect what’s going on.

    cartridge = require('cartridge')
    report = {}
    for _, srv in pairs(cartridge.admin_get_servers()) do
        report[srv.uuid] = {uri = srv.uri, status = srv.status, message = srv.message}
    end
    return report
  3. If you’re ready to proceed, run the following snippet. It’ll disable all instances which are not healthy. After that, you can use the WebUI as usual.

    disable_list = {}
    for uuid, srv in pairs(report) do
        if srv.status ~= 'healthy' then
            table.insert(disable_list, uuid)
        end
    end
    return cartridge.admin_disable_servers(disable_list)
  4. When it’s necessary to bring disabled instances back, re-enable them in a similar manner:

    cartridge = require('cartridge')
    enable_list = {}
    for _, srv in pairs(cartridge.admin_get_servers()) do
        if srv.disabled then
            table.insert(enable_list, srv.uuid)
        end
    end
    return cartridge.admin_enable_servers(enable_list)

An instance is stuck in the ConnectingFullmesh state upon restart



The root problem: after restart, the instance tries to connect to all its replicas and remains in the ConnectingFullmesh state until it succeeds. If it can’t (due to replica URI unavailability or for any other reason) – it’s stuck forever.


Set the replication_connect_quorum option to zero. It may be accomplished in two ways:

  • By restarting it with the corresponding option set (in environment variables or in the instance configuration file);

  • Or without restart – by running the following one-liner:

    echo "box.cfg({replication_connect_quorum = 0})" | tarantoolctl connect \

I want to run an instance with a new advertise_uri

The root problem: advertise_uri parameter is persisted in the clusterwide configuration. Even if it changes upon restart, the rest of the cluster keeps using the old one, and the cluster may behave in an odd way.


The clusterwide configuration should be updated.

  1. Make sure all instances are running and not stuck in the ConnectingFullmesh state (see above).

  2. Make sure all instances have discovered each other (i.e. they look healthy in the WebUI).

  3. Run the following snippet in the Tarantool console. It’ll prepare a patch for the clusterwide configuration.

    cartridge = require('cartridge')
    members = require('membership').members()
    edit_list = {}
    changelog = {}
    for _, srv in pairs(cartridge.admin_get_servers()) do
        for _, m in pairs(members) do
            if m.status == 'alive'
            and m.payload.uuid == srv.uuid
            and m.uri ~= srv.uri
            then
                table.insert(edit_list, {uuid = srv.uuid, uri = m.uri})
                table.insert(changelog, string.format('%s -> %s (%s)', srv.uri, m.uri, m.payload.alias))
                break
            end
        end
    end
    return changelog

    As a result you’ll see a brief summary like the following one:

    localhost:3301> return changelog
    - - localhost:13301 -> localhost:3301 (srv-1)
      - localhost:13302 -> localhost:3302 (srv-2)
      - localhost:13303 -> localhost:3303 (srv-3)
      - localhost:13304 -> localhost:3304 (srv-4)
      - localhost:13305 -> localhost:3305 (srv-5)
  4. Finally, apply the patch:

    cartridge.admin_edit_topology({servers = edit_list})

The cluster is doomed, I’ve edited the config manually. How do I reload it?


Please be aware that this is quite risky; make sure you know what you’re doing. There’s some useful information about clusterwide configuration anatomy and the “normal” management API.

But if you’re still determined to reload the configuration manually, you can do (in the Tarantool console):

function reload_clusterwide_config()
    local changelog = {}

    local ClusterwideConfig = require('cartridge.clusterwide-config')
    local confapplier = require('cartridge.confapplier')

    -- load config from filesystem
    table.insert(changelog, 'Loading new config...')

    local cfg, err = ClusterwideConfig.load('./config')
    if err ~= nil then
        return changelog, string.format('Failed to load new config: %s', err)
    end

    -- check instance state
    table.insert(changelog, 'Checking instance config state...')

    local roles_configured_state = 'RolesConfigured'
    local connecting_fullmesh_state = 'ConnectingFullmesh'

    local state = confapplier.wish_state(roles_configured_state, 10)

    if state == connecting_fullmesh_state then
        return changelog, string.format(
            'Failed to reach %s config state. Stuck in %s. ' ..
                'Call "box.cfg({replication_connect_quorum = 0})" in instance console and try again',
            roles_configured_state, state
        )
    end

    if state ~= roles_configured_state then
        return changelog, string.format(
            'Failed to reach %s config state. Stuck in %s',
            roles_configured_state, state
        )
    end

    -- apply config changes
    table.insert(changelog, 'Applying config changes...')

    local ok, err = confapplier.apply_config(cfg)
    if err ~= nil then
        return changelog, string.format('Failed to apply new config: %s', err)
    end

    table.insert(changelog, 'Cluster-wide configuration was successfully updated')

    return changelog
end

return reload_clusterwide_config()


This snippet reloads the configuration on a single instance. All other instances continue operating as before.


If further configuration modifications are made with a two-phase commit (e.g. via the WebUI or with the Lua API), the active configuration of an active instance will be spread across the cluster.

Repairing cluster using Cartridge CLI repair command

Cartridge CLI has had the repair command since version 2.3.0.

It can be used to get the current topology, remove an instance from the cluster, change a replicaset leader, or change an instance’s advertise URI.


cartridge repair patches the cluster-wide configuration files of application instances placed ON THE LOCAL MACHINE. This means that running cartridge repair on all machines is the user’s responsibility.


It’s not enough to apply the new configuration: it should also be reloaded by the instance. If your application uses cartridge >= 2.0.0, you can simply use the --reload flag to reload the configuration. Otherwise, you need to restart the instances or reload the configuration manually.

Changing instance advertise URI

To change an instance's advertise URI, perform these actions:

  1. Start the instance with the new advertise URI. The easiest way is to change the advertise_uri value in the instance configuration file.

  2. Make sure the instances are running and not stuck in the ConnectingFullmesh state (see above).

  3. Get the instance UUID:

     • open the server details tab in the WebUI;
     • call cartridge repair list-topology --name <app-name> and find the desired instance UUID;
     • get the instance box.info().uuid:

    echo "return box.info().uuid" | tarantoolctl connect \
  4. Now update the instance's advertise URI in the cluster-wide configuration files of all instances on each machine. Run cartridge repair set-advertise-uri with the --dry-run flag on each machine to check the cluster-wide config changes computed by cartridge-cli:

    cartridge repair set-advertise-uri \
      --name myapp \
      --dry-run \
      <instance-uuid> <new-advertise-uri>
  5. Run cartridge repair set-advertise-uri without the --dry-run flag on each machine to apply the config changes computed by cartridge-cli. If your application uses cartridge >= 2.0.0, you can specify the --reload flag to load the new cluster-wide configuration on the instances. Otherwise, you need to restart the instances or reload the configuration manually.

    cartridge repair set-advertise-uri \
      --name myapp \
      --verbose \
      --reload \
      <instance-uuid> <new-advertise-uri>
Changing replicaset leader

You can change a replicaset leader using the cartridge repair command.

  1. Get the replicaset UUID and the new leader UUID (in the WebUI or by calling cartridge repair list-topology --name <app-name>).

  2. Now update the cluster-wide config for all instances on each machine. Run cartridge repair set-leader with the --dry-run flag on each machine to check the cluster-wide config changes computed by cartridge-cli:

    cartridge repair set-leader \
      --name myapp \
      --dry-run \
      <replicaset-uuid> <instance-uuid>
  3. Run cartridge repair set-leader without the --dry-run flag on each machine to apply the config changes computed by cartridge-cli. If your application uses cartridge >= 2.0.0, you can specify the --reload flag to load the new cluster-wide configuration on the instances. Otherwise, you need to restart the instances or reload the configuration manually.

    cartridge repair set-leader \
      --name myapp \
      --verbose \
      --reload \
      <replicaset-uuid> <instance-uuid>
Removing instance from the cluster

You can remove an instance from the cluster using the cartridge repair command.

  1. Get the instance UUID:

     • open the server details tab in the WebUI;
     • call cartridge repair list-topology --name <app-name> and find the desired instance UUID;
     • get the instance box.info().uuid:

    echo "return box.info().uuid" | tarantoolctl connect \
  2. Now update the cluster-wide config for all instances on each machine. Run cartridge repair remove-instance with the --dry-run flag on each machine to check the cluster-wide config changes computed by cartridge-cli:

    cartridge repair remove-instance \
      --name myapp \
      --dry-run \
      <instance-uuid>
  3. Run cartridge repair remove-instance without the --dry-run flag on each machine to apply the config changes computed by cartridge-cli. If your application uses cartridge >= 2.0.0, you can specify the --reload flag to load the new cluster-wide configuration on the instances. Otherwise, you need to restart the instances or reload the configuration manually.

    cartridge repair remove-instance \
      --name myapp \
      --verbose \
      --reload \
      <instance-uuid>

Table of contents

Module cartridge

A Tarantool framework for developing distributed applications.

Cartridge provides a simple way to manage the operation of distributed applications. The cluster consists of several Tarantool instances acting in concert. Cartridge does not care how the instances are started; it only cares about the configuration of already running processes.

Cartridge automates vshard and replication configuration, and simplifies custom configuration and administrative tasks.

cfg (opts, box_opts)

Initialize the cartridge module.

After this call, you can operate the instance via the Tarantool console. Note that this call does not initialize the database: box.cfg is not called yet. Do not call box.cfg yourself; cartridge will do it when the time comes.

Both cartridge.cfg and box.cfg options can be configured with command-line arguments or environment variables.


  • opts: Available options are:
    • workdir: (optional string) a directory where all data will be stored: snapshots, wal logs and the cartridge config file. (default: “.”, overridden by env TARANTOOL_WORKDIR, args --workdir)
    • advertise_uri: (optional string) either “<HOST>:<PORT>”, “<HOST>:”, or “<PORT>”. Used by other instances to connect to the current one. When <HOST> isn’t specified, it’s detected as the only non-local IP address. If there is more than one IP address available, it defaults to “localhost”. When <PORT> isn’t specified, it’s derived as follows: if TARANTOOL_INSTANCE_NAME has a numeric suffix _<N>, then <PORT> = 3300+<N>; otherwise the default <PORT> = 3301 is used.
    • cluster_cookie: (optional string) secret used to separate unrelated applications, which prevents them from seeing each other during broadcasts. Also used as the admin password in HTTP and binary connections and for encrypting internal communications. Allowed symbols are [a-zA-Z0-9_.~-]. (default: “secret-cluster-cookie”, overridden by env TARANTOOL_CLUSTER_COOKIE, args --cluster-cookie)
    • swim_broadcast: (optional boolean) announce own advertise_uri over UDP broadcast. Cartridge health checks are governed by the SWIM protocol. To simplify instance discovery on start, it can UDP-broadcast to all networks known from the getifaddrs() C call. The broadcast is sent to several ports: the default 3301, the <PORT> from the advertise_uri option, and its neighbours <PORT>+1 and <PORT>-1. (Added in v2.3.0-23, default: true, overridden by env TARANTOOL_SWIM_BROADCAST, args --swim-broadcast)
    • bucket_count: (optional number) bucket count for the vshard cluster. See the vshard doc for more details. (default: 30000, overridden by env TARANTOOL_BUCKET_COUNT, args --bucket-count)
    • vshard_groups: (optional {[string]=VshardGroup,…}) vshard storage groups; table keys are used as names
    • http_enabled: (optional boolean) whether the http server should be started (default: true, overridden by env TARANTOOL_HTTP_ENABLED, args --http-enabled)
    • webui_enabled: (optional boolean) whether the WebUI and the corresponding API (HTTP + GraphQL) should be initialized. Ignored if http_enabled is false. Doesn’t affect auth_enabled. (Added in v2.4.0-38, default: true, overridden by env TARANTOOL_WEBUI_ENABLED, args --webui-enabled)
    • http_port: (string or number) port to open the administrative UI and API on (default: 8081, derived from TARANTOOL_INSTANCE_NAME, overridden by env TARANTOOL_HTTP_PORT, args --http-port)
    • http_host: (optional string) host to open the administrative UI and API on (Added in v2.4.0-42, default: “”, overridden by env TARANTOOL_HTTP_HOST, args --http-host)
    • alias: (optional string) human-readable instance name that will be available in the administrative UI (default: argparse instance name, overridden by env TARANTOOL_ALIAS, args --alias)
    • roles: (table) list of user-defined roles that will be available to enable on the instance_uuid
    • auth_enabled: (optional boolean) toggle authentication in the administrative UI and API (default: false)
    • auth_backend_name: (optional string) user-provided set of callbacks related to authentication
    • console_sock: (optional string) socket to start console listening on. (default: nil, overridden by env TARANTOOL_CONSOLE_SOCK, args --console-sock)
    • webui_blacklist: (optional {string,…}) list of pages to be hidden in the WebUI. (Added in v2.0.1-54, default: {})
    • upgrade_schema: (optional boolean) run schema upgrade on the leader instance. (Added in v2.0.2-3, default: false, overridden by env TARANTOOL_UPGRADE_SCHEMA, args --upgrade-schema)
    • roles_reload_allowed: (optional boolean) allow calling cartridge.reload_roles. (Added in v2.3.0-73, default: false)
    • upload_prefix: (optional string) temporary directory used for saving files during cluster-wide config upload. If a relative path is specified, it’s evaluated relative to the workdir. (Added in v2.4.0-43, default: /tmp, overridden by env TARANTOOL_UPLOAD_PREFIX, args --upload-prefix)
  • box_opts: (optional table) extra Tarantool box.cfg options (e.g. memtx_memory) that may require additional tuning
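For illustration, a typical init.lua starts with a single cartridge.cfg call that combines a few of the options above with extra box_opts. The custom role name, workdir, and memory size below are placeholders, not defaults:

```lua
#!/usr/bin/env tarantool
local cartridge = require('cartridge')

local ok, err = cartridge.cfg({
    workdir = 'tmp/db',            -- placeholder path
    advertise_uri = 'localhost:3301',
    http_port = 8081,
    roles = {
        'cartridge.roles.vshard-router',
        'cartridge.roles.vshard-storage',
        'app.roles.custom',        -- placeholder for your own role module
    },
}, {
    -- extra box.cfg options passed through by cartridge
    memtx_memory = 128 * 1024 * 1024,  -- 128 MiB, tune for your workload
})
assert(ok, tostring(err))
```

Remember that most of these options can also be overridden per instance via the TARANTOOL_* environment variables or command-line arguments listed above.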