Tarantool 2.5.1 documentation

Overview

An application server together with a database manager

Tarantool is a Lua application server integrated with a database management system. It has a “fiber” model which means that many Tarantool applications can run simultaneously on a single thread, while each instance of the Tarantool server itself can run multiple threads for input-output and background maintenance. It incorporates the LuaJIT – “Just In Time” – Lua compiler, Lua libraries for most common applications, and the Tarantool Database Server which is an established NoSQL DBMS. Thus Tarantool serves all the purposes that have made node.js and Twisted popular, plus it supports data persistence.

The code is free and open source, distributed under the BSD license. The supported platforms are GNU/Linux, Mac OS and FreeBSD.

Tarantool’s creator and biggest user is Mail.Ru, the largest internet company in Russia, with 30 million users, 25 million emails per day, and a web site whose Alexa global rank is in the top 40 worldwide. Tarantool services Mail.Ru’s hottest data, such as the session data of online users, the properties of online applications, the caches of the underlying data, the distribution and sharding algorithms, and much more. Outside Mail.Ru the software is used by a growing number of projects in online gaming, digital marketing, and social media industries. Although Mail.Ru is the sponsor for product development, the roadmap and the bugs database and the development process are fully open. The software incorporates patches from dozens of community contributors. The Tarantool community writes and maintains most of the drivers for programming languages. The greater Lua community has hundreds of useful packages most of which can become Tarantool extensions.

Users can create, modify and drop Lua functions at runtime. Or they can define Lua programs that are loaded during startup for triggers, background tasks, and interacting with networked peers. Unlike popular application development frameworks based on a “reactor” pattern, networking in server-side Lua is sequential, yet very efficient, as it is built on top of the cooperative multitasking environment that Tarantool itself uses.
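
For instance, here is a minimal sketch of a background task built on the built-in fiber module; the function name and the one-second interval are illustrative, not taken from the manual:

-- a background fiber that wakes up once a second; fiber.sleep() yields
-- control so other fibers can run on the same thread
local fiber = require('fiber')

local function heartbeat()
    while true do
        -- periodic maintenance work would go here
        fiber.sleep(1)
    end
end

fiber.create(heartbeat)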

One of the built-in Lua packages provides an API for the Database Management System. Thus some developers see Tarantool as a DBMS with a popular stored procedure language, while others see it as a Lua interpreter, while still others see it as a replacement for many components of multi-tier Web applications. Performance can be a few hundred thousand transactions per second on a laptop, scalable upwards or outwards to server farms.

Database features

Although Tarantool can run without it, “The Box” – the DBMS server – is a strong distinguishing feature.

The database API allows for permanently storing Lua objects, managing object collections, creating or dropping secondary keys, making changes atomically, configuring and monitoring replication, performing controlled fail-over, and executing Lua code triggered by database events. Remote database instances are accessible transparently via a remote-procedure-invocation API.

Tarantool’s DBMS server uses the storage engine concept, where different sets of algorithms and data structures can be used for different situations. Two storage engines are built-in: an in-memory engine which has all the data and indexes in RAM, and a two-level B-tree engine for data sets whose size is 10 to 1000 times the amount of available RAM. All storage engines in Tarantool support transactions and replication by using a common write ahead log (WAL). This ensures consistency and crash safety of the persistent state. Changes are not considered complete until the WAL is written. The logging subsystem supports group commit.

Tarantool’s in-memory storage engine (memtx) keeps all the data in random-access memory, and therefore has very low read latency. It also keeps persistent copies of the data in non-volatile storage, such as disk, when users request “snapshots”. If an instance of the server stops and the random-access memory is lost, then restarts, it reads the latest snapshot and then replays the transactions that are in the log – therefore no data is lost.

Tarantool’s in-memory engine is lock-free in typical situations. Instead of the operating system’s concurrency primitives, such as mutexes, Tarantool uses cooperative multitasking to handle thousands of connections simultaneously. There is a fixed number of independent execution threads. The threads do not share state. Instead they exchange data using low-overhead message queues. While this approach limits the number of cores that the instance will use, it removes competition for the memory bus and ensures peak scalability of memory access and network throughput. CPU utilization of a typical highly-loaded Tarantool instance is under 10%. Searches are possible via secondary index keys as well as primary keys.

Tarantool’s disk-based storage engine is a fusion of ideas from modern filesystems, log-structured merge trees and classical B-trees. All data is organized into ranges. Each range is represented by a file on disk. Range size is a configuration option and normally is around 64MB. Each range is a collection of pages, serving different purposes. Pages in a fully merged range contain non-overlapping ranges of keys. A range can be partially merged if there were a lot of changes in its key range recently. In that case some pages represent new keys and values in the range. The disk-based storage engine is append only: new data never overwrites old data. The disk-based storage engine is named vinyl.

Tarantool supports multi-part index keys. The possible index types are HASH, TREE, BITSET, and RTREE.
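
For example, here is a sketch of a multi-part TREE index and a non-unique RTREE index; the space layout (an unsigned id, a string label, and an array of coordinates) is purely illustrative:

s = box.schema.space.create('points')
-- a primary TREE index whose key is composed of two fields
s:create_index('primary', {
    type = 'tree',
    parts = {{field = 1, type = 'unsigned'}, {field = 2, type = 'string'}}
})
-- a spatial index over an array field such as {x, y}; RTREE indexes must be non-unique
s:create_index('spatial', {
    type = 'rtree',
    unique = false,
    parts = {{field = 3, type = 'array'}}
})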

Tarantool supports asynchronous replication, locally or to remote hosts. The replication architecture can be master-master, that is, many nodes may both handle the loads and receive what others have handled, for the same data sets.

Tarantool supports basic SQL structures and persistence for SQL operations (with acceptable limitations). All tables and triggers created in SQL are available after server restart.
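
As a quick sketch, the SQL interface is reachable from Lua through box.execute(); the table name here is illustrative:

box.execute([[CREATE TABLE bands (id INTEGER PRIMARY KEY, name STRING)]])
box.execute([[INSERT INTO bands VALUES (1, 'Roxette')]])
box.execute([[SELECT * FROM bands]])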

Getting started

In this chapter, we show how to work with Tarantool as a DBMS – and how to connect to a Tarantool database from other programming languages.

Creating your first Tarantool database

First thing, let’s install Tarantool, start it, and create a simple database.

You can install Tarantool and work with it locally or in Docker.

Using a Docker image

For trial and test purposes, we recommend using official Tarantool images for Docker. An official image contains a particular Tarantool version and all popular external modules for Tarantool. Everything is already installed and configured in Linux. These images are the easiest way to install and use Tarantool.

Note

If you’re new to Docker, we recommend going over this tutorial before proceeding with this chapter.

Launching a container

If you don’t have Docker installed, please follow the official installation guide for your OS.

To start a fully functional Tarantool instance, run a container with minimal options:

$ docker run \
  --name mytarantool \
  -d -p 3301:3301 \
  -v /data/dir/on/host:/var/lib/tarantool \
  tarantool/tarantool:2.5.1

This command runs a new container named mytarantool. Docker starts it from an official image named tarantool/tarantool:2.5.1, with Tarantool version 2.5.1 and all external modules already installed.

Tarantool will be accepting incoming connections on localhost:3301. You may start using it as a key-value storage right away.

Tarantool persists data inside the container. To make your test data available after you stop the container, this command also mounts the host’s directory /data/dir/on/host (you need to specify here an absolute path to an existing local directory) in the container’s directory /var/lib/tarantool (by convention, Tarantool in a container uses this directory to persist data). So, all changes made in the mounted directory on the container’s side are applied to the host’s disk.

Tarantool’s database module in the container is already configured and started. You needn’t do it manually, unless you use Tarantool as an application server and run it with an application.

Attaching to Tarantool

To attach to Tarantool that runs inside the container, say:

$ docker exec -i -t mytarantool console

This command:

  • Instructs Tarantool to open an interactive console port for incoming connections.
  • Attaches to the Tarantool server inside the container under admin user via a standard Unix socket.

Tarantool displays a prompt:

tarantool.sock>

Now you can enter requests on the command line.

Note

On production machines, Tarantool’s interactive mode is for system administration only. But we use it for most examples in this manual, because the interactive mode is convenient for learning.

Creating a database

While you’re attached to the console, let’s create a simple test database.

First, create the first space (named tester):

tarantool.sock> s = box.schema.space.create('tester')

Format the created space by specifying field names and types:

tarantool.sock> s:format({
              > {name = 'id', type = 'unsigned'},
              > {name = 'band_name', type = 'string'},
              > {name = 'year', type = 'unsigned'}
              > })

Create the first index (named primary):

tarantool.sock> s:create_index('primary', {
              > type = 'hash',
              > parts = {'id'}
              > })

This is a primary index based on the id field of each tuple.

Insert three tuples (our name for records) into the space:

tarantool.sock> s:insert{1, 'Roxette', 1986}
tarantool.sock> s:insert{2, 'Scorpions', 2015}
tarantool.sock> s:insert{3, 'Ace of Base', 1993}

To select a tuple using the primary index, say:

tarantool.sock> s:select{3}

The terminal screen now looks like this:

tarantool.sock> s = box.schema.space.create('tester')
---
...
tarantool.sock> s:format({
              > {name = 'id', type = 'unsigned'},
              > {name = 'band_name', type = 'string'},
              > {name = 'year', type = 'unsigned'}
              > })
---
...
tarantool.sock> s:create_index('primary', {
              > type = 'hash',
              > parts = {'id'}
              > })
---
- unique: true
  parts:
  - type: unsigned
    is_nullable: false
    fieldno: 1
  id: 0
  space_id: 512
  name: primary
  type: HASH
...
tarantool.sock> s:insert{1, 'Roxette', 1986}
---
- [1, 'Roxette', 1986]
...
tarantool.sock> s:insert{2, 'Scorpions', 2015}
---
- [2, 'Scorpions', 2015]
...
tarantool.sock> s:insert{3, 'Ace of Base', 1993}
---
- [3, 'Ace of Base', 1993]
...
tarantool.sock> s:select{3}
---
- - [3, 'Ace of Base', 1993]
...

To add a secondary index based on the band_name field, say:

tarantool.sock> s:create_index('secondary', {
              > type = 'hash',
              > parts = {'band_name'}
              > })

To select tuples using the secondary index, say:

tarantool.sock> s.index.secondary:select{'Scorpions'}
---
- - [2, 'Scorpions', 2015]
...

To drop an index, say:

tarantool> s.index.secondary:drop()
---
...

Stopping a container

When the testing is over, stop the container politely:

$ docker stop mytarantool

This was a temporary container, and its disk/memory data were flushed when you stopped it. But since you mounted a data directory from the host in the container, Tarantool’s data files were persisted to the host’s disk. Now if you start a new container and mount that data directory in it, Tarantool will recover all data from disk and continue working with the persisted data.

Using a binary package

For production purposes, we recommend official binary packages. You can choose from several Tarantool versions. An automatic build system creates, tests and publishes packages for every push into a corresponding branch (1.10, 2.5, etc) at Tarantool’s GitHub repository.

To download and install the package that’s appropriate for your OS, start a shell (terminal) and enter the command-line instructions provided for your OS at Tarantool’s download page.

Starting Tarantool

To start a Tarantool instance, say this:

$ # if you downloaded a binary with apt-get or yum, say this:
$ /usr/bin/tarantool
$ # if you downloaded and untarred a binary tarball to ~/tarantool, say this:
$ ~/tarantool/bin/tarantool

Tarantool starts in the interactive mode and displays a prompt:

tarantool>

Now you can enter requests on the command line.

Note

On production machines, Tarantool’s interactive mode is for system administration only. But we use it for most examples in this manual, because the interactive mode is convenient for learning.

Creating a database

Here is how to create a simple test database after installation.

  1. To let Tarantool store data in a separate place, create a new directory dedicated for tests:

    $ mkdir ~/tarantool_sandbox
    $ cd ~/tarantool_sandbox
    

    You can delete the directory when the tests are over.

  2. Check if the default port the database instance will listen to is vacant.

    Depending on the release, during installation Tarantool may start a demonstrative global example.lua instance that listens on port 3301 by default. The example.lua file showcases basic configuration and can be found in the /etc/tarantool/instances.enabled or /etc/tarantool/instances.available directories.

    However, we encourage you to start the instance manually, so you can learn how it works.

    Make sure the default port is vacant:

    1. To check if the demonstrative instance is running, say:

      $ lsof -i :3301
      COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
      tarantool 6851 root   12u  IPv4  40827      0t0  TCP *:3301 (LISTEN)
      
    2. If the instance is running, kill the corresponding process. In this example:

      $ kill 6851
      
  3. To start Tarantool’s database module and make the instance accept TCP requests on port 3301, say:

    tarantool> box.cfg{listen = 3301}
    
  4. Create the first space (named tester):

    tarantool> s = box.schema.space.create('tester')
    
  5. Format the created space by specifying field names and types:

    tarantool> s:format({
             > {name = 'id', type = 'unsigned'},
             > {name = 'band_name', type = 'string'},
             > {name = 'year', type = 'unsigned'}
             > })
    
  6. Create the first index (named primary):

    tarantool> s:create_index('primary', {
             > type = 'hash',
             > parts = {'id'}
             > })
    

    This is a primary index based on the id field of each tuple.

  7. Insert three tuples (our name for records) into the space:

    tarantool> s:insert{1, 'Roxette', 1986}
    tarantool> s:insert{2, 'Scorpions', 2015}
    tarantool> s:insert{3, 'Ace of Base', 1993}
    
  8. To select a tuple using the primary index, say:

    tarantool> s:select{3}
    

    The terminal screen now looks like this:

    tarantool> s = box.schema.space.create('tester')
    ---
    ...
    tarantool> s:format({
             > {name = 'id', type = 'unsigned'},
             > {name = 'band_name', type = 'string'},
             > {name = 'year', type = 'unsigned'}
             > })
    ---
    ...
    tarantool> s:create_index('primary', {
             > type = 'hash',
             > parts = {'id'}
             > })
    ---
    - unique: true
      parts:
      - type: unsigned
        is_nullable: false
        fieldno: 1
      id: 0
      space_id: 512
      name: primary
      type: HASH
    ...
    tarantool> s:insert{1, 'Roxette', 1986}
    ---
    - [1, 'Roxette', 1986]
    ...
    tarantool> s:insert{2, 'Scorpions', 2015}
    ---
    - [2, 'Scorpions', 2015]
    ...
    tarantool> s:insert{3, 'Ace of Base', 1993}
    ---
    - [3, 'Ace of Base', 1993]
    ...
    tarantool> s:select{3}
    ---
    - - [3, 'Ace of Base', 1993]
    ...
    
  9. To add a secondary index based on the band_name field, say:

    tarantool> s:create_index('secondary', {
             > type = 'hash',
             > parts = {'band_name'}
             > })
    
  10. To select tuples using the secondary index, say:

    tarantool> s.index.secondary:select{'Scorpions'}
    ---
    - - [2, 'Scorpions', 2015]
    ...
    
  11. Now, to prepare for the example in the next section, try this:

    tarantool> box.schema.user.grant('guest', 'read,write,execute', 'universe')
    

Connecting remotely

In the request box.cfg{listen = 3301} that we made earlier, the listen value can be any form of a URI (uniform resource identifier). In this case, it’s just a local port: port 3301. You can send requests to the listen URI via:

  1. telnet,
  2. a connector,
  3. another instance of Tarantool (using the console module; see the sketch after this list), or
  4. the tarantoolctl utility.
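
For option (3), here is a minimal sketch using the built-in console module from a second Tarantool instance (it relies on the guest grant made in the previous step):

-- run this on another Tarantool instance
console = require('console')
console.connect('localhost:3301')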

Let’s try (4).

Switch to another terminal. On Linux, for example, this means starting another instance of a Bash shell. You can switch to any working directory in the new terminal, not necessarily to ~/tarantool_sandbox.

Start the tarantoolctl utility:

$ tarantoolctl connect '3301'

This means “use tarantoolctl connect to connect to the Tarantool instance that’s listening on localhost:3301”.

Try this request:

localhost:3301> box.space.tester:select{2}

This means “send a request to that Tarantool instance, and display the result”. The result in this case is one of the tuples that was inserted earlier. Your terminal screen should now look like this:

$ tarantoolctl connect 3301
/usr/local/bin/tarantoolctl: connected to localhost:3301
localhost:3301> box.space.tester:select{2}
---
- - [2, 'Scorpions', 2015]
...

You can repeat box.space...:insert{} and box.space...:select{} indefinitely, on either Tarantool instance.

When the testing is over:

  • To drop the space: s:drop()
  • To stop tarantoolctl: Ctrl+C or Ctrl+D
  • To stop Tarantool (an alternative): the standard Lua function os.exit()
  • To stop Tarantool (from another terminal): sudo pkill -f tarantool
  • To destroy the test: rm -r ~/tarantool_sandbox

Connecting from your favorite language

Now that you have a Tarantool database, let’s see how to connect to it from Python, PHP and Go.

Connecting from Python

Pre-requisites

Before we proceed:

  1. Install the tarantool-python library. We recommend using python3 and pip3.

  2. Start Tarantool (locally or in Docker) and make sure that you have created and populated a database as we suggested earlier:

    box.cfg{listen = 3301}
    s = box.schema.space.create('tester')
    s:format({
             {name = 'id', type = 'unsigned'},
             {name = 'band_name', type = 'string'},
             {name = 'year', type = 'unsigned'}
             })
    s:create_index('primary', {
             type = 'hash',
             parts = {'id'}
             })
    s:create_index('secondary', {
             type = 'hash',
             parts = {'band_name'}
             })
    s:insert{1, 'Roxette', 1986}
    s:insert{2, 'Scorpions', 2015}
    s:insert{3, 'Ace of Base', 1993}
    

    Important

    Please do not close the terminal window where Tarantool is running – you’ll need it soon.

  3. In order to connect to Tarantool as an administrator, reset the password for the admin user:

    box.schema.user.passwd('pass')
    

Connecting to Tarantool

To get connected to the Tarantool server, say this:

>>> import tarantool
>>> connection = tarantool.connect("localhost", 3301)

You can also specify the user name and password, if needed:

>>> tarantool.connect("localhost", 3301, user=username, password=password)

The default user is guest.

Manipulating the data

A space is a container for tuples. To access a space as a named object, use connection.space:

>>> tester = connection.space('tester')
Inserting data

To insert a tuple into a space, use insert:

>>> tester.insert((4, 'ABBA', 1972))
[4, 'ABBA', 1972]
Querying data

Let’s start with selecting a tuple by the primary key (in our example, this is the index named primary, based on the id field of each tuple). Use select:

>>> tester.select(4)
[4, 'ABBA', 1972]

Next, select tuples by a secondary key. For this purpose, you need to specify the number or name of the index.

First off, select tuples using the index number:

>>> tester.select('Scorpions', index=1)
[2, 'Scorpions', 2015]

(We say index=1 because index numbers in Tarantool start with 0, and we’re using our second index here.)

Now make a similar query by the index name and make sure that the result is the same:

>>> tester.select('Scorpions', index='secondary')
[2, 'Scorpions', 2015]

Finally, select all the tuples in a space via a select with no arguments:

>>> tester.select()
Updating data

Update a field value using update:

>>> tester.update(4, [('=', 1, 'New group'), ('+', 2, 2)])

This updates the value of field 1 and increases the value of field 2 in the tuple with id = 4. If a tuple with this id doesn’t exist, Tarantool will return an error.

Now use replace to totally replace the tuple that matches the primary key. If a tuple with this primary key doesn’t exist, Tarantool will insert it as a new tuple.

>>> tester.replace((4, 'New band', 2015))

You can also update the data using upsert that works similarly to update, but creates a new tuple if the old one was not found.

>>> tester.upsert((4, 'Another band', 2000), [('+', 2, 5)])

This increases the value of field 2 by 5 in the tuple with id = 4, or inserts the tuple (4, "Another band", 2000) if a tuple with this id doesn’t exist.

Deleting data

To delete a tuple, use delete(primary_key):

>>> tester.delete(4)
[4, 'New group', 2012]

To delete all tuples in a space (or to delete an entire space), use call. We’ll focus on this function in more detail in the next section.

To delete all tuples in a space, call space:truncate:

>>> connection.call('box.space.tester:truncate', ())

To delete an entire space, call space:drop. This requires connecting to Tarantool as the admin user:

>>> connection.call('box.space.tester:drop', ())

Executing stored procedures

Switch to the terminal window where Tarantool is running.

Note

If you don’t have a terminal window with a remote connection to Tarantool, see the earlier instructions in this chapter on attaching to Tarantool (via docker exec for a container, or tarantoolctl connect for a local instance).

Define a simple Lua function:

function sum(a, b)
    return a + b
end

Now we have a Lua function defined in Tarantool. To invoke this function from Python, use call:

>>> connection.call('sum', (3, 2))
5

To send bare Lua code for execution, use eval:

>>> connection.eval('return 4 + 5')
9

Connecting from PHP

Pre-requisites

Before we proceed:

  1. Install the tarantool/client library.

  2. Start Tarantool (locally or in Docker) and make sure that you have created and populated a database as we suggested earlier:

    box.cfg{listen = 3301}
    s = box.schema.space.create('tester')
    s:format({
             {name = 'id', type = 'unsigned'},
             {name = 'band_name', type = 'string'},
             {name = 'year', type = 'unsigned'}
             })
    s:create_index('primary', {
             type = 'hash',
             parts = {'id'}
             })
    s:create_index('secondary', {
             type = 'hash',
             parts = {'band_name'}
             })
    s:insert{1, 'Roxette', 1986}
    s:insert{2, 'Scorpions', 2015}
    s:insert{3, 'Ace of Base', 1993}
    

    Important

    Please do not close the terminal window where Tarantool is running – you’ll need it soon.

  3. In order to connect to Tarantool as an administrator, reset the password for the admin user:

    box.schema.user.passwd('pass')
    

Connecting to Tarantool

To get connected to the Tarantool server, say this:

use Tarantool\Client\Client;

require __DIR__.'/vendor/autoload.php';
$client = Client::fromDefaults();

You can also specify the user name and password, if needed:

$client = Client::fromOptions([
    'uri' => 'tcp://127.0.0.1:3301',
    'username' => '<username>',
    'password' => '<password>'
]);

The default user is guest.

Manipulating the data

A space is a container for tuples. To access a space as a named object, use getSpace:

$tester = $client->getSpace('tester');
Inserting data

To insert a tuple into a space, use insert:

$result = $tester->insert([4, 'ABBA', 1972]);
Querying data

Let’s start with selecting a tuple by the primary key (in our example, this is the index named primary, based on the id field of each tuple). Use select:

use Tarantool\Client\Schema\Criteria;

$result = $tester->select(Criteria::key([4]));
printf(json_encode($result));
[[4, 'ABBA', 1972]]

Next, select tuples by a secondary key. For this purpose, you need to specify the number or name of the index.

First off, select tuples using the index number:

$result = $tester->select(Criteria::index(1)->andKey(['Scorpions']));
printf(json_encode($result));
[2, 'Scorpions', 2015]

(We say index(1) because index numbers in Tarantool start with 0, and we’re using our second index here.)

Now make a similar query by the index name and make sure that the result is the same:

$result = $tester->select(Criteria::index('secondary')->andKey(['Scorpions']));
printf(json_encode($result));
[2, 'Scorpions', 2015]

Finally, select all the tuples in a space via a select:

$result = $tester->select(Criteria::allIterator());
Updating data

Update a field value using update:

use Tarantool\Client\Schema\Operations;

$result = $tester->update([4], Operations::set(1, 'New group')->andAdd(2, 2));

This updates the value of field 1 and increases the value of field 2 in the tuple with id = 4. If a tuple with this id doesn’t exist, Tarantool will return an error.

Now use replace to totally replace the tuple that matches the primary key. If a tuple with this primary key doesn’t exist, Tarantool will insert it as a new tuple.

$result = $tester->replace([4, 'New band', 2015]);

You can also update the data using upsert that works similarly to update, but creates a new tuple if the old one was not found.

use Tarantool\Client\Schema\Operations;

$tester->upsert([4, 'Another band', 2000], Operations::add(2, 5));

This increases the value of field 2 by 5 in the tuple with id = 4, or inserts the tuple (4, "Another band", 2000) if a tuple with this id doesn’t exist.

Deleting data

To delete a tuple, use delete(primary_key):

$result = $tester->delete([4]);

To delete all tuples in a space (or to delete an entire space), use call. We’ll focus on this function in more detail in the next section.

To delete all tuples in a space, call space:truncate:

$result = $client->call('box.space.tester:truncate');

To delete an entire space, call space:drop. This requires connecting to Tarantool as the admin user:

$result = $client->call('box.space.tester:drop');

Executing stored procedures

Switch to the terminal window where Tarantool is running.

Note

If you don’t have a terminal window with a remote connection to Tarantool, see the earlier instructions in this chapter on attaching to Tarantool (via docker exec for a container, or tarantoolctl connect for a local instance).

Define a simple Lua function:

function sum(a, b)
    return a + b
end

Now we have a Lua function defined in Tarantool. To invoke this function from PHP, use call:

$result = $client->call('sum', 3, 2);

To send bare Lua code for execution, use eval:

$result = $client->evaluate('return 4 + 5');

Connecting from Go

Pre-requisites

Before we proceed:

  1. Install the go-tarantool library.

  2. Start Tarantool (locally or in Docker) and make sure that you have created and populated a database as we suggested earlier:

    box.cfg{listen = 3301}
    s = box.schema.space.create('tester')
    s:format({
             {name = 'id', type = 'unsigned'},
             {name = 'band_name', type = 'string'},
             {name = 'year', type = 'unsigned'}
             })
    s:create_index('primary', {
             type = 'hash',
             parts = {'id'}
             })
    s:create_index('secondary', {
             type = 'hash',
             parts = {'band_name'}
             })
    s:insert{1, 'Roxette', 1986}
    s:insert{2, 'Scorpions', 2015}
    s:insert{3, 'Ace of Base', 1993}
    

    Important

    Please do not close the terminal window where Tarantool is running – you’ll need it soon.

  3. In order to connect to Tarantool as an administrator, reset the password for the admin user:

    box.schema.user.passwd('pass')
    

Connecting to Tarantool

To get connected to the Tarantool server, write a simple Go program:

package main

import (
    "fmt"

    "github.com/tarantool/go-tarantool"
)

func main() {
    conn, err := tarantool.Connect("127.0.0.1:3301", tarantool.Opts{})

    if err != nil {
            fmt.Println("Connection refused")
    }

    defer conn.Close()

    // Your logic for interacting with the database
}

You can also specify the user name and password, if needed:

opts := tarantool.Opts{User: "username", Pass: "password"}
conn, err := tarantool.Connect("127.0.0.1:3301", opts)
...

The default user is guest.

Manipulating the data

Inserting data

To insert a tuple into a space, use Insert:

resp, err = conn.Insert("tester", []interface{}{4, "ABBA", 1972})

This inserts the tuple (4, "ABBA", 1972) into a space named tester.

The response code and data are available in the tarantool.Response structure:

code := resp.Code
data := resp.Data
Querying data

To select a tuple from a space, use Select:

resp, err = conn.Select("tester", "primary", 0, 1, tarantool.IterEq, []interface{}{4})

This selects a tuple by the primary key with offset = 0 and limit = 1 from a space named tester (in our example, this is the index named primary, based on the id field of each tuple).

Next, select tuples by a secondary key.

resp, err = conn.Select("tester", "secondary", 0, 1, tarantool.IterEq, []interface{}{"ABBA"})

Finally, select all the tuples in a space:

resp, err = conn.Select("tester", "primary", 0, tarantool.KeyLimit, tarantool.IterAll, []interface{}{})

For more examples, see https://github.com/tarantool/go-tarantool#usage

Updating data

Update a field value using Update:

resp, err = conn.Update("tester", "primary", []interface{}{4}, []interface{}{[]interface{}{"+", 2, 3}})

This increases by 3 the value of field 2 in the tuple with id = 4. If a tuple with this id doesn’t exist, Tarantool will return an error.

Now use Replace to totally replace the tuple that matches the primary key. If a tuple with this primary key doesn’t exist, Tarantool will insert it as a new tuple.

resp, err = conn.Replace("tester", []interface{}{4, "New band", 2011})

You can also update the data using Upsert that works similarly to Update, but creates a new tuple if the old one was not found.

resp, err = conn.Upsert("tester", []interface{}{4, "Another band", 2000}, []interface{}{[]interface{}{"+", 2, 5}})

This increases the value of the third field by 5 in the tuple with id = 4, or inserts the tuple (4, "Another band", 2000) if a tuple with this id doesn’t exist.

Deleting data

To delete a tuple, use Delete:

resp, err = conn.Delete("tester", "primary", []interface{}{4})

To delete all tuples in a space (or to delete an entire space), use Call. We’ll focus on this function in more detail in the next section.

To delete all tuples in a space, call space:truncate:

resp, err = conn.Call("box.space.tester:truncate", []interface{}{})

To delete an entire space, call space:drop. This requires connecting to Tarantool as the admin user:

resp, err = conn.Call("box.space.tester:drop", []interface{}{})

Executing stored procedures

Switch to the terminal window where Tarantool is running.

Note

If you don’t have a terminal window with a remote connection to Tarantool, see the earlier instructions in this chapter on attaching to Tarantool (via docker exec for a container, or tarantoolctl connect for a local instance).

Define a simple Lua function:

function sum(a, b)
    return a + b
end

Now we have a Lua function defined in Tarantool. To invoke this function from Go, use Call:

resp, err = conn.Call("sum", []interface{}{2, 3})

To send bare Lua code for execution, use Eval:

resp, err = conn.Eval("return 4 + 5", []interface{}{})

Creating your first Tarantool Cartridge application

Here we’ll walk you through developing a simple cluster application.

First, set up the development environment.

Next, create an application named myapp. Say:

$ cartridge create --name myapp

This will create a Tarantool Cartridge application in the ./myapp directory, with a handful of template files and directories inside.

Go inside and make a dry run:

$ cd ./myapp
$ cartridge build
$ cartridge start

This will build the application locally, start 5 instances of Tarantool, and run the application as it is, with no business logic yet.

Why 5 instances? See the instances.yml file in your application directory. It contains the configuration of all instances that you can use in the cluster. By default, it defines configuration for 5 Tarantool instances.

myapp.router:
  workdir: ./tmp/db_dev/3301
  advertise_uri: localhost:3301
  http_port: 8081

myapp.s1-master:
  workdir: ./tmp/db_dev/3302
  advertise_uri: localhost:3302
  http_port: 8082

myapp.s1-replica:
  workdir: ./tmp/db_dev/3303
  advertise_uri: localhost:3303
  http_port: 8083

myapp.s2-master:
  workdir: ./tmp/db_dev/3304
  advertise_uri: localhost:3304
  http_port: 8084

myapp.s2-replica:
  workdir: ./tmp/db_dev/3305
  advertise_uri: localhost:3305
  http_port: 8085

You can already see these instances in the cluster management web interface at http://localhost:8081 (here 8081 is the HTTP port of the first instance specified in instances.yml).

[Screenshot: the cluster management web interface after a dry run]

Okay, press Ctrl + C to stop the cluster for a while.

Now it’s time to add some business logic to your application. This will be an evergreen “Hello world!” – just to keep things simple.

Rename the template file app/roles/custom.lua to hello-world.lua.

$ mv app/roles/custom.lua app/roles/hello-world.lua

This will be your role. In Tarantool Cartridge, a role is a Lua module that implements some instance-specific functions and/or logic. Further on, we’ll show how to add code to a role, build the application, enable the role, and test it.

There is already some code in the role’s init() function.

 local function init(opts) -- luacheck: no unused args
     -- if opts.is_master then
     -- end

     local httpd = cartridge.service_get('httpd')
     httpd:route({method = 'GET', path = '/hello'}, function()
         return {body = 'Hello world!'}
     end)

     return true
 end

This exports an HTTP endpoint /hello. For example, http://localhost:8081/hello if you address the first instance from the instances.yml file. If you open it in a browser after enabling the role (we’ll do it here a bit later), you’ll see “Hello world!” on the page.

Let’s add some more code there.

 local function init(opts) -- luacheck: no unused args
     -- if opts.is_master then
     -- end

     local httpd = cartridge.service_get('httpd')
     httpd:route({method = 'GET', path = '/hello'}, function()
         return {body = 'Hello world!'}
     end)

     local log = require('log')
     log.info('Hello world!')

     return true
 end

This writes “Hello world!” to the console when the role gets enabled, so you’ll have a chance to spot it. No rocket science.

Next, amend role_name in the “return” section of the hello-world.lua file. This text will be displayed as a label for your role in the cluster management web interface.

 return {
     role_name = 'Hello world!',
     init = init,
     stop = stop,
     validate_config = validate_config,
     apply_config = apply_config,
 }

The final thing to do before you can run the application is to add your role to the list of available cluster roles in the init.lua file.

 local ok, err = cartridge.cfg({
     workdir = 'tmp/db',
     roles = {
         'cartridge.roles.vshard-storage',
         'cartridge.roles.vshard-router',
         'app.roles.hello-world'
     },
     cluster_cookie = 'myapp-cluster-cookie',
 })

Now the cluster will be aware of your role.

Why app.roles.hello-world? By default, the role name here should match the path from the application root (./myapp) to the role file (app/roles/hello-world.lua).

Fine! Your role is ready. Re-build the application and re-start the cluster now:

$ cartridge build
$ cartridge start

Now all instances are up, but idle, waiting for you to enable roles for them.

Instances (replicas) in a Tarantool Cartridge cluster are organized into replica sets. Roles are enabled per replica set, so all instances in a replica set have the same roles enabled.

Let’s create a replica set containing just one instance and enable your role:

  1. Open the cluster management web interface at http://localhost:8081.

  2. Click Configure.

  3. Check the role Hello world! to enable it. Notice that the role name here matches the label text that you specified in the role_name parameter in the hello-world.lua file.

  4. (Optionally) Specify the replica set name, for example “hello-world-replica-set”.

    [Screenshot: creating a replica set in the web interface]
  5. Click Create replica set and see the newly-created replica set in the web interface.

[Screenshot: the newly created replica set in the web interface]

Your custom role got enabled. Find the “Hello world!” message in the console, like this:

[Screenshot: the “Hello world!” message in the instance console]

Finally, open the HTTP endpoint of this instance at http://localhost:8081/hello and see the reply to your GET request.

[Screenshot: the reply to the GET request at the /hello endpoint]

Everything is up and running! What’s next?

  • Follow this guide to set up the rest of the cluster and try some cool cluster management features.
  • Get inspired with these examples and implement more sophisticated business logic for your role.
  • Pack your application for easy distribution. Choose what you like: a DEB or RPM package, a TGZ archive, or a Docker image.

User’s Guide

Preface

Welcome to Tarantool! This is the User’s Guide. We recommend reading it first, and consulting Reference materials for more detail afterwards, if needed.

How to read the documentation

To get started, you can install and launch Tarantool using a Docker container, a binary package, or the online Tarantool server at http://try.tarantool.org. Either way, as the first tryout, you can follow the introductory exercises from Chapter 2 “Getting started”. If you want more hands-on experience, proceed to Tutorials after you are through with Chapter 2.

Chapter 3 “Database” is about using Tarantool as a NoSQL DBMS, whereas Chapter 4 “Application server” is about using Tarantool as an application server.

Chapter 5 “Server administration” and Chapter 6 “Replication” are primarily for administrators.

Chapter 7 “Connectors” is strictly for users who are connecting from a different language such as C or Perl or Python — other users will find no immediate need for this chapter.

Chapter 8 “FAQ” gives answers to some frequently asked questions about Tarantool.

For experienced users, there are also Reference materials, a Contributor’s Guide and an extensive set of comments in the source code.

Getting in touch with the Tarantool community

Please report bugs or make feature requests at http://github.com/tarantool/tarantool/issues.

You can contact developers directly in telegram or in a Tarantool discussion group (English or Russian).

Conventions used in this manual

Square brackets [ and ] enclose optional syntax.

Two dots in a row .. mean the preceding tokens may be repeated.

A vertical bar | means the preceding and following tokens are mutually exclusive alternatives.

Database

In this chapter, we introduce the basic concepts of working with Tarantool as a database manager.

This chapter contains the following sections:

Data model

This section describes how Tarantool stores values and what operations with data it supports.

If you tried to create a database as suggested in our “Getting started” exercises, then your test database now looks like this:

[Diagram: the test database data model]

Space

A space – ‘tester’ in our example – is a container.

When Tarantool is being used to store data, there is always at least one space. Each space has a unique name specified by the user. Besides, each space has a unique numeric identifier which can be specified by the user, but usually is assigned automatically by Tarantool. Finally, a space always has an engine: memtx (default) – in-memory engine, fast but limited in size, or vinyl – on-disk engine for huge data sets.

A space is a container for tuples. To be functional, it needs to have a primary index. It can also have secondary indexes.
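
For example, a sketch of creating a space on the vinyl engine (the space name ‘archive’ is illustrative; memtx is used when no engine is specified):

a = box.schema.space.create('archive', {engine = 'vinyl'})
a:create_index('primary', {parts = {{field = 1, type = 'unsigned'}}})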

Tuple

A tuple plays the same role as a “row” or a “record”, and the components of a tuple (which we call “fields”) play the same role as a “row column” or “record field”, except that:

  • fields can be composite structures, such as arrays or maps, and
  • fields don’t need to have names.

Any given tuple may have any number of fields, and the fields may be of different types. The identifier of a field is the field’s number, base 1 (in Lua and other 1-based languages) or base 0 (in PHP or C/C++). For example, 1 or 0 can be used in some contexts to refer to the first field of a tuple.

The number of tuples in a space is unlimited.

Tuples in Tarantool are stored as MsgPack arrays.

When Tarantool returns a tuple value in the console, by default it uses YAML format, for example: [3, 'Ace of Base', 1993].
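
For instance, assuming the ‘tester’ space from the “Getting started” exercises, a tuple’s fields can be reached by number or, because the space was formatted, by name:

t = box.space.tester:get{3}    -- the whole tuple: [3, 'Ace of Base', 1993]
t[2]                           -- by field number (1-based in Lua): 'Ace of Base'
t['band_name']                 -- by field name: 'Ace of Base'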

Index

An index is a group of key values and pointers.

As with spaces, you should specify the index name, and let Tarantool come up with a unique numeric identifier (“index id”).

An index always has a type. The default index type is ‘TREE’. TREE indexes are provided by all Tarantool engines, can index unique and non-unique values, support partial key searches, comparisons and ordered results. Additionally, memtx engine supports HASH, RTREE and BITSET indexes.

An index may be multi-part, that is, you can declare that an index key value is composed of two or more fields in the tuple, in any order. For example, for an ordinary TREE index, the maximum number of parts is 255.

An index may be unique, that is, you can declare that it would be illegal to have the same key value twice.

The first index defined on a space is called the primary key index, and it must be unique. All other indexes are called secondary indexes, and they may be non-unique.

An index definition may include identifiers of tuple fields and their expected types (see allowed indexed field types below).

Note

A recommended design pattern for a data model is to base primary keys on the first fields of a tuple, because this speeds up tuple comparison.

In our example, we first defined the primary index (named ‘primary’) based on field #1 of each tuple:

tarantool> i = s:create_index('primary', {type = 'hash', parts = {{field = 1, type = 'unsigned'}}})

The effect is that, for all tuples in space ‘tester’, field #1 must exist and must contain an unsigned integer. The index type is ‘hash’, so values in field #1 must be unique, because keys in HASH indexes are unique.

After that, we defined a secondary index (named ‘secondary’) based on field #2 of each tuple:

tarantool> i = s:create_index('secondary', {type = 'tree', parts = {{field = 2, type = 'string'}}})

The effect is that, for all tuples in space ‘tester’, field #2 must exist and must contain a string. The index type is ‘tree’, so values in field #2 must not be unique, because keys in TREE indexes may be non-unique.

Note

Space definitions and index definitions are stored permanently in Tarantool’s system spaces _space and _index (for details, see reference on box.space submodule).

You can add, drop, or alter the definitions at runtime, with some restrictions. See syntax details in reference on box module.

Read more about index operations below.

Data types

Tarantool is both a database and an application server. Hence a developer often deals with two type sets: the programming language types (e.g. Lua) and the types of the Tarantool storage format (MsgPack).

Lua vs MsgPack

Scalar/compound | MsgPack type | Lua type                     | Example value
scalar          | nil          | nil                          | msgpack.NULL
scalar          | boolean      | boolean                      | true
scalar          | string       | string                       | ‘A B C’
scalar          | integer      | number                       | 12345
scalar          | double       | number                       | 1.2345
scalar          | double       | cdata                        | 1.2345
scalar          | bin          | cdata                        | [!!binary 3t7e]
scalar          | decimal      | cdata                        | 1.2
scalar          | uuid         | cdata                        | 12a34b5c-de67-8f90-123g-h4567ab8901
scalar          | ext          | (converted to exact number)  | 1.2
compound        | map          | table (with string keys)     | {‘a’: 5, ‘b’: 6}
compound        | array        | table (with integer keys)    | [1, 2, 3, 4, 5]
compound        | array        | tuple (“cdata”)              | [12345, ‘A B C’]

In Lua, a nil type has only one possible value, also called nil (displayed as null on Tarantool’s command line, since the output is in the YAML format). Nils may be compared to values of any types with == (is-equal) or ~= (is-not-equal), but other operations will not work. Nils may not be used in Lua tables; the workaround is to use msgpack.NULL
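
A minimal illustration of the workaround (the table contents are arbitrary):

msgpack = require('msgpack')
t1 = {1, msgpack.NULL, 3}   -- the “hole” in the middle is preserved
t2 = {1, nil, 3}            -- a plain nil leaves the table length ambiguous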

A boolean is either true or false.

A string is a variable-length sequence of bytes, usually represented with alphanumeric characters inside single quotes. In both Lua and MsgPack, strings are treated as binary data, with no attempts to determine a string’s character set or to perform any string conversion – unless there is an optional collation. So, usually, string sorting and comparison are done byte-by-byte, without any special collation rules applied. (Example: numbers are ordered by their point on the number line, so 2345 is greater than 500; meanwhile, strings are ordered by the encoding of the first byte, then the encoding of the second byte, and so on, so ‘2345’ is less than ‘500’.)

In Lua, a number is double-precision floating-point, but Tarantool ‘number’ may have both integer and floating-point values. Tarantool will try to store a Lua number as floating-point if the value contains a decimal point or is very large (greater than 100 trillion = 1e14), otherwise Tarantool will store it as an integer. To ensure that even very large numbers are stored as integers, use the tonumber64 function, or the LL (Long Long) suffix, or the ULL (Unsigned Long Long) suffix. Here are examples of numbers using regular notation, exponential notation, the ULL suffix and the tonumber64 function: -55, -2.7e+20, 100000000000000ULL, tonumber64('18446744073709551615').

The Tarantool/SQL double field type exists mainly so that there will be an equivalent to Tarantool/SQL’s DOUBLE data type. In MsgPack the storage type is MP_DOUBLE and the size of the encoded value is always 9 bytes. In Lua, ‘double’ fields can only contain non-integer numeric values and cdata values with double floating-point numbers. To avoid using the wrong kind of values inadvertently, use ffi.cast() when searching or changing ‘double’ fields. For example, instead of space_object:insert { value } say ffi = require('ffi') ... space_object:insert ({ffi.cast('double', value )}). Example:

s = box.schema.space.create('s', {format = {{'d', 'double'}}})
s:create_index('ii')
s:insert({1.1})
ffi = require('ffi')
s:insert({ffi.cast('double', 1)})
s:insert({ffi.cast('double', tonumber('123'))})
s:select(1.1)
s:select({ffi.cast('double', 1)})

Arithmetic with cdata ‘double’ will not work reliably, so for Lua it is better to use the ‘number’ type. This warning does not apply for Tarantool/SQL because Tarantool/SQL does implicit casting.

An ext (extension) value is an addition by Tarantool, not part of the formal MsgPack definition, for storage of decimal values. Values with the decimal type are not floating-point values although they may contain decimal points. They are exact.
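
A sketch of storing exact decimal values, assuming a space and field created just for this purpose (the names are illustrative):

decimal = require('decimal')
m = box.schema.space.create('money', {format = {{'amount', 'decimal'}}})
m:create_index('pk', {parts = {{field = 1, type = 'decimal'}}})
m:insert{decimal.new('1.20')}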

A bin (binary) value is not directly supported by Lua but there is a Tarantool type VARBINARY which is encoded as MessagePack binary. For an (advanced) example showing how to insert VARBINARY into a database, see the Cookbook Recipe for ffi_varbinary_insert.

Lua tables with string keys are stored as MsgPack maps; Lua tables with integer keys starting with 1 – as MsgPack arrays. Nils may not be used in Lua tables; the workaround is to use msgpack.NULL

A tuple is a light reference to a MsgPack array stored in the database. It is a special type (cdata) to avoid conversion to a Lua table on retrieval. A few functions may return tables with multiple tuples. For more tuple examples, see box.tuple.

Note

Tarantool uses the MsgPack format for database storage, which is variable-length. So, for example, the smallest number requires only one byte, but the largest number requires nine bytes.

Examples of insert requests with different data types:

tarantool> box.space.K:insert{1,nil,true,'A B C',12345,1.2345}
---
- [1, null, true, 'A B C', 12345, 1.2345]
...
tarantool> box.space.K:insert{2,{['a']=5,['b']=6}}
---
- [2, {'a': 5, 'b': 6}]
...
tarantool> box.space.K:insert{3,{1,2,3,4,5}}
---
- [3, [1, 2, 3, 4, 5]]
...
Indexed field types

Indexed field types restrict the values that Tarantool fields may contain. This is why, for example, ‘unsigned’ is a separate indexed field type, compared to the ‘integer’ data type in MsgPack: both store integer values, but an ‘unsigned’ index contains only non-negative integer values while an ‘integer’ index contains all integer values.

Here is how Tarantool indexed field types correspond to MsgPack data types.

unsigned (may also be called ‘uint’ or ‘num’, but ‘num’ is deprecated)
  MsgPack data type: integer (between 0 and 18446744073709551615, i.e. about 18 quintillion)
  Index type: TREE, BITSET or HASH
  Examples: 123456

integer (may also be called ‘int’)
  MsgPack data type: integer (between -9223372036854775808 and 18446744073709551615)
  Index type: TREE or HASH
  Examples: -2^63

number
  MsgPack data type: integer (between -9223372036854775808 and 18446744073709551615) or double (single-precision or double-precision floating point number)
  Index type: TREE or HASH
  Examples: 1.234, -44, 1.447e+44

double
  MsgPack data type: double
  Index type: TREE or HASH
  Examples: 1.234

string (may also be called ‘str’)
  MsgPack data type: string (any set of octets, up to the maximum length)
  Index type: TREE, BITSET or HASH
  Examples: ‘A B C’, ‘\65 \66 \67’

varbinary
  MsgPack data type: bin (any set of octets, up to the maximum length)
  Index type: TREE or HASH
  Examples: ‘\65 \66 \67’

boolean
  MsgPack data type: bool (true or false)
  Index type: TREE or HASH
  Examples: true

decimal
  MsgPack data type: ext (extension)
  Index type: TREE or HASH
  Examples: 1.2

uuid
  MsgPack data type: ext (extension)
  Index type: TREE or HASH
  Examples: 64d22e4d-ac92-4a23-899a-e5934af5479

array
  MsgPack data type: array (list of numbers representing points in a geometric figure)
  Index type: RTREE
  Examples: {10, 11}, {3, 5, 9, 10}

scalar
  MsgPack data types: null, bool (true or false), integer (between -9223372036854775808 and 18446744073709551615), double (single-precision or double-precision floating point number), decimal (value returned by a function in the decimal module), string (any set of octets), or varbinary (any set of octets). Note: when there is a mix of types, the key order is: null, then booleans, then numbers, then strings, then varbinary.
  Index type: TREE or HASH
  Examples: msgpack.NULL, true, -1, 1.234, ‘’, ‘ру’

Collations

By default, when Tarantool compares strings, it uses what we call a “binary” collation. The only consideration here is the numeric value of each byte in the string. Therefore, if the string is encoded with ASCII or UTF-8, then 'A' < 'B' < 'a', because the encoding of ‘A’ (what used to be called the “ASCII value”) is 65, the encoding of ‘B’ is 66, and the encoding of ‘a’ is 98. Binary collation is best if you prefer fast deterministic simple maintenance and searching with Tarantool indexes.

But if you want the ordering that you see in phone books and dictionaries, then you need Tarantool’s optional collations, such as unicode and unicode_ci, which allow for 'a' < 'A' < 'B' and 'a' = 'A' < 'B' respectively.

The unicode and unicode_ci optional collations use the ordering according to the Default Unicode Collation Element Table (DUCET) and the rules described in Unicode® Technical Standard #10 Unicode Collation Algorithm (UTS #10 UCA). The only difference between the two collations is about weights:

  • unicode collation observes L1 and L2 and L3 weights (strength = ‘tertiary’),
  • unicode_ci collation observes only L1 weights (strength = ‘primary’), so for example ‘a’ = ‘A’ = ‘á’ = ‘Á’.

As an example, take some Russian words:

'ЕЛЕ'
'елейный'
'ёлка'
'еловый'
'елозить'
'Ёлочка'
'ёлочный'
'ЕЛь'
'ель'

…and show the difference in ordering and selecting by index:

  • with unicode collation:

    tarantool> box.space.T:create_index('I', {parts = {{field = 1, type = 'str', collation='unicode'}}})
    ...
    tarantool> box.space.T.index.I:select()
    ---
    - - ['ЕЛЕ']
      - ['елейный']
      - ['ёлка']
      - ['еловый']
      - ['елозить']
      - ['Ёлочка']
      - ['ёлочный']
      - ['ель']
      - ['ЕЛь']
    ...
    tarantool> box.space.T.index.I:select{'ЁлКа'}
    ---
    - []
    ...
    
  • with unicode_ci collation:

    tarantool> box.space.T:create_index('I', {parts = {{field = 1, type ='str', collation='unicode_ci'}}})
    ...
    tarantool> box.space.T.index.I:select()
    ---
    - - ['ЕЛЕ']
      - ['елейный']
      - ['ёлка']
      - ['еловый']
      - ['елозить']
      - ['Ёлочка']
      - ['ёлочный']
      - ['ЕЛь']
    ...
    tarantool> box.space.T.index.I:select{'ЁлКа'}
    ---
    - - ['ёлка']
    ...
    

In all, collation involves much more than these simple examples of upper case / lower case and accented / unaccented equivalence in alphabets. We also consider variations of the same character, non-alphabetic writing systems, and special rules that apply for combinations of characters.

For English: use “unicode” and “unicode_ci”. For Russian: use “unicode” and “unicode_ci” (although a few Russians might prefer the Kyrgyz collation which says Cyrillic letters ‘Е’ and ‘Ё’ are the same with level-1 weights). For Dutch, German (dictionary), French, Indonesian, Irish, Italian, Lingala, Malay, Portuguese, Southern Sotho, Xhosa, or Zulu: “unicode” and “unicode_ci” will do.

The tailored optional collations: For other languages, Tarantool supplies tailored collations for every modern language that has more than a million native speakers, and for specialized situations such as the difference between dictionary order and telephone book order. To see a complete list say box.space._collation:select(). The tailored collation names have the form unicode_[language code]_[strength] where language code is a standard 2-character or 3-character language abbreviation, and strength is s1 for “primary strength” (level-1 weights), s2 for “secondary”, s3 for “tertiary”. Tarantool uses the same language codes as the ones in the “list of tailorable locales” on man pages of Ubuntu and Fedora. Charts explaining the precise differences from DUCET order are in the Common Language Data Repository.
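
As a sketch, an index with a tailored collation could be created as shown below. The collation name follows the unicode_[language code]_[strength] pattern described above, but it is an assumption; check box.space._collation:select() for the names that actually exist in your build.

-- 'unicode_tr_s1' (Turkish, primary strength) is assumed here for illustration
texts = box.schema.space.create('texts')
texts:create_index('i', {parts = {{field = 1, type = 'string', collation = 'unicode_tr_s1'}}})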

Sequences

A sequence is a generator of ordered integer values.

As with spaces and indexes, you should specify the sequence name, and let Tarantool come up with a unique numeric identifier (“sequence id”).

As well, you can specify several options when creating a new sequence. The options determine what value will be generated whenever the sequence is used.

Options for box.schema.sequence.create()

  • start – Integer. The value to generate the first time a sequence is used. Default: 1. Example: start=0
  • min – Integer. Values smaller than this cannot be generated. Default: 1. Example: min=-1000
  • max – Integer. Values larger than this cannot be generated. Default: 9223372036854775807. Example: max=0
  • cycle – Boolean. Whether to start again when values cannot be generated. Default: false. Example: cycle=true
  • cache – Integer. The number of values to store in a cache. Default: 0. Example: cache=0
  • step – Integer. What to add to the previous generated value, when generating a new value. Default: 1. Example: step=-1
  • if_not_exists – Boolean. If this is true and a sequence with this name exists already, ignore other options and use the existing values. Default: false. Example: if_not_exists=true

Once a sequence exists, it can be altered, dropped, reset, forced to generate the next value, or associated with an index.

For an initial example, we generate a sequence named ‘S’.

tarantool> box.schema.sequence.create('S',{min=5, start=5})
---
- step: 1
  id: 5
  min: 5
  cache: 0
  uid: 1
  max: 9223372036854775807
  cycle: false
  name: S
  start: 5
...

The result shows that the new sequence has all default values, except for the two that were specified, min and start.

Then we get the next value, with the next() function.

tarantool> box.sequence.S:next()
---
- 5
...

The result is the same as the start value. If we called next() again, we would get 6 (because the previous value plus the step value is 6), and so on.

Then we create a new table, and say that its primary key may be generated from the sequence.

tarantool> s=box.schema.space.create('T');s:create_index('I',{sequence='S'})
---
...

Then we insert a tuple, without specifying a value for the primary key.

tarantool> box.space.T:insert{nil,'other stuff'}
---
- [6, 'other stuff']
...

The result is a new tuple where the first field has a value of 6. This arrangement, where the system automatically generates the values for a primary key, is sometimes called “auto-incrementing” or “identity”.

For syntax and implementation details, see the reference for box.schema.sequence.

Persistence

In Tarantool, updates to the database are recorded in the so-called write ahead log (WAL) files. This ensures data persistence. When a power outage occurs or the Tarantool instance is killed accidentally, the in-memory database is lost. In this situation, WAL files are used to restore the data. Namely, Tarantool reads the WAL files and redoes the requests (this is called the “recovery process”). You can change the timing of the WAL writer, or turn it off, by setting wal_mode.

Tarantool also maintains a set of snapshot files. These files contain an on-disk copy of the entire data set for a given moment. Instead of reading every WAL file since the databases were created, the recovery process can load the latest snapshot file and then read only those WAL files that were produced after the snapshot file was made. After checkpointing, old WAL files can be removed to free up space.

To force immediate creation of a snapshot file, you can use Tarantool’s box.snapshot() request. To enable automatic creation of snapshot files, you can use Tarantool’s checkpoint daemon. The checkpoint daemon sets intervals for forced checkpoints. It makes sure that the states of both memtx and vinyl storage engines are synchronized and saved to disk, and automatically removes old WAL files.
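For example (the interval and count below are purely illustrative), a checkpoint can be forced manually, and the checkpoint daemon can be tuned through box.cfg:

box.snapshot()                   -- force an immediate checkpoint (snapshot file)
box.cfg{
    checkpoint_interval = 3600,  -- let the checkpoint daemon checkpoint every hour
    checkpoint_count = 2         -- keep only the two latest checkpoints and their WAL files
}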

Snapshot files can be created even if there is no WAL file.

Note

The memtx engine makes only regular checkpoints, with the interval set in the checkpoint daemon configuration.

The vinyl engine runs checkpointing in the background at all times.

See the Internals section for more details about the WAL writer and the recovery process.

Operations

Data operations

The basic data operations supported in Tarantool are:

  • five data-manipulation operations (INSERT, UPDATE, UPSERT, DELETE, REPLACE), and
  • one data-retrieval operation (SELECT).

All of them are implemented as functions in the box.space submodule.

Examples:

  • INSERT: Add a new tuple to space ‘tester’.

    The first field, field[1], will be 999 (MsgPack type is integer).

    The second field, field[2], will be ‘Taranto’ (MsgPack type is string).

    tarantool> box.space.tester:insert{999, 'Taranto'}
    
  • UPDATE: Update the tuple, changing field field[2].

    The clause “{999}”, which has the value to look up in the index of the tuple’s primary-key field, is mandatory, because update() requests must always have a clause that specifies a unique key, which in this case is field[1].

    The clause “{{‘=’, 2, ‘Tarantino’}}” specifies that assignment will happen to field[2] with the new value.

    tarantool> box.space.tester:update({999}, {{'=', 2, 'Tarantino'}})
    
  • UPSERT: Upsert the tuple, changing field field[2] again.

    The syntax of upsert() is similar to the syntax of update(). However, the execution logic of these two requests is different. UPSERT is either UPDATE or INSERT, depending on the database’s state. Also, UPSERT execution is postponed until after transaction commit, so, unlike update(), upsert() doesn’t return data back.

    tarantool> box.space.tester:upsert({999, 'Taranted'}, {{'=', 2, 'Tarantism'}})
    
  • REPLACE: Replace the tuple, adding a new field.

    This is also possible with the update() request, but the update() request is usually more complicated.

    tarantool> box.space.tester:replace{999, 'Tarantella', 'Tarantula'}
    
  • SELECT: Retrieve the tuple.

    The clause “{999}” is still mandatory, although it does not have to mention the primary key.

    tarantool> box.space.tester:select{999}
    
  • DELETE: Delete the tuple.

    In this example, we identify the primary-key field.

    tarantool> box.space.tester:delete{999}
    

Summarizing the examples:

  • Functions insert and replace accept a tuple (where a primary key comes as part of the tuple).
  • Function upsert accepts a tuple (where a primary key comes as part of the tuple), and also the update operations to execute.
  • Function delete accepts a full key of any unique index (primary or secondary).
  • Function update accepts a full key of any unique index (primary or secondary), and also the operations to execute.
  • Function select accepts any key: primary/secondary, unique/non-unique, full/partial.

See reference on box.space for more details on using data operations.

Note

Besides Lua, you can use Perl, PHP, Python or other programming language connectors. The client-server protocol is open and documented. See this annotated BNF.

Index operations

Index operations are automatic: if a data-manipulation request changes a tuple, then it also changes the index keys defined for the tuple.

The simple index-creation operation that we’ve illustrated before is:

box.space.space-name:create_index('index-name')

This creates a unique TREE index on the first field of all tuples (often called “Field#1”), which is assumed to be numeric.

The simple SELECT request that we’ve illustrated before is:

box.space.space-name:select(value)

This looks for a single tuple via the first index. Since the first index is always unique, the maximum number of returned tuples is one. You can also call select() without arguments, which returns all tuples in the space.
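On a large space, returning all tuples at once can be expensive, so a hedged sketch of a safer call passes a limit in the options table:

box.space.tester:select({}, {limit = 10})  -- at most 10 tuples
box.space.tester:count()                   -- count tuples without fetching them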

Let’s continue working with the space ‘tester’ created in the “Getting started” exercises:

-- Let's modify our space 'tester'
tarantool> box.space.tester:format({
         > {name = 'id', type = 'unsigned'},
         > {name = 'band_name', type = 'string'},
         > {name = 'year', type = 'unsigned'},
         > {name = 'rate', type = 'unsigned', is_nullable=true}})
---
...
-- Create an index with three parts
tarantool> box.space.tester:create_index('tertiary', {parts = {{field = 2, type = 'string'},{field=3, type='unsigned'}, {field=4, type='unsigned'}}})
---
- unique: true
  parts:
  - type: string
    is_nullable: false
    fieldno: 2
  - type: unsigned
    is_nullable: false
    fieldno: 3
  - type: unsigned
    is_nullable: true
    fieldno: 4
  id: 6
  space_id: 513
  type: TREE
  name: tertiary
...
-- And make sure it has three parts
tarantool> box.space.tester.index.tertiary.parts
---
- - type: string
    is_nullable: false
    fieldno: 2
  - type: unsigned
    is_nullable: false
    fieldno: 3
  - type: unsigned
    is_nullable: true
    fieldno: 4
...
-- Add the rate to the tuple #1
tarantool> box.space.tester:update(1, {{'=', 4, 5}})
---
- [1, 'Roxette', 1986, 5]
...
-- And insert another tuple
tarantool> box.space.tester:insert({4, 'Roxette', 2016, 5})
---
- [4, 'Roxette', 2016, 5]
...

The existing SELECT variations:

  1. The search can use comparisons other than equality.
tarantool> box.space.tester:select(1, {iterator = 'GT'})
---
- - [2, 'Scorpions', 2015]
  - [3, 'Ace of Base', 1993]
  - [4, 'Roxette', 2016, 5]
...

The comparison operators are LT, LE, EQ, REQ, GE, GT (for “less than”, “less than or equal”, “equal”, “reversed equal”, “greater than or equal”, “greater than” respectively). Comparisons make sense if and only if the index type is ‘TREE’.

This type of search may return more than one tuple; if so, the tuples will be in descending order by key when the comparison operator is LT or LE or REQ, otherwise in ascending order.
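For instance, continuing with the same data, an LE search starting from primary key 3 returns tuples in descending order:

tarantool> box.space.tester:select(3, {iterator = 'LE'})
---
- - [3, 'Ace of Base', 1993]
  - [2, 'Scorpions', 2015]
  - [1, 'Roxette', 1986, 5]
...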

  2. The search can use a secondary index.
tarantool> box.space.tester.index.secondary:select({'Ace of Base'})
---
- - [3, 'Ace of Base', 1993]
...

For a primary-key search, it is optional to specify an index name. For a secondary-key search, it is mandatory.

  3. The search may be for some or all key parts, but it must include a prefix of the key. Notice that partial key searches are available only in TREE indexes.
tarantool> box.space.tester.index.tertiary:select({'Roxette', 2016})
---
- - [4, 'Roxette', 2016, 5]
...
tarantool> box.space.tester.index.tertiary:select({'Roxette', nil, 5})
---
- - [1, 'Roxette', 1986, 5]
  - [4, 'Roxette', 2016, 5]
...
  4. The search may be for all fields, using a table for the value:
tarantool> box.space.tester.index.tertiary:select({'Roxette', 2016 ,5})
---
- - [4, 'Roxette', 2016, 5]
...

or the search can be for one field, using a table or a scalar:

tarantool> box.space.tester.index.tertiary:select('Roxette')
---
- - [1, 'Roxette', 1986, 5]
  - [4, 'Roxette', 2016, 5]
...
Working with BITSET and RTREE

BITSET example:

tarantool> box.schema.space.create('bitset_example')
tarantool> box.space.bitset_example:create_index('primary')
tarantool> box.space.bitset_example:create_index('bitset',{unique=false,type='BITSET', parts={2,'unsigned'}})
tarantool> box.space.bitset_example:insert{1,1}
tarantool> box.space.bitset_example:insert{2,4}
tarantool> box.space.bitset_example:insert{3,7}
tarantool> box.space.bitset_example:insert{4,3}
tarantool> box.space.bitset_example.index.bitset:select(2, {iterator='BITS_ANY_SET'})

The result will be:

---
- - [3, 7]
  - [4, 3]
...

because (7 AND 2) is not equal to 0, and (3 AND 2) is not equal to 0.

RTREE example:

tarantool> box.schema.space.create('rtree_example')
tarantool> box.space.rtree_example:create_index('primary')
tarantool> box.space.rtree_example:create_index('rtree',{unique=false,type='RTREE', parts={2,'ARRAY'}})
tarantool> box.space.rtree_example:insert{1, {3, 5, 9, 10}}
tarantool> box.space.rtree_example:insert{2, {10, 11}}
tarantool> box.space.rtree_example.index.rtree:select({4, 7, 5, 9}, {iterator = 'GT'})

The result will be:

---
- - [1, [3, 5, 9, 10]]
...

because a rectangle whose corners are at coordinates 4,7,5,9 is entirely within a rectangle whose corners are at coordinates 3,5,9,10.

Additionally, there exist index iterator operations. They can only be used with code in Lua and C/C++. Index iterators are for traversing indexes one key at a time, taking advantage of features that are specific to an index type, for example evaluating Boolean expressions when traversing BITSET indexes, or going in descending order when traversing TREE indexes.
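In Lua, the usual way to traverse an index one tuple at a time is the index pairs() method. A minimal sketch over the ‘tertiary’ index defined above (the loop body is only illustrative):

for _, tuple in box.space.tester.index.tertiary:pairs({'Roxette'}) do
  print(tuple[1], tuple[2], tuple[3])
end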

See also other index operations like alter() (modify index) and drop() (delete index) in reference for box.index submodule.

Complexity factors

In reference for box.space and box.index submodules, there are notes about which complexity factors might affect the resource usage of each function.

Complexity factor and its effect:

  • Index size: The number of index keys is the same as the number of tuples in the data set. For a TREE index, if there are more keys, then the lookup time will be greater, although of course the effect is not linear. For a HASH index, if there are more keys, then there is more RAM used, but the number of low-level steps tends to remain constant.
  • Index type: Typically, a HASH index is faster than a TREE index if the number of tuples in the space is greater than one.
  • Number of indexes accessed: Ordinarily, only one index is accessed to retrieve one tuple. But to update the tuple, there must be N accesses if the space has N different indexes.
    Note re storage engine: Vinyl optimizes away such accesses if secondary index fields are unchanged by the update. So, this complexity factor applies only to memtx, since it always makes a full-tuple copy on every update.
  • Number of tuples accessed: A few requests, for example SELECT, can retrieve multiple tuples. This factor is usually less important than the others.
  • WAL settings: The important setting for the write-ahead log is wal_mode. If the setting causes no writing or delayed writing, this factor is unimportant. If the setting causes every data-change request to wait for writing to finish on a slow device, this factor is more important than all the others.

Transaction control

Transactions in Tarantool occur in fibers on a single thread, which is why Tarantool can guarantee execution atomicity. This point deserves emphasis.

Threads, fibers and yields

How does Tarantool process a basic operation? As an example, let’s take this query:

tarantool> box.space.tester:update({3}, {{'=', 2, 'size'}, {'=', 3, 0}})

This is equivalent to the following SQL statement for a table that stores primary keys in field[1]:

UPDATE tester SET "field[2]" = 'size', "field[3]" = 0 WHERE "field[1]" = 3

This query will be processed with three operating system threads:

  1. If we issue the query on a remote client, then the network thread on the server side receives the query, parses the statement and changes it to a server executable message which has already been checked, and which the server instance can understand without parsing everything again.

  2. The network thread ships this message to the instance’s transaction processor thread using a lock-free message bus. Lua programs execute directly in the transaction processor thread, and do not require parsing and preparation.

    The instance’s transaction processor thread uses the primary-key index on field[1] to find the location of the tuple. It determines that the tuple can be updated (not much can go wrong when you’re merely changing an unindexed field value to something shorter).

  3. The transaction processor thread sends a message to the write-ahead logging (WAL) thread to commit the transaction. When done, the WAL thread replies with a COMMIT or ROLLBACK result, which is returned to the client.

Notice that there is only one transaction processor thread in Tarantool. Some people are used to the idea that there can be multiple threads operating on the database, with (say) thread #1 reading row #x, while thread #2 writes row #y. With Tarantool, no such thing ever happens. Only the transaction processor thread can access the database, and there is only one transaction processor thread for each Tarantool instance.

Like any other Tarantool thread, the transaction processor thread can handle many fibers. A fiber is a set of computer instructions that may contain “yield” signals. The transaction processor thread will execute all computer instructions until a yield, then switch to execute the instructions of a different fiber. Thus (say) the thread reads row #x for the sake of fiber #1, then writes row #y for the sake of fiber #2.

Yields must happen, otherwise the transaction processor thread would stick permanently on the same fiber. There are two types of yields:

  • implicit yields: every data-change operation or network access causes an implicit yield, and every statement that goes through the Tarantool client causes an implicit yield.
  • explicit yields: in a Lua function, you can (and should) add “yield” statements to prevent hogging; a sketch follows right after this list. This is called cooperative multitasking.
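A hedged sketch of such a cooperative long-running function (the function name and the batch size of 1000 are arbitrary):

fiber = require('fiber')

function long_job()
  for i = 1, 10000000 do
    -- ... do one unit of work here ...
    if i % 1000 == 0 then
      fiber.yield()  -- give other fibers a chance to run
    end
  end
end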

Cooperative multitasking

Cooperative multitasking means: unless a running fiber deliberately yields control, it is not preempted by some other fiber. But a running fiber will deliberately yield when it encounters a “yield point”: a transaction commit, an operating system call, or an explicit “yield” request. Any system call which can block will be performed asynchronously, and any running fiber which must wait for a system call will be preempted, so that another ready-to-run fiber takes its place and becomes the new running fiber.

This model makes all programmatic locks unnecessary: cooperative multitasking ensures that there will be no concurrency around a resource, no race conditions, and no memory consistency issues.

When requests are small, for example simple UPDATE or INSERT or DELETE or SELECT, fiber scheduling is fair: it takes only a little time to process the request, schedule a disk write, and yield to a fiber serving the next client.

However, a function might perform complex computations or might be written in such a way that yields do not occur for a long time. This can lead to unfair scheduling, when a single client throttles the rest of the system, or to apparent stalls in request processing. Avoiding this situation is the responsibility of the function’s author.

Transactions

In the absence of transactions, any function that contains yield points may see changes in the database state caused by fibers that preempt. Multi-statement transactions exist to provide isolation: each transaction sees a consistent database state and commits all its changes atomically. At commit time, a yield happens and all transaction changes are written to the write ahead log in a single batch. Or, if needed, transaction changes can be rolled back – completely or to a specific savepoint.

To implement isolation, Tarantool uses a simple optimistic scheduler: the first transaction to commit wins. If a concurrent active transaction has read a value modified by a committed transaction, it is aborted.

The cooperative scheduler ensures that, in absence of yields, a multi-statement transaction is not preempted and hence is never aborted. Therefore, understanding yields is essential to writing abort-free code.

Sometimes, while testing the transaction mechanism in Tarantool, you may notice that yielding after box.begin() but before any read/write operation does not cause an abort, as the description above says it should. This is because box.begin() does not actually start a transaction; it is a mark telling Tarantool to start the transaction at the first database request that follows.

Note

You can’t mix storage engines in a transaction today.

Implicit yields

The only explicit yield requests in Tarantool are fiber.sleep() and fiber.yield(), but many other requests “imply” yields because Tarantool is designed to avoid blocking.

Database requests imply yields if and only if there is disk I/O. For memtx, since all data is in memory, there is no disk I/O during the request. For vinyl, since some data may not be in memory, there may be disk I/O for a read (to fetch data from disk) or for a write (because a stall may occur while waiting for memory to be free). For both memtx and vinyl, since data-change requests must be recorded in the WAL, there is normally a commit. A commit happens automatically after every request in default “autocommit” mode, or a commit happens at the end of a transaction in “transaction” mode, when a user deliberately commits by calling box.commit(). Therefore for both memtx and vinyl, because there can be disk I/O, some database operations may imply yields.

Many functions in modules fio, net_box, console and socket (the “os” and “network” requests) yield.

That is why executing separate commands such as select(), insert(), update() in the console inside a transaction will cause an abort: an implicit yield happens after each chunk of code is executed in the console.
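As a hedged illustration (the exact error text may differ between versions), typing the statements below as separate console lines leads to an abort, because the console yields after each line:

box.begin()
box.space.tester:insert{100, 'X'}
-- the implicit yield after the line above aborts the open transaction,
-- so the commit fails with an error such as
-- "Transaction has been aborted by a fiber yield"
box.commit()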

Example #1

  • Engine = memtx
    The sequence select() insert() has one yield, at the end of the insertion, caused by the implicit commit; select() has nothing to write to the WAL and so does not yield.
  • Engine = vinyl
    The sequence select() insert() has between one and three yields: select() may yield if the data is not in the cache, insert() may yield while waiting for available memory, and there is an implicit yield at commit.
  • The sequence begin() insert() insert() commit() yields only at commit if the engine is memtx, and can yield up to 3 times if the engine is vinyl.

Example #2

Assume that in space ‘tester’ there are tuples in which the third field represents a positive dollar amount. Let’s start a transaction, withdraw from tuple#1, deposit in tuple#2, and end the transaction, making its effects permanent.

tarantool> function txn_example(from, to, amount_of_money)
         >   box.begin()
         >   box.space.tester:update(from, {{'-', 3, amount_of_money}})
         >   box.space.tester:update(to,   {{'+', 3, amount_of_money}})
         >   box.commit()
         >   return "ok"
         > end
---
...
tarantool> txn_example({999}, {1000}, 1.00)
---
- "ok"
...

If wal_mode = ‘none’, then implicit yielding at commit time does not take place, because there are no writes to the WAL.

If a task is interactive – sending requests to the server and receiving responses – then it involves network IO, and therefore there is an implicit yield, even if the request that is sent to the server is not itself an implicit yield request. Therefore, the sequence:

select
select
select

causes blocking (in memtx), if it is inside a function or Lua program being executed on the server instance, but causes yielding (in both memtx and vinyl) if it is done as a series of transmissions from a client, including a client which operates via telnet, via one of the connectors, or via the MySQL and PostgreSQL rocks, or via the interactive mode when using Tarantool as a client.

After a fiber has yielded and then has regained control, it immediately issues fiber.testcancel().

Access control

Understanding security details is primarily an issue for administrators. However, ordinary users should at least skim this section to get an idea of how Tarantool makes it possible for administrators to prevent unauthorized access to the database and to certain functions.

Briefly:

  • There is a method to guarantee with password checks that users really are who they say they are (“authentication”).
  • There is a _user system space, where usernames and password-hashes are stored.
  • There are functions for saying that certain users are allowed to do certain things (“privileges”).
  • There is a _priv system space, where privileges are stored. Whenever a user tries to do an operation, there is a check whether the user has the privilege to do the operation (“access control”).

Details follow.

Users

There is a current user for any program working with Tarantool, local or remote. If a remote connection is using a binary port, the current user, by default, is ‘guest’. If the connection is using an admin-console port, the current user is ‘admin’. When executing a Lua initialization script, the current user is also ‘admin’.

The current user name can be found with box.session.user().

The current user can be changed:

  • For a binary port connection – with the AUTH protocol command, supported by most clients;
  • For an admin-console connection and in a Lua initialization script – with box.session.su;
  • For a binary-port connection invoking a stored function with the CALL command – if the SETUID property is enabled for the function, Tarantool temporarily replaces the current user with the function’s creator, with all the creator’s privileges, during function execution.
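A brief hedged sketch, run from an admin console:

box.session.user()   -- returns the current user name, here 'admin'
-- run a chunk of code under another user, then automatically revert:
box.session.su('guest', function() return box.session.user() end)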

Passwords

Each user (except ‘guest’) may have a password. The password is any alphanumeric string.

Tarantool passwords are stored in the _user system space with a cryptographic hash function so that, if the password is ‘x’, the stored hash-password is a long string like ‘lL3OvhkIPOKh+Vn9Avlkx69M/Ck=‘. When a client connects to a Tarantool instance, the instance sends a random salt value which the client must mix with the hashed-password before sending to the instance. Thus the original value ‘x’ is never stored anywhere except in the user’s head, and the hashed value is never passed down a network wire except when mixed with a random salt.

Note

For more details of the password hashing algorithm (e.g. for the purpose of writing a new client application), read the scramble.h header file.

This system prevents malicious onlookers from finding passwords by snooping in the log files or snooping on the wire. It is the same system that MySQL introduced several years ago, which has proved adequate for medium-security installations. Nevertheless, administrators should warn users that no system is foolproof against determined long-term attacks, so passwords should be guarded and changed occasionally. Administrators should also advise users to choose long unobvious passwords, but it is ultimately up to the users to choose or change their own passwords.

There are two functions for managing passwords in Tarantool: box.schema.user.passwd() for changing a user’s password and box.schema.user.password() for getting a hash of a user’s password.
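A hedged example (the user name and passwords are made up):

box.schema.user.create('bob', {password = 'F1rstPass'})
box.schema.user.passwd('bob', 'S3condPass')  -- 'admin' changes bob's password
box.schema.user.password('S3condPass')       -- returns the hash that is stored in _user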

Owners and privileges

Tarantool has one database. It may be called “box.schema” or “universe”. The database contains database objects, including spaces, indexes, users, roles, sequences, and functions.

The owner of a database object is the user who created it. The owner of the database itself, and the owner of objects that are created initially (the system spaces and the default users) is ‘admin’.

Owners automatically have privileges for what they create. They can share these privileges with other users or with roles, using box.schema.user.grant requests. The following privileges can be granted:

  • ‘read’, e.g. allow select from a space
  • ‘write’, e.g. allow update on a space
  • ‘execute’, e.g. allow call of a function, or (less commonly) allow use of a role
  • ‘create’, e.g. allow box.schema.space.create (access to certain system spaces is also necessary)
  • ‘alter’, e.g. allow box.space.x.index.y:alter (access to certain system spaces is also necessary)
  • ‘drop’, e.g. allow box.sequence.x:drop (currently this can be granted but has no effect)
  • ‘usage’, e.g. whether any action is allowable regardless of other privileges (sometimes revoking ‘usage’ is a convenient way to block a user temporarily without dropping the user)
  • ‘session’, e.g. whether the user can ‘connect’.

To create objects, users need the ‘create’ privilege and at least ‘read’ and ‘write’ privileges on the system space with a similar name (for example, on the _space if the user needs to create spaces).

To access objects, users need an appropriate privilege on the object (for example, the ‘execute’ privilege on function F if the users need to execute function F). See below some examples for granting specific privileges that a grantor – that is, ‘admin’ or the object creator – can make.

To drop an object, users must be the object’s creator or be ‘admin’. As the owner of the entire database, ‘admin’ can drop any object including other users.

To grant privileges to a user, the object owner says grant(). To revoke privileges from a user, the object owner says revoke(). In either case, there are up to five parameters:

(user-name, privilege, object-type [, object-name [, options]])
  • user-name is the user (or role) that will receive or lose the privilege;

  • privilege is any of ‘read’, ‘write’, ‘execute’, ‘create’, ‘alter’, ‘drop’, ‘usage’, or ‘session’ (or a comma-separated list);

  • object-type is any of ‘space’, ‘index’, ‘sequence’, ‘function’, role-name, or ‘universe’;

  • object-name is what the privilege is for (omitted if object-type is ‘universe’);

  • options is a list inside braces for example {if_not_exists=true|false} (usually omitted because the default is acceptable).

    Every update of user privileges is reflected immediately in the existing sessions and objects, e.g. functions.

Example for granting many privileges at once

In this example user ‘admin’ grants many privileges on many objects to user ‘U’, with a single request.

box.schema.user.grant('U','read,write,execute,create,drop','universe')

Examples for granting privileges for specific operations

In these examples the object’s creator grants precisely the minimal privileges necessary for particular operations, to user ‘U’.

-- So that 'U' can create spaces:
  box.schema.user.grant('U','create','universe')
  box.schema.user.grant('U','write', 'space', '_schema')
  box.schema.user.grant('U','write', 'space', '_space')
-- So that 'U' can create indexes (assuming 'U' created the space)
  box.schema.user.grant('U','read', 'space', '_space')
  box.schema.user.grant('U','read,write', 'space', '_index')
-- So that 'U' can create indexes on space T (assuming 'U' did not create space T)
  box.schema.user.grant('U','create','space','T')
  box.schema.user.grant('U','read', 'space', '_space')
  box.schema.user.grant('U','write', 'space', '_index')
-- So that 'U' can alter indexes on space T (assuming 'U' did not create the index)
  box.schema.user.grant('U','alter','space','T')
  box.schema.user.grant('U','read','space','_space')
  box.schema.user.grant('U','read','space','_index')
  box.schema.user.grant('U','read','space','_space_sequence')
  box.schema.user.grant('U','write','space','_index')
-- So that 'U' can create users or roles:
  box.schema.user.grant('U','create','universe')
  box.schema.user.grant('U','read,write', 'space', '_user')
  box.schema.user.grant('U','write','space', '_priv')
-- So that 'U' can create sequences:
  box.schema.user.grant('U','create','universe')
  box.schema.user.grant('U','read,write','space','_sequence')
-- So that 'U' can create functions:
  box.schema.user.grant('U','create','universe')
  box.schema.user.grant('U','read,write','space','_func')
-- So that 'U' can grant access on objects that 'U' created
  box.schema.user.grant('U','read','space','_user')
-- So that 'U' can select or get from a space named 'T'
  box.schema.user.grant('U','read','space','T')
-- So that 'U' can update or insert or delete or truncate a space named 'T'
  box.schema.user.grant('U','write','space','T')
-- So that 'U' can execute a function named 'F'
  box.schema.user.grant('U','execute','function','F')
-- So that 'U' can use the "S:next()" function with a sequence named S
  box.schema.user.grant('U','read,write','sequence','S')
-- So that 'U' can use the "S:set()" or "S:reset()" functions with a sequence named S
  box.schema.user.grant('U','write','sequence','S')

Example for creating users and objects then granting privileges

Here we create a Lua function that will be executed under the user id of its creator, even if called by another user.

First, we create two spaces (‘u’ and ‘i’) and grant a no-password user (‘internal’) full access to them. Then we define a function (‘read_and_modify’) and the no-password user becomes this function’s creator. Finally, we grant another user (‘public_user’) access to execute Lua functions created by the no-password user.

box.schema.space.create('u')
box.schema.space.create('i')
box.space.u:create_index('pk')
box.space.i:create_index('pk')

box.schema.user.create('internal')

box.schema.user.grant('internal', 'read,write', 'space', 'u')
box.schema.user.grant('internal', 'read,write', 'space', 'i')
box.schema.user.grant('internal', 'create', 'universe')
box.schema.user.grant('internal', 'read,write', 'space', '_func')

function read_and_modify(key)
  local u = box.space.u
  local i = box.space.i
  local fiber = require('fiber')
  local t = u:get{key}
  if t ~= nil then
    u:put{key, box.session.uid()}
    i:put{key, fiber.time()}
  end
end

box.session.su('internal')
box.schema.func.create('read_and_modify', {setuid= true})
box.session.su('admin')
box.schema.user.create('public_user', {password = 'secret'})
box.schema.user.grant('public_user', 'execute', 'function', 'read_and_modify')

Roles

A role is a container for privileges which can be granted to regular users. Instead of granting or revoking individual privileges, you can put all the privileges in a role and then grant or revoke the role.

Role information is stored in the _user space, but the third field in the tuple – the type field – is ‘role’ rather than ‘user’.

An important feature in role management is that roles can be nested. For example, role R1 can be granted a privilege “role R2”, so users with the role R1 will subsequently get all privileges from both roles R1 and R2. In other words, a user gets all the privileges that are granted to a user’s roles, directly or indirectly.

There are actually two ways to grant or revoke a role: box.schema.user.grant-or-revoke(user-name-or-role-name,'execute', 'role',role-name...) or box.schema.user.grant-or-revoke(user-name-or-role-name,role-name...). The second way is preferable.

The ‘usage’ and ‘session’ privileges cannot be granted to roles.

Example

-- This example will work for a user with many privileges, such as 'admin'
-- or a user with the pre-defined 'super' role
-- Create space T with a primary index
box.schema.space.create('T')
box.space.T:create_index('primary', {})
-- Create user U1 so that later we can change the current user to U1
box.schema.user.create('U1')
-- Create two roles, R1 and R2
box.schema.role.create('R1')
box.schema.role.create('R2')
-- Grant role R2 to role R1 and role R1 to user U1 (order doesn't matter)
-- There are two ways to grant a role; here we use the shorter way
box.schema.role.grant('R1', 'R2')
box.schema.user.grant('U1', 'R1')
-- Grant read/write privileges for space T to role R2
-- (but not to role R1 and not to user U1)
box.schema.role.grant('R2', 'read,write', 'space', 'T')
-- Change the current user to user U1
box.session.su('U1')
-- An insertion to space T will now succeed because, due to nested roles,
-- user U1 has write privilege on space T
box.space.T:insert{1}

For more detail see box.schema.user.grant() and box.schema.role.grant() in the built-in modules reference.

Sessions and security

A session is the state of a connection to Tarantool. It contains:

  • an integer id identifying the connection,
  • the current user associated with the connection,
  • text description of the connected peer, and
  • session local state, such as Lua variables and functions.

In Tarantool, a single session can execute multiple concurrent transactions. Each transaction is identified by a unique integer id, which can be queried at the start of the transaction using box.session.sync().
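A few introspection calls expose the session state described above (a hedged sketch, run from a connected session):

box.session.id()     -- integer id of the current connection
box.session.user()   -- current user associated with the connection
box.session.peer()   -- host:port of the connected peer (may be nil for a local console)
box.session.sync()   -- sync integer constant of the current request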

Note

To track all connects and disconnects, you can use connection and authentication triggers.

Triggers

Triggers, also known as callbacks, are functions which the server executes when certain events happen.

There are six types of triggers in Tarantool, and all of them have the following characteristics:

  • Triggers associate a function with an event. The request to “define a trigger” implies passing the trigger’s function to one of the “on_event()” functions.
  • Triggers are defined only by the ‘admin’ user.
  • Triggers are stored in the Tarantool instance’s memory, not in the database. Therefore triggers disappear when the instance is shut down. To make them permanent, put function definitions and trigger settings into Tarantool’s initialization script.
  • Triggers have low overhead. If a trigger is not defined, then the overhead is minimal: merely a pointer dereference and check. If a trigger is defined, then its overhead is equivalent to the overhead of calling a function.
  • There can be multiple triggers for one event. In this case, triggers are executed in the reverse order that they were defined in. (Exception: member triggers are executed in the order that they appear in the member list.)
  • Triggers must work within the event context. However, effects are undefined if a function contains requests which normally could not occur immediately after the event, but only before the return from the event. For example, putting os.exit() or box.rollback() in a trigger function would be bringing in requests outside the event context.
  • Triggers are replaceable. The request to “redefine a trigger” implies passing a new trigger function and an old trigger function to one of the “on_event()” functions.
  • The “on_event()” functions all have parameters which are function pointers, and they all return function pointers. Remember that a Lua function definition such as “function f() x = x + 1 end” is the same as “f = function () x = x + 1 end” – in both cases f gets a function pointer. And “trigger = box.session.on_connect(f)” is the same as “trigger = box.session.on_connect(function () x = x + 1 end)” – in both cases trigger gets the function pointer which was passed.
  • You can call any “on_event()” function with no arguments to get a list of its triggers. For example, use box.session.on_connect() to return a table of all connect-trigger functions.
  • Triggers can be useful in solving problems with replication. See details in Resolving replication conflicts.

Example

Here we log connect and disconnect events into the Tarantool server log.

log = require('log')

function on_connect_impl()
  log.info("connected "..box.session.peer()..", sid "..box.session.id())
end

function on_disconnect_impl()
  log.info("disconnected, sid "..box.session.id())
end

function on_auth_impl(user)
  log.info("authenticated sid "..box.session.id().." as "..user)
end

function on_connect() pcall(on_connect_impl) end
function on_disconnect() pcall(on_disconnect_impl) end
function on_auth(user) pcall(on_auth_impl, user) end

box.session.on_connect(on_connect)
box.session.on_disconnect(on_disconnect)
box.session.on_auth(on_auth)

Limitations

Number of parts in an index

For TREE or HASH indexes, the maximum is 255 (box.schema.INDEX_PART_MAX). For RTREE indexes, the maximum is 1 but the field is an ARRAY of up to 20 dimensions. For BITSET indexes, the maximum is 1.

Number of indexes in a space

128 (box.schema.INDEX_MAX).

Number of fields in a tuple

The theoretical maximum is 2,147,483,647 (box.schema.FIELD_MAX). The practical maximum is whatever is specified by the space’s field_count member, or the maximal tuple length.

Number of bytes in a tuple

The maximal number of bytes in a tuple is roughly equal to memtx_max_tuple_size or vinyl_max_tuple_size (with a metadata overhead of about 20 bytes per tuple, which is added on top of useful bytes). By default, the value of either memtx_max_tuple_size or vinyl_max_tuple_size is 1,048,576. To increase it, specify a larger value when starting the Tarantool instance. For example, box.cfg{memtx_max_tuple_size=2*1048576}.

Number of bytes in an index key

If a field in a tuple can contain a million bytes, then the index key can contain a million bytes, so the maximum is determined by factors such as Number of bytes in a tuple, not by the index support.

Number of spaces

The theoretical maximum is 2147483647 (box.schema.SPACE_MAX) but the practical maximum is around 65,000.

Number of connections

The practical limit is the number of file descriptors that one can set with the operating system.

Space size

The total maximum size for all spaces is in effect set by memtx_memory, which in turn is limited by the total available memory.

Update operations count

The maximum number of operations per tuple that can be in a single update is 4000 (BOX_UPDATE_OP_CNT_MAX).

Number of users and roles

32 (BOX_USER_MAX).

Length of an index name or space name or user name

65000 (box.schema.NAME_MAX).

Number of replicas in a replica set

32 (vclock.VCLOCK_MAX).

Storage engines

A storage engine is a set of very-low-level routines which actually store and retrieve tuple values. Tarantool offers a choice of two storage engines:

  • memtx (the in-memory storage engine) is the default and was the first to arrive.

  • vinyl (the on-disk storage engine) is a working key-value engine and will especially appeal to users who like to see data go directly to disk, so that recovery time might be shorter and database size might be larger.

    On the other hand, vinyl lacks some functions and options that are available with memtx. Where that is the case, the relevant description in this manual contains a note beginning with the words “Note re storage engine”.

Further in this section we discuss the details of storing data using the vinyl storage engine.

To specify that the engine should be vinyl, add the clause engine = 'vinyl' when creating a space, for example:

space = box.schema.space.create('name', {engine='vinyl'})

Differences between memtx and vinyl storage engines

The primary difference between memtx and vinyl is that memtx is an “in-memory” engine while vinyl is an “on-disk” engine. An in-memory storage engine is generally faster (each query usually runs in under 1 ms), and the memtx engine is justifiably the default for Tarantool, but an on-disk engine such as vinyl is preferable when the database is larger than the available memory and adding more memory is not a realistic option.

How the engines compare, option by option:

  • Supported index type: memtx supports TREE, HASH, RTREE and BITSET; vinyl supports TREE only.
  • Temporary spaces: supported by memtx; not supported by vinyl.
  • random() function: supported by memtx; not supported by vinyl.
  • alter() function: supported by memtx; supported by vinyl starting from the 1.10.2 release (the primary index cannot be modified).
  • len() function: for memtx, returns the number of tuples in the space; for vinyl, returns the maximum approximate number of tuples in the space.
  • count() function: for memtx, takes a constant amount of time; for vinyl, takes a variable amount of time depending on the state of the DB.
  • yield: memtx does not yield on select requests unless the transaction is committed to the WAL; vinyl yields on select requests or their equivalents: get() or pairs().
  • delete() function: for memtx, returns the deleted tuple, if any; for vinyl, always returns nil.

Storing data with vinyl

Tarantool is a transactional and persistent DBMS that maintains 100% of its data in RAM. The greatest advantages of in-memory databases are their speed and ease of use: they demonstrate consistently high performance, yet they require practically no tuning.

A few years ago we decided to extend the product by implementing a classical storage engine similar to those used by regular DBMSs: it uses RAM for caching, while the bulk of its data is stored on disk. We decided to make it possible to set a storage engine independently for each table in the database, which is the same way that MySQL approaches it, but we also wanted to support transactions from the very beginning.

The first question we needed to answer was whether to create our own storage engine or use an existing library. The open-source community offered a few viable solutions. The RocksDB library was the fastest growing open-source library and is currently one of the most prominent out there. There were also several lesser-known libraries to consider, such as WiredTiger, ForestDB, NestDB, and LMDB.

Nevertheless, after studying the source code of existing libraries and considering the pros and cons, we opted for our own storage engine. One reason is that the existing third-party libraries expected requests to come from multiple operating system threads and thus contained complex synchronization primitives for controlling parallel data access. If we had decided to embed one of these in Tarantool, we would have made our users bear the overhead of a multithreaded application without getting anything in return. The thing is, Tarantool has an actor-based architecture. The way it processes transactions in a dedicated thread allows it to do away with the unnecessary locks, interprocess communication, and other overhead that accounts for up to 80% of processor time in multithreaded DBMSs.

[Figure: actor_threads.png]

The Tarantool process consists of a fixed number of “actor” threads

If you design a database engine with cooperative multitasking in mind right from the start, it not only significantly speeds up the development process, but also allows the implementation of certain optimization tricks that would be too complex for multithreaded engines. In short, using a third-party solution wouldn’t have yielded the best result.

Algorithm

Once the idea of using an existing library was off the table, we needed to pick an architecture to build upon. There are two competing approaches to on-disk data storage: the older one relies on B-trees and their variations; the newer one advocates the use of log-structured merge-trees, or “LSM” trees. MySQL, PostgreSQL, and Oracle use B-trees, while Cassandra, MongoDB, and CockroachDB have adopted LSM trees.

B-trees are considered better suited for reads, and LSM trees for writes. However, with SSDs becoming more widespread, and given that SSDs have read throughput several times greater than write throughput, the advantages of LSM trees in most scenarios became obvious to us.

Before dissecting LSM trees in Tarantool, let’s take a look at how they work. To do that, we’ll begin by analyzing a regular B-tree and the issues it faces. A B-tree is a balanced tree made up of blocks, which contain sorted lists of key-value pairs. (Topics such as filling and balancing a B-tree or splitting and merging blocks are outside the scope of this article and can easily be found on Wikipedia). As a result, we get a container sorted by key, where the smallest element is stored in the leftmost node and the largest one in the rightmost node. Let’s have a look at how insertions and searches in a B-tree happen.

[Figure: classical_b_tree.png]

Classical B-tree

If you need to find an element or check its membership, the search starts at the root, as usual. If the key is found in the root block, the search stops; otherwise, the search visits the rightmost block holding the largest element that’s not larger than the key being searched (recall that elements at each level are sorted). If the first level yields no results, the search proceeds to the next level. Finally, the search ends up in one of the leaves and probably locates the needed key. Blocks are stored and read into RAM one by one, meaning the algorithm reads log_{B}(N) blocks in a single search, where N is the number of elements in the B-tree. In the simplest case, writes are done similarly: the algorithm finds the block that holds the necessary element and updates (inserts) its value.

To better understand the data structure, let’s consider a practical example: say we have a B-tree with 100,000,000 nodes, a block size of 4096 bytes, and an element size of 100 bytes. Thus each block will hold up to 40 elements (all overhead considered), and the B-tree will consist of around 2,570,000 blocks and 5 levels: the first four will have a size of 256 Mb, while the last one will grow up to 10 Gb. Obviously, any modern computer will be able to store all of the levels except the last one in filesystem cache, so read requests will require just a single I/O operation.

But if we change our perspective, B-trees don’t look so good anymore. Suppose we need to update a single element. Since working with B-trees involves reading and writing whole blocks, we would have to read in one whole block, change our 100 bytes out of 4096, and then write the whole updated block to disk. In other words, we would be forced to write 40 times more data than we actually modified!

If you take into account the fact that an SSD block has a size of 64 Kb+ and not every modification changes a whole element, the extra disk workload can be greater still.

Authors of specialized literature and blogs dedicated to on-disk data storage have coined two terms for these phenomena: extra reads are referred to as “read amplification” and extra writes as “write amplification”.

The amplification factor (multiplication coefficient) is calculated as the ratio of the size of actual read (or written) data to the size of data needed (or actually changed). In our B-tree example, the amplification factor would be around 40 for both reads and writes.

The huge number of extra I/O operations associated with updating data is one of the main issues addressed by LSM trees. Let’s see how they work.

The key difference between LSM trees and regular B-trees is that LSM trees don’t just store data (keys and values), but also data operations: insertions and deletions.

[Figure: lsm.png]


LSM tree:

  • Stores statements, not values:
    • REPLACE
    • DELETE
    • UPSERT
  • Every statement is marked by LSN
  • Append-only files, garbage is collected after a checkpoint
  • Transactional log of all filesystem changes: vylog

For example, an element corresponding to an insertion operation has, apart from a key and a value, an extra byte with an operation code (“REPLACE” in the image above). An element representing the deletion operation contains a key (since storing a value is unnecessary) and the corresponding operation code—“DELETE”. Also, each LSM tree element has a log sequence number (LSN), which is the value of a monotonically increasing sequence that uniquely identifies each operation. The whole tree is first ordered by key in ascending order, and then, within a single key scope, by LSN in descending order.

[Figure: lsm_single.png]

A single level of an LSM tree

Filling an LSM tree

Unlike a B-tree, which is stored completely on disk and can be partly cached in RAM, when using an LSM tree, memory is explicitly separated from disk right from the start. The issue of volatile memory and data persistence is beyond the scope of the storage algorithm and can be solved in various ways—for example, by logging changes.

The part of an LSM tree that’s stored in RAM is called L0 (level zero). The size of RAM is limited, so L0 is allocated a fixed amount of memory. For example, in Tarantool, the L0 size is controlled by the vinyl_memory parameter. Initially, when an LSM tree is empty, operations are written to L0. Recall that all elements are ordered by key in ascending order, and then within a single key scope, by LSN in descending order, so when a new value associated with a given key gets inserted, it’s easy to locate the older value and delete it. L0 can be structured as any container capable of storing a sorted sequence of elements. For example, in Tarantool, L0 is implemented as a B+*-tree. Lookups and insertions are standard operations for the data structure underlying L0, so I won’t dwell on those.

Sooner or later the number of elements in an LSM tree exceeds the L0 size and that’s when L0 gets written to a file on disk (called a “run”) and then cleared for storing new elements. This operation is called a “dump”.

[Figure: dumps.png]


Dumps on disk form a sequence ordered by LSN: LSN ranges in different runs don’t overlap, and the leftmost runs (at the head of the sequence) hold newer operations. Think of these runs as a pyramid, with the newest ones closer to the top. As runs keep getting dumped, the pyramid grows higher. Note that newer runs may contain deletions or replacements for existing keys. To remove older data, it’s necessary to perform garbage collection (this process is sometimes called “merge” or “compaction”) by combining several older runs into a new one. If two versions of the same key are encountered during a compaction, only the newer one is retained; however, if a key insertion is followed by a deletion, then both operations can be discarded.

[Figure: purge.png]


The key choices determining an LSM tree’s efficiency are which runs to compact and when to compact them. Suppose an LSM tree stores a monotonically increasing sequence of keys (1, 2, 3, …,) with no deletions. In this case, compacting runs would be useless: all of the elements are sorted, the tree doesn’t have any garbage, and the location of any key can unequivocally be determined. On the other hand, if an LSM tree contains many deletions, doing a compaction would free up some disk space. However, even if there are no deletions, but key ranges in different runs overlap a lot, compacting such runs could speed up lookups as there would be fewer runs to scan. In this case, it might make sense to compact runs after each dump. But keep in mind that a compaction causes all data stored on disk to be overwritten, so with few reads it’s recommended to perform it less often.

To ensure it’s optimally configurable for any of the scenarios above, an LSM tree organizes all runs into a pyramid: the newer the data operations, the higher up the pyramid they are located. During a compaction, the algorithm picks two or more neighboring runs of approximately equal size, if possible.

[Figure: compaction.png]


  • Multi-level compaction can span any number of levels
  • A level can contain multiple runs

All of the neighboring runs of approximately equal size constitute an LSM tree level on disk. The ratio of run sizes at different levels determines the pyramid’s proportions, which allows optimizing the tree for write-intensive or read-intensive scenarios.

Suppose the L0 size is 100 Mb, the ratio of run sizes at each level (the vinyl_run_size_ratio parameter) is 5, and there can be no more than 2 runs per level (the vinyl_run_count_per_level parameter). After the first 3 dumps, the disk will contain 3 runs of 100 Mb each—which constitute L1 (level one). Since 3 > 2, the runs will be compacted into a single 300 Mb run, with the older ones being deleted. After 2 more dumps, there will be another compaction, this time of 2 runs of 100 Mb each and the 300 Mb run, which will produce one 500 Mb run. It will be moved to L2 (recall that the run size ratio is 5), leaving L1 empty. The next 10 dumps will result in L2 having 3 runs of 500 Mb each, which will be compacted into a single 1500 Mb run. Over the course of 10 more dumps, the following will happen: 3 runs of 100 Mb each will be compacted twice, as will two 100 Mb runs and one 300 Mb run, which will yield 2 new 500 Mb runs in L2. Since L2 now has 3 runs, they will also be compacted: two 500 Mb runs and one 1500 Mb run will produce a 2500 Mb run that will be moved to L3, given its size.

This can go on infinitely, but if an LSM tree contains lots of deletions, the resulting compacted run can be moved not only down, but also up the pyramid due to its size being smaller than the sizes of the original runs that were compacted. In other words, it’s enough to logically track which level a certain run belongs to, based on the run size and the smallest and greatest LSN among all of its operations.

Controlling the form of an LSM tree

If it’s necessary to reduce the number of runs for lookups, then the run size ratio can be increased, thus bringing the number of levels down. If, on the other hand, you need to minimize the compaction-related overhead, then the run size ratio can be decreased: the pyramid will grow higher, and even though runs will be compacted more often, they will be smaller, which will reduce the total amount of work done. In general, write amplification in an LSM tree is described by this formula: log_{x}(\frac {N} {L0}) × x or, alternatively, x × \frac {ln (\frac {N} {L0})} {ln(x)}, where N is the total size of all tree elements, L0 is the level zero size, and x is the level size ratio (the vinyl_run_size_ratio parameter). At \frac {N} {L0} = 40 (the disk-to-memory ratio), the plot would look something like this:

[Figure: curve.png]


As for read amplification, it’s proportional to the number of levels. The lookup cost at each level is no greater than that for a B-tree. Getting back to the example of a tree with 100,000,000 elements: given 256 Mb of RAM and the default values of vinyl_run_size_ratio and vinyl_run_count_per_level, write amplification would come out to about 13, while read amplification could be as high as 150. Let’s try to figure out why this happens.

Range searching

In the case of a single-key search, the algorithm stops after encountering the first match. However, when searching within a certain key range (for example, looking for all the users with the last name “Ivanov”), it’s necessary to scan all tree levels.

[Figure: range_search.png]

Searching within a range of [24,30)

The required range is formed the same way as when compacting several runs: the algorithm picks the key with the largest LSN out of all the sources, ignoring the other associated operations, then moves on to the next key and repeats the procedure.

Deletion

Why would one store deletions? And why doesn’t it lead to a tree overflow in the case of for i=1,10000000 put(i) delete(i) end?

With regards to lookups, deletions signal the absence of a value being searched; with compactions, they clear the tree of “garbage” records with older LSNs.

While the data is in RAM only, there’s no need to store deletions. Similarly, you don’t need to keep them following a compaction if they affect, among other things, the lowest tree level, which contains the oldest dump. Indeed, if a value can’t be found at the lowest level, then it doesn’t exist in the tree.

  • We can’t delete from append-only files
  • Tombstones (delete markers) are inserted into L0 instead
[Figure: deletion_1.png]

Deletion, step 1: a tombstone is inserted into L0

[Figure: deletion_2.png]

Deletion, step 2: the tombstone passes through intermediate levels

[Figure: deletion_3.png]

Deletion, step 3: in the case of a major compaction, the tombstone is removed from the tree

If a deletion is known to come right after the insertion of a unique value, which is often the case when modifying a value in a secondary index, then the deletion can safely be filtered out while compacting intermediate tree levels. This optimization is implemented in vinyl.

Advantages of an LSM tree

Apart from decreasing write amplification, the approach that involves periodically dumping level L0 and compacting levels L1-Lk has a few advantages over the approach to writes adopted by B-trees:

  • Dumps and compactions write relatively large files: typically, the L0 size is 50-100 Mb, which is thousands of times larger than the size of a B-tree block.
  • This large size allows efficiently compressing data before writing it. Tarantool compresses data automatically, which further decreases write amplification.
  • There is no fragmentation overhead, since there’s no padding/empty space between the elements inside a run.
  • All operations create new runs instead of modifying older data in place. This allows avoiding those nasty locks that everyone hates so much. Several operations can run in parallel without causing any conflicts. This also simplifies making backups and moving data to replicas.
  • Storing older versions of data allows for the efficient implementation of transaction support by using multiversion concurrency control.
Disadvantages of an LSM tree and how to deal with them

One of the key advantages of the B-tree as a search data structure is its predictability: all operations take no longer than log_{B}(N) to run. Conversely, in a classical LSM tree, both read and write speeds can differ by a factor of hundreds (best case scenario) or even thousands (worst case scenario). For example, adding just one element to L0 can cause it to overflow, which can trigger a chain reaction in levels L1, L2, and so on. Lookups may find the needed element in L0 or may need to scan all of the tree levels. It’s also necessary to optimize reads within a single level to achieve speeds comparable to those of a B-tree. Fortunately, most disadvantages can be mitigated or even eliminated with additional algorithms and data structures. Let’s take a closer look at these disadvantages and how they’re dealt with in Tarantool.

Unpredictable write speed

In an LSM tree, insertions almost always affect L0 only. How do you avoid idle time when the memory area allocated for L0 is full?

Clearing L0 involves two lengthy operations: writing to disk and memory deallocation. To avoid idle time while L0 is being dumped, Tarantool uses writeaheads. Suppose the L0 size is 256 Mb and the disk write speed is 10 Mb per second: dumping L0 then takes about 26 seconds. Suppose also that the insertion rate is 10,000 requests per second, with each key 100 bytes in size. While L0 is being dumped, it’s necessary to reserve about 26 Mb of RAM for the incoming insertions (10,000 × 100 bytes × 26 s), effectively slicing the usable L0 size down to 230 Mb.

Tarantool does all of these calculations automatically, constantly updating the rolling average of the DBMS workload and the histogram of the disk speed. This allows using L0 as efficiently as possible and it prevents write requests from timing out. But in the case of workload surges, some wait time is still possible. That’s why we also introduced an insertion timeout (the vinyl_timeout parameter), which is set to 60 seconds by default. The write operation itself is executed in dedicated threads. The number of these threads (2 by default) is controlled by the vinyl_write_threads parameter. The default value of 2 allows doing dumps and compactions in parallel, which is also necessary for ensuring system predictability.
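These knobs are ordinary box.cfg() options. A minimal sketch with illustrative values (not tuning advice):

box.cfg{
    vinyl_memory        = 256 * 1024 * 1024, -- memory budget for L0, in bytes
    vinyl_timeout       = 60,                -- seconds a write may wait for L0 quota
    vinyl_write_threads = 2,                 -- lets dumps and compactions run in parallel
}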

In Tarantool, compactions are always performed independently of dumps, in a separate execution thread. This is made possible by the append-only nature of an LSM tree: once dumped, runs are never changed, and compactions simply create new runs.

Delays can also be caused by L0 rotation and the deallocation of memory dumped to disk: during a dump, L0 memory is owned by two operating system threads, a transaction processing thread and a write thread. Even though no elements are being added to the rotated L0, it can still be used for lookups. To avoid read locks when doing lookups, the write thread doesn’t deallocate the dumped memory, instead delegating this task to the transaction processor thread. Following a dump, memory deallocation itself happens instantaneously: to achieve this, L0 uses a special allocator that deallocates all of the memory with a single operation.

_images/dump_from_shadow.png
  • anticipatory dump
  • throttling

The dump is performed from the so-called “shadow” L0 without blocking new insertions and lookups

Unpredictable read speed

Optimizing reads is the most difficult optimization task with regard to LSM trees. The main complexity factor here is the number of levels: not only does it slow down lookups, but any optimization also tends to require significantly larger RAM resources. Fortunately, the append-only nature of LSM trees allows us to address these problems in ways that would be nontrivial for traditional data structures.

_images/read_speed.png
  • page index
  • bloom filters
  • tuple range cache
  • multi-level compaction
Compression and page index

In B-trees, data compression is either the hardest problem to crack or a great marketing tool—rather than something really useful. In LSM trees, compression works as follows:

During a dump or compaction all of the data within a single run is split into pages. The page size (in bytes) is controlled by the vinyl_page_size parameter and can be set separately for each index. A page doesn’t have to be exactly of vinyl_page_size size—depending on the data it holds, it can be a little bit smaller or larger. Because of this, pages never have any empty space inside.

Data is compressed with zstd (Zstandard), a streaming compression algorithm developed at Facebook. The first key of each page, along with the page offset, is added to a “page index”, which is a separate file that allows the quick retrieval of any page. After a dump or compaction, the page index of the created run is also written to disk.

All .index files are cached in RAM, which allows finding the necessary page with a single lookup in a .run file (in vinyl, this is the extension of files resulting from a dump or compaction). Since data within a page is sorted, after it’s read and decompressed, the needed key can be found using a regular binary search. Decompression and reads are handled by separate threads, and are controlled by the vinyl_read_threads parameter.
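Both settings are configurable; a hedged sketch with illustrative values (the space and index names are hypothetical):

box.cfg{vinyl_read_threads = 2} -- reader/decompression threads, set at instance startup

local s = box.schema.space.create('test', {engine = 'vinyl', if_not_exists = true})
s:create_index('pk', {
    if_not_exists = true,
    page_size = 8 * 1024, -- per-index override of the default vinyl_page_size
})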

Tarantool uses a universal file format: for example, the format of a .run file is no different from that of an .xlog file (log file). This simplifies backup and recovery as well as the usage of external tools.

Bloom filters

Even though using a page index enables scanning fewer pages per run when doing a lookup, it’s still necessary to traverse all of the tree levels. There is one special case where checking that particular data is absent requires scanning all of the tree levels and is unavoidable: insertions into a unique index. If the key being inserted already exists, the insertion must fail with an error, and the only way to raise that error in an LSM tree before a transaction is committed is to do a search before inserting the data. Such reads form a class of their own in the DBMS world and are called “hidden” or “parasitic” reads.

Another operation leading to hidden reads is updating a value in a field on which a secondary index is defined. Secondary keys are regular LSM trees that store differently ordered data. In most cases, in order not to store all of the data in all of the indexes, a value associated with a given key is kept in whole only in the primary index (any index that stores both a key and a value is called “covering” or “clustered”), whereas the secondary index stores only the fields on which it is defined plus the fields that are part of the primary index. Thus, each time a field covered by a secondary index changes, the old key must first be removed from the secondary index, and only then can the new key be inserted. At update time, the old value is unknown, and it is this value that has to be read from the primary index “under the hood”.

For example:

update t1 set city='Moscow' where id=1
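In Tarantool terms, the same statement could look like the hypothetical sketch below (the space name and field numbers are assumptions); if a secondary index is defined on the city field, the old tuple must be read from the primary index first so that the outdated secondary key can be deleted:

-- Hypothetical layout: field 1 is id (primary key), field 3 is city,
-- with a secondary index defined on city.
box.space.t1:update({1}, {{'=', 3, 'Moscow'}})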

To minimize the number of disk reads, especially for nonexistent data, nearly all LSM trees use probabilistic data structures, and Tarantool is no exception. A classical Bloom filter is made up of several (usually 3-to-5) bit arrays. When data is written, several hash functions are calculated for each key in order to get corresponding array positions. The bits at these positions are then set to 1. Due to possible hash collisions, some bits might be set to 1 twice. We’re most interested in the bits that remain 0 after all keys have been added. When looking for an element within a run, the same hash functions are applied to produce bit positions in the arrays. If any of the bits at these positions is 0, then the element is definitely not in the run. The probability of a false positive in a Bloom filter is calculated using Bayes’ theorem: each hash function is an independent random variable, so the probability of a collision simultaneously occurring in all of the bit arrays is infinitesimal.

The key advantage of Bloom filters in Tarantool is that they’re easily configurable. The only parameter that can be specified separately for each index is called vinyl_bloom_fpr (FPR stands for “false positive ratio”) and it has the default value of 0.05, which translates to a 5% FPR. Based on this parameter, Tarantool automatically creates Bloom filters of the optimal size for partial-key and full-key searches. The Bloom filters are stored in the .index file, along with the page index, and are cached in RAM.
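For example, a tighter ratio can be requested for an individual index (a hedged sketch; the space name and value are illustrative):

local s = box.schema.space.create('users', {engine = 'vinyl', if_not_exists = true})
s:create_index('pk', {
    if_not_exists = true,
    bloom_fpr = 0.01, -- 1% false positives instead of the default 5%
})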

Caching

A lot of people think that caching is a silver bullet that can help with any performance issue. “When in doubt, add more cache”. In vinyl, caching is viewed rather as a means of reducing the overall workload and consequently, of getting a more stable response time for those requests that don’t hit the cache. vinyl boasts a unique type of cache among transactional systems called a “range tuple cache”. Unlike, say, RocksDB or MySQL, this cache doesn’t store pages, but rather ranges of index values obtained from disk, after having performed a compaction spanning all tree levels. This allows the use of caching for both single-key and key-range searches. Since this method of caching stores only hot data and not, say, pages (you may need only some data from a page), RAM is used in the most efficient way possible. The cache size is controlled by the vinyl_cache parameter.
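The cache can be resized with an ordinary box.cfg() call (the value below is purely illustrative):

box.cfg{vinyl_cache = 512 * 1024 * 1024} -- 512 Mb for the range tuple cache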

Garbage collection control

Chances are that by now you’ve started losing focus and need a well-deserved dopamine reward. Feel free to take a break, since working through the rest of the article is going to take some serious mental effort.

An LSM tree in vinyl is just a small piece of the puzzle. Even with a single table (or so-called “space”), vinyl creates and maintains several LSM trees, one for each index. But even a single index can be comprised of dozens of LSM trees. Let’s try to understand why this might be necessary.

Recall our example with a tree containing 100,000,000 records, 100 bytes each. As time passes, the lowest LSM level may end up holding a 10 Gb run. During compaction, a temporary run of approximately the same size will be created. Data at intermediate levels takes up some space as well, since the tree may store several operations associated with a single key. In total, storing 10 Gb of actual data may require up to 30 Gb of free space: 10 Gb for the last tree level, 10 Gb for a temporary run, and 10 Gb for the remaining data. But what if the data size is not 10 Gb, but 1 Tb? Requiring that the available disk space always be several times greater than the actual data size is financially impractical, not to mention that it may take dozens of hours to create a 1 Tb run. And in the case of an emergency shutdown or system restart, the process would have to be started from scratch.

Here’s another scenario. Suppose the primary key is a monotonically increasing sequence—for example, a time series. In this case, most insertions will fall into the right part of the key range, so it wouldn’t make much sense to do a compaction just to append a few million more records to an already huge run.

But what if writes predominantly occur in a particular region of the key range, whereas most reads take place in a different region? How do you optimize the shape of the LSM tree in this case? If it’s too tall, read performance is impacted; if it’s too flat, write speed is reduced.

Tarantool “factorizes” this problem by creating multiple LSM trees for each index. The approximate size of each subtree may be controlled by the vinyl_range_size configuration parameter. We call such subtrees “ranges”.
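A hedged sketch of adjusting the range size (the value is illustrative; the same limit can also be set per index through the range_size option of create_index()):

box.cfg{vinyl_range_size = 512 * 1024 * 1024} -- target maximum size of one range, in bytes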

_images/factor_lsm.png


Factorizing large LSM trees via ranging

  • Ranges reflect a static layout of sorted runs
  • Slices connect a sorted run into a range

Initially, when the index has few elements, it consists of a single range. As more elements are added, its total size may exceed the maximum range size. In that case a special operation called “split” divides the tree into two equal parts. The tree is split at the middle element in the range of keys stored in the tree. For example, if the tree initially stores the full range of -inf…+inf, then after splitting it at the middle key X, we get two subtrees: one that stores the range of -inf…X, and the other storing the range of X…+inf. With this approach, we always know which subtree to use for writes and which one for reads. If the tree contained deletions and each of the neighboring ranges grew smaller as a result, the opposite operation called “coalesce” combines two neighboring trees into one.

Split and coalesce don’t entail a compaction, the creation of new runs, or other resource-intensive operations. An LSM tree is just a collection of runs. vinyl has a special metadata log that helps keep track of which run belongs to which subtree(s). The log has the .vylog extension, and its format is compatible with that of an .xlog file. Similarly to an .xlog file, the metadata log gets rotated at each checkpoint. To avoid creating extra runs during split and coalesce, we have also introduced an auxiliary entity called a “slice”. A slice is a reference to a run together with a key range, and it’s stored only in the metadata log. Once its reference counter drops to zero, the corresponding file gets removed. When it’s necessary to perform a split or a coalesce, Tarantool creates slice objects for each new tree, removes the older slices, and writes these operations to the metadata log, which literally stores records that look like this: <tree id, slice id> or <slice id, run id, min, max>.

This way all of the heavy lifting associated with splitting a tree into two subtrees is postponed until a compaction and then is performed automatically. A huge advantage of dividing all of the keys into ranges is the ability to independently control the L0 size as well as the dump and compaction processes for each subtree, which makes these processes manageable and predictable. Having a separate metadata log also simplifies the implementation of both “truncate” and “drop”. In vinyl, they’re processed instantly, since they only work with the metadata log, while garbage collection is done in the background.

Advanced features of vinyl
Upsert

In the previous sections, we mentioned only two operations stored by an LSM tree: deletion and replacement. Let’s take a look at how all of the other operations can be represented. An insertion can be represented via a replacement—you just need to make sure there are no other elements with the specified key. To perform an update, it’s necessary to read the older value from the tree, so it’s easier to represent this operation as a replacement as well—this speeds up future read requests by the key. Besides, an update must return the new value, so there’s no avoiding hidden reads.

In B-trees, the cost of hidden reads is negligible: to update a block, it first needs to be read from disk anyway. Creating a special update operation for an LSM tree that doesn’t cause any hidden reads is really tempting.

Such an operation must contain not only a default value to be inserted if a key has no value yet, but also a list of update operations to perform if a value does exist.

At transaction execution time, Tarantool just saves the operation in an LSM tree, then “executes” it later, during a compaction.

The upsert operation:

space:upsert(tuple, {{operator, field, value}, ... })
  • Non-reading update or insert
  • Delayed execution
  • Background upsert squashing prevents upserts from piling up
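
For example (the space name and field numbers are hypothetical):

-- If no tuple with key 1 exists, insert {1, 'page_hits', 1};
-- otherwise add 1 to the counter stored in field 3. No read happens either way.
box.space.counters:upsert({1, 'page_hits', 1}, {{'+', 3, 1}})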

Unfortunately, postponing the operation execution until a compaction doesn’t leave much leeway in terms of error handling. That’s why Tarantool tries to validate upserts as fully as possible before writing them to an LSM tree. However, some checks are only possible with older data on hand, for example when the update operation is trying to add a number to a string or to remove a field that doesn’t exist.

A semantically similar operation exists in many products including PostgreSQL and MongoDB. But anywhere you look, it’s just syntactic sugar that combines the update and replace operations without avoiding hidden reads. Most probably, the reason is that LSM trees as data storage structures are relatively new.

Even though an upsert is a very important optimization and implementing it cost us a lot of blood, sweat, and tears, we must admit that it has limited applicability. If a table contains secondary keys or triggers, hidden reads can’t be avoided. But if you have a scenario where secondary keys are not required and the update following the transaction completion will certainly not cause any errors, then the operation is for you.

I’d like to tell you a short story about an upsert. It takes place back when vinyl was only beginning to “mature” and we were using an upsert in production for the first time. We had what seemed like an ideal environment for it: we had tons of keys, the current time was being used as values; update operations were inserting keys or modifying the current time; and we had few reads. Load tests yielded great results.

Nevertheless, after a couple of days, the Tarantool process started eating up 100% of our CPU, and the system performance dropped close to zero.

We started digging into the issue and found out that the distribution of requests across keys was significantly different from what we had seen in the test environment. It was…well, quite nonuniform. Most keys were updated once or twice a day, so the database was idle for the most part, but there were much hotter keys with tens of thousands of updates per day. Tarantool handled those just fine. But in the case of lookups by key with tens of thousands of upserts, things quickly went downhill. To return the most recent value, Tarantool had to read and “replay” the whole history consisting of all of the upserts. When designing upserts, we had hoped this would happen automatically during a compaction, but the process never even got to that stage: the L0 size was more than enough, so there were no dumps.

We solved the problem by adding a background process that performed readaheads on any keys that had more than a few dozen upserts piled up, so all those upserts were squashed and substituted with the read value.

Secondary keys

Update is not the only operation where optimizing hidden reads is critical. Even the replace operation, given secondary keys, has to read the older value: it needs to be independently deleted from the secondary indexes, and inserting a new element might not do this, leaving some garbage behind.

_images/secondary.png


If secondary indexes are not unique, then collecting “garbage” from them can be put off until a compaction, which is what we do in Tarantool.

The append-only nature of LSM trees also allowed us to implement full-blown serializable transactions in vinyl. Read-only requests use older versions of data without blocking any writes. The transaction manager itself is fairly simple for now: in classical terms, it implements the MVTO (multiversion timestamp ordering) class, whereby the winning transaction is the one that finished earlier. There are no locks and no associated deadlocks. Strange as it may seem, this is a drawback rather than an advantage: with parallel execution, you can increase the number of successful transactions by simply holding some of them on a lock when necessary. We’re planning to improve the transaction manager soon. In the current release, we focused on making the algorithm behave 100% correctly and predictably. For example, our transaction manager is one of the few on the NoSQL market that supports so-called “gap locks”.
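A hedged sketch of what such a transaction looks like from Lua (the space name is hypothetical); readers work with a consistent snapshot, and under MVTO a conflicting rival is aborted rather than blocked:

box.begin()
local account = box.space.accounts:get({1})     -- the version visible to this transaction
box.space.accounts:update({1}, {{'+', 2, 100}}) -- credit 100 to field 2
box.commit() -- may raise a conflict error if a rival transaction committed first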

Tarantool Cartridge

Here we explain how you can benefit from Tarantool Cartridge, a framework for developing, deploying, and managing applications based on Tarantool.

This documentation contains the following sections:

Tarantool Cartridge: a framework for distributed applications development

About Tarantool Cartridge

Tarantool Cartridge allows you to easily develop Tarantool-based applications and run them on one or more Tarantool instances organized into a cluster.

This is the recommended alternative to the old-school practices of application development for Tarantool.

As a software development kit (SDK), Tarantool Cartridge provides you with utilities and an application template to help:

  • easily set up a development environment for your applications;
  • plug the necessary Lua modules.

The resulting package can be installed and started on one or multiple servers as one or multiple instantiated services – independent or organized into a cluster.

A Tarantool cluster is a collection of Tarantool instances acting in concert. While a single Tarantool instance can leverage the performance of a single server and is vulnerable to failure, the cluster spans multiple servers, utilizes their cumulative CPU power, and is fault-tolerant.

To fully utilize the capabilities of a Tarantool cluster, you need to develop applications keeping in mind they are to run in a cluster environment.

As a cluster management tool, Tarantool Cartridge provides your cluster-aware applications with the following key benefits:

  • horizontal scalability and load balancing via built-in automatic sharding;
  • asynchronous replication;
  • automatic failover;
  • centralized cluster control via GUI or API;
  • automatic configuration synchronization;
  • instance functionality segregation.

A Tarantool Cartridge cluster can segregate functionality between instances via built-in and custom (user-defined) cluster roles. Roles can be toggled on and off for particular instances on the fly during cluster operation. This allows you to put different types of workloads (e.g., compute- and transaction-intensive ones) on different physical servers with dedicated hardware.

Tarantool Cartridge has an external utility called cartridge-cli which provides you with utilities and an application template to help:

  • easily set up a development environment for your applications;
  • plug the necessary Lua modules;
  • pack the applications in an environment-independent way: together with module binaries and Tarantool executables.

Getting started

Prerequisites

To get a template application that uses Tarantool Cartridge and run it, you need to install several packages:

  • tarantool and tarantool-dev (see these instructions);
  • cartridge-cli (see these instructions);
  • git, gcc, cmake and make.
Create your first application

Long story short, copy-paste this into the console:

cartridge create --name myapp
cd myapp
cartridge build
cartridge start

That’s all! Now you can visit http://localhost:8081 and see your application’s Admin Web UI:

https://user-images.githubusercontent.com/11336358/75786427-52820c00-5d76-11ea-93a4-309623bda70f.png

Contribution

The workflow for Cartridge contributors may be different from that for Cartridge users, as it implies building the project from source (documentation, Web UI) and running tests.

Building from source

The fastest way to build the project is to skip building the Web UI:

CMAKE_DUMMY_WEBUI=true tarantoolctl rocks make

But if you want to build the frontend too, you’ll also need Node.js and npm installed.

Documentation is generated from source code, but only if the ldoc and sphinx tools are installed:

pip install 'sphinx==3.0.3'
tarantoolctl rocks install \
  https://raw.githubusercontent.com/tarantool/LDoc/tarantool/ldoc-scm-2.rockspec \
  --server=http://rocks.moonscript.org
tarantoolctl rocks make
Running a demo cluster

There are several example entry points which are mostly used for testing, but can also be useful for demo purposes or experiments:

cartridge start

# or select a specific entry point
# cartridge start --script ./test/entrypoint/srv_vshardless.lua

It can be accessed through the Web UI (http://localhost:8081) or via the binary protocol:

tarantoolctl connect admin@localhost:3301

If you also need the stateful failover mode, launch an external state provider – stateboard:

cartridge start --stateboard

And set failover parameters according to instances.yml. The defaults are:

  • State provider URI: localhost:4401;
  • Password: qwerty.

For more details about cartridge-cli, see its usage.

Auto-generated sources

After the GraphQL API is changed, don’t forget to fetch the schema doc/schema.graphql:

npm install graphql-cli@3.0.14
./fetch-schema.sh
Running tests
# Backend
tarantoolctl rocks install luacheck
tarantoolctl rocks install luatest 0.5.0
.rocks/bin/luacheck .
.rocks/bin/luatest -v --exclude cypress

# Frontend
npm install cypress@3.4.1
./frontend-test.sh
.rocks/bin/luatest -v -p cypress

# Collect coverage
tarantoolctl rocks install luacov
tarantoolctl rocks install luacov-console
.rocks/bin/luatest -v --coverage
.rocks/bin/luacov-console `pwd`
.rocks/bin/luacov-console -s

Developer’s guide

For a quick start, skip the details below and jump right away to the Cartridge getting started guide.

For a deep dive into what you can develop with Tarantool Cartridge, go on with the Cartridge developer’s guide.

Introduction

To develop and start an application, in short, you need to go through the following steps:

  1. Install Tarantool Cartridge and other components of the development environment.
  2. Create a project.
  3. Develop the application. In case it is a cluster-aware application, implement its logic in a custom (user-defined) cluster role to initialize the database in a cluster environment.
  4. Deploy the application to target server(s). This includes configuring and starting the instance(s).
  5. In case it is a cluster-aware application, deploy the cluster.

The following sections provide details for each of these steps.

Installing Tarantool Cartridge

  1. Install cartridge-cli, a command-line tool for developing, deploying, and managing Tarantool applications.
  2. Install git, a version control system.
  3. Install npm, a package manager for node.js.
  4. Install the unzip utility.

Creating a project

To set up your development environment, create a project using the Tarantool Cartridge project template. In any directory, say:

$ cartridge create --name <app_name> /path/to/

This will automatically set up a Git repository in a new /path/to/<app_name>/ directory, tag it with version 0.1.0, and put the necessary files into it.

In this Git repository, you can develop the application (by simply editing the default files provided by the template), plug the necessary modules, and then easily pack everything to deploy on your server(s).

The project template creates the <app_name>/ directory with the following contents:

  • <app_name>-scm-1.rockspec file where you can specify the application dependencies.
  • deps.sh script that resolves dependencies from the .rockspec file.
  • init.lua file which is the entry point for your application.
  • .git directory necessary for a Git repository.
  • .gitignore file to ignore the unnecessary files.
  • env.lua file that sets common rock paths so that the application can be started from any directory.
  • custom-role.lua file that is a placeholder for a custom (user-defined) cluster role.

The entry point file (init.lua), among other things, loads the cartridge module and calls its initialization function:

...
local cartridge = require('cartridge')
...
cartridge.cfg({
-- cartridge options example
  workdir = '/var/lib/tarantool/app',
  advertise_uri = 'localhost:3301',
  cluster_cookie = 'super-cluster-cookie',
  ...
}, {
-- box options example
  memtx_memory = 1000000000,
  ... })
 ...

The cartridge.cfg() call renders the instance operable via the administrative console but does not call box.cfg() to configure instances.

Warning

Calling the box.cfg() function is forbidden.

The cluster itself will do it for you when it is time to:

  • bootstrap the current instance once you:
    • run cartridge.bootstrap() via the administrative console, or
    • click Create in the web interface;
  • join the instance to an existing cluster once you:
    • run cartridge.join_server({uri = 'other_instance_uri'}) via the console, or
    • click Join (an existing replica set) or Create (a new replica set) in the web interface.

Notice that you can specify a cookie for the cluster (cluster_cookie parameter) if you need to run several clusters in the same network. The cookie can be any string value.

Now you can develop an application that will run on a single or multiple independent Tarantool instances (e.g. acting as a proxy to third-party databases) – or will run in a cluster.

If you plan to develop a cluster-aware application, first familiarize yourself with the notion of cluster roles.

Cluster roles

Cluster roles are Lua modules that implement some specific functions and/or logic. In other words, a Tarantool Cartridge cluster segregates instance functionality in a role-based way.

Since all instances running cluster applications use the same source code and are aware of all the defined roles (and plugged modules), you can dynamically enable and disable multiple different roles without restarts, even during cluster operation.

Note that every instance in a replica set performs the same roles and you cannot enable/disable roles individually on some instances. In other words, configuration of enabled roles is set up per replica set. See a step-by-step configuration example in this guide.

Built-in roles

The cartridge module comes with two built-in roles that implement automatic sharding:

  • vshard-router that handles the vshard’s compute-intensive workload: routes requests to storage nodes.

  • vshard-storage that handles the vshard’s transaction-intensive workload: stores and manages a subset of a dataset.

    Note

    For more information on sharding, see the vshard module documentation.

With the built-in and custom roles, you can develop applications with separated compute and transaction handling – and enable relevant workload-specific roles on different instances running on physical servers with workload-dedicated hardware.

Custom roles

You can implement custom roles for any purposes, for example:

  • define stored procedures;
  • implement extra features on top of vshard;
  • go without vshard at all;
  • implement one or multiple supplementary services such as e-mail notifier, replicator, etc.

To implement a custom cluster role, do the following:

  1. Take the app/roles/custom.lua file in your project as a sample. Rename this file as you wish, e.g. app/roles/custom-role.lua, and implement the role’s logic. For example:

    -- Implement a custom role in app/roles/custom-role.lua
    #!/usr/bin/env tarantool
    local role_name = 'custom-role'
    
    local function init()
    ...
    end
    
    local function stop()
    ...
    end
    
    return {
        role_name = role_name,
        init = init,
        stop = stop,
    }
    

    Here the role_name value may differ from the module name passed to the cartridge.cfg() function. If the role_name variable is not specified, the module name is the default value.

    Note

    Role names must be unique as it is impossible to register multiple roles with the same name.

  2. Register the new role in the cluster by modifying the cartridge.cfg() call in the init.lua entry point file:

    -- Register a custom role in init.lua
    ...
    local cartridge = require('cartridge')
    ...
    cartridge.cfg({
      workdir = ...,
      advertise_uri = ...,
      roles = {'custom-role'},
    })
    ...
    

    where custom-role is the name of the Lua module to be loaded.

The role module does not have required functions, but the cluster may execute the following ones during the role’s life cycle:

  • init() is the role’s initialization function.

    Inside the function’s body you can call any box functions: create spaces, indexes, grant permissions, etc. Here is what the initialization function may look like:

    local function init(opts)
        -- The cluster passes an 'opts' Lua table containing an 'is_master' flag.
        if opts.is_master then
            local customer = box.schema.space.create('customer',
                { if_not_exists = true }
            )
            customer:format({
                {'customer_id', 'unsigned'},
                {'bucket_id', 'unsigned'},
                {'name', 'string'},
            })
            customer:create_index('customer_id', {
                parts = {'customer_id'},
                if_not_exists = true,
            })
        end
    end
    

    Note

    • Neither vshard-router nor vshard-storage manage spaces, indexes, or formats. You should do it within a custom role: add a box.schema.space.create() call to your first cluster role, as shown in the example above.
    • The function’s body is wrapped in a conditional statement that lets you call box functions on masters only. This protects against replication collisions as data propagates to replicas automatically.
  • stop() is the role’s termination function. Implement it if initialization starts a fiber that has to be stopped or does any job that needs to be undone on termination.

  • validate_config() and apply_config() are functions that validate and apply the role’s configuration. Implement them if some configuration data needs to be stored cluster-wide.

Next, get a grip on the role’s life cycle to implement the functions you need.

Defining role dependencies

You can instruct the cluster to apply some other roles if your custom role is enabled.

For example:

-- Role dependencies defined in app/roles/custom-role.lua
local role_name = 'custom-role'
...
return {
    role_name = role_name,
    dependencies = {'cartridge.roles.vshard-router'},
    ...
}

Here the vshard-router role will be initialized automatically for every instance with custom-role enabled.

Using multiple vshard storage groups

Replica sets with vshard-storage roles can belong to different groups. For example, hot or cold groups meant to independently process hot and cold data.

Groups are specified in the cluster’s configuration:

-- Specify groups in init.lua
cartridge.cfg({
    vshard_groups = {'hot', 'cold'},
    ...
})

If no groups are specified, the cluster assumes that all replica sets belong to the default group.

With multiple groups enabled, every replica set with a vshard-storage role enabled must be assigned to a particular group. The assignment can never be changed.

Another limitation is that you cannot add groups dynamically (this will become available in the future).

Finally, mind the syntax for router access. Every instance with a vshard-router role enabled initializes multiple routers. All of them are accessible through the role:

local router_role = cartridge.service_get('vshard-router')
router_role.get('hot'):call(...)

If you have no groups specified, you can access a static router as before (when Tarantool Cartridge was unaware of groups):

local vshard = require('vshard')
vshard.router.call(...)

However, when using the current group-aware API, you must call a static router with a colon:

local router_role = cartridge.service_get('vshard-router')
local default_router = router_role.get() -- or router_role.get('default')
default_router:call(...)
Role’s life cycle (and the order of function execution)

The cluster displays the names of all custom roles along with the built-in vshard-* roles in the web interface. Cluster administrators can enable and disable them for particular instances – either via the web interface or via the cluster public API. For example:

cartridge.admin.edit_replicaset('replicaset-uuid', {roles = {'vshard-router', 'custom-role'}})

If you enable multiple roles on an instance at the same time, the cluster first initializes the built-in roles (if any) and then the custom ones (if any) in the order the latter were listed in cartridge.cfg().

If a custom role has dependent roles, the dependencies are registered and validated first, prior to the role itself.

The cluster calls the role’s functions in the following circumstances:

  • The init() function, typically, once: either when the role is enabled by the administrator or at the instance restart. Enabling a role once is normally enough.
  • The stop() function – only when the administrator disables the role, not on instance termination.
  • The validate_config() function, first, before the automatic box.cfg() call (database initialization), then – upon every configuration update.
  • The apply_config() function upon every configuration update.

As a tryout, let’s task the cluster with some actions and see the order of executing the role’s functions:

  • Join an instance or create a replica set, both with an enabled role:
    1. validate_config()
    2. init()
    3. apply_config()
  • Restart an instance with an enabled role:
    1. validate_config()
    2. init()
    3. apply_config()
  • Disable role: stop().
  • Upon the cartridge.confapplier.patch_clusterwide() call:
    1. validate_config()
    2. apply_config()
  • Upon a triggered failover:
    1. validate_config()
    2. apply_config()

Considering the described behavior:

  • The init() function may:
    • Call box functions.
    • Start a fiber and, in this case, the stop() function should take care of the fiber’s termination.
    • Configure the built-in HTTP server.
    • Execute any code related to the role’s initialization.
  • The stop() function must undo any job that needs to be undone upon the role’s termination.
  • The validate_config() function must validate any configuration change.
  • The apply_config() function may execute any code related to a configuration change, e.g., take care of an expirationd fiber.

The validation and application functions together allow you to change the cluster-wide configuration as described in the next section.

Configuring custom roles

You can:

  • Store configurations for your custom roles as sections in cluster-wide configuration, for example:

    # in YAML configuration file
    my_role:
      notify_url: "https://localhost:8080"
    
    -- in init.lua file
    local notify_url = 'http://localhost'
    function my_role.apply_config(conf, opts)
      local conf = conf['my_role'] or {}
      notify_url = conf.notify_url or 'default'
    end
    
  • Download and upload cluster-wide configuration using the web interface or API (via GET/PUT queries to admin/config endpoint like curl localhost:8081/admin/config and curl -X PUT -d "{'my_parameter': 'value'}" localhost:8081/admin/config).

  • Utilize it in your role’s apply_config() function.

Every instance in the cluster stores a copy of the configuration file in its working directory (configured by cartridge.cfg({workdir = ...})):

  • /var/lib/tarantool/<instance_name>/config.yml for instances deployed from RPM packages and managed by systemd.
  • /home/<username>/tarantool_state/var/lib/tarantool/config.yml for instances deployed from tar+gz archives.

The cluster’s configuration is a Lua table, downloaded and uploaded as YAML. If some application-specific configuration data, e.g. a database schema as defined by DDL (data definition language), needs to be stored on every instance in the cluster, you can implement your own API by adding a custom section to the table. The cluster will help you spread it safely across all instances.

Such a section goes in the same file as the topology-specific and vshard-specific sections that the cluster generates automatically. Unlike the generated sections, the custom section’s modification, validation, and application logic has to be defined by the developer.

The common way is to define two functions:

  • validate_config(conf_new, conf_old) to validate changes made in the new configuration (conf_new) versus the old configuration (conf_old).
  • apply_config(conf, opts) to execute any code related to a configuration change. As input, this function takes the configuration to apply (conf, which is actually the new configuration that you validated earlier with validate_config()) and options (the opts argument that includes is_master, a Boolean flag described later).

Important

The validate_config() function must detect all configuration problems that may lead to apply_config() errors. For more information, see the next section.

When implementing validation and application functions that call box ones for some reason, mind the following precautions:

  • Due to the role’s life cycle, the cluster does not guarantee an automatic box.cfg() call prior to calling validate_config().

    If the validation function calls any box functions (e.g., to check a format), make sure the calls are wrapped in a protective conditional statement that checks if box.cfg() has already happened:

    -- Inside the validate_config() function:
    
    if type(box.cfg) == 'table' then
    
        -- Here you can call box functions
    
    end
    
  • Unlike the validation function, apply_config() can call box functions freely as the cluster applies custom configuration after the automatic box.cfg() call.

    However, creating spaces, users, etc., can cause replication collisions when performed on both master and replica instances simultaneously. The appropriate way is to call such box functions on masters only and let the changes propagate to replicas automatically.

    Upon the apply_config(conf, opts) execution, the cluster passes an is_master flag in the opts table which you can use to wrap collision-inducing box functions in a protective conditional statement:

    -- Inside the apply_config() function:
    
    if opts.is_master then
    
        -- Here you can call box functions
    
    end
    
Custom configuration example

Consider the following code as part of the role’s module (custom-role.lua) implementation:

#!/usr/bin/env tarantool
-- Custom role implementation

local cartridge = require('cartridge')

local role_name = 'custom-role'

-- Modify the config by implementing some setter (an alternative to HTTP PUT)
local function set_secret(secret)
    local custom_role_cfg = cartridge.confapplier.get_deepcopy(role_name) or {}
    custom_role_cfg.secret = secret
    cartridge.confapplier.patch_clusterwide({
        [role_name] = custom_role_cfg,
    })
end
-- Validate
local function validate_config(cfg)
    local custom_role_cfg = cfg[role_name] or {}
    if custom_role_cfg.secret ~= nil then
        assert(type(custom_role_cfg.secret) == 'string', 'custom-role.secret must be a string')
    end
    return true
end
-- Apply
local function apply_config(cfg)
    local custom_role_cfg = cfg[role_name] or {}
    local secret = custom_role_cfg.secret or 'default-secret'
    -- Make use of it
end

return {
    role_name = role_name,
    set_secret = set_secret,
    validate_config = validate_config,
    apply_config = apply_config,
}

Once the configuration is customized, apply it as described in the next section.

Applying custom role’s configuration

With the implementation shown in the example, you can call the set_secret() function to apply the new configuration via the administrative console – or an HTTP endpoint if the role exports one.
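For instance, from the administrative console (the module path below follows the project template and is an assumption):

-- Assumes the role module is reachable as 'app.roles.custom-role'.
require('app.roles.custom-role').set_secret('new-secret')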

The set_secret() function calls cartridge.confapplier.patch_clusterwide() which performs a two-phase commit:

  1. It patches the active configuration in memory: copies the table and replaces the "custom-role" section in the copy with the one given by the set_secret() function.
  2. The cluster checks if the new configuration can be applied on all instances except disabled and expelled. All instances subject to update must be healthy and alive according to the membership module.
  3. (Preparation phase) The cluster propagates the patched configuration. Every instance validates it with the validate_config() function of every registered role. Depending on the validation’s result:
    • If successful (i.e., returns true), the instance saves the new configuration to a temporary file named config.prepare.yml within the working directory.
    • (Abort phase) Otherwise, the instance reports an error and all the other instances roll back the update: remove the file they may have already prepared.
  4. (Commit phase) Upon successful preparation of all instances, the cluster commits the changes. Every instance:
    1. Creates the active configuration’s hard-link.
    2. Atomically replaces the active configuration file with the prepared one. The atomic replacement is indivisible – it can either succeed or fail entirely, never partially.
    3. Calls the apply_config() function of every registered role.

If any of these steps fail, an error pops up in the web interface next to the corresponding instance. The cluster does not handle such errors automatically, they require manual repair.

You will avoid the repair if the validate_config() function can detect all configuration problems that may lead to apply_config() errors.

Using the built-in HTTP server

The cluster launches an httpd server instance during initialization (cartridge.cfg()). You can bind a port to the instance via an environment variable:

-- Get the port from an environment variable or use the default one:
local http_port = os.getenv('HTTP_PORT') or '8080'

local ok, err = cartridge.cfg({
   ...
   -- Pass the port to the cluster:
   http_port = http_port,
   ...
})

To make use of the httpd instance, access it and configure routes inside the init() function of some role, e.g. a role that exposes API over HTTP:

local function init(opts)

...

   -- Get the httpd instance:
   local httpd = cartridge.service_get('httpd')
   if httpd ~= nil then
       -- Configure a route to, for example, metrics:
       httpd:route({
               method = 'GET',
               path = '/metrics',
               public = true,
           },
           function(req)
               return req:render({json = stat.stat()})
           end
       )
   end
end

For more information on using Tarantool’s HTTP server, see its documentation.

Implementing authorization in the web interface

To implement authorization in the web interface of every instance in a Tarantool cluster:

  1. Implement a new, say, auth module with a check_password function. It should check the credentials of any user trying to log in to the web interface.

    The check_password function accepts a username and password and returns an authentication success or failure.

    -- auth.lua
    
    -- Add a function to check the credentials
    local function check_password(username, password)
    
        -- Check the credentials any way you like
    
        -- Return an authentication success or failure
        if not ok then
            return false
        end
        return true
    end
    ...
    
  2. Pass the implemented auth module name as a parameter to cartridge.cfg(), so the cluster can use it:

    -- init.lua
    
    local ok, err = cartridge.cfg({
        auth_backend_name = 'auth',
        -- The cluster will automatically call 'require()' on the 'auth' module.
        ...
    })
    

    This adds a Log in button to the upper right corner of the web interface but still lets unauthenticated users interact with the interface. This is convenient for testing.

    Note

    Also, to authorize requests to cluster API, you can use the HTTP basic authorization header.

  3. To require the authorization of every user in the web interface even before the cluster bootstrap, add the following line:

    -- init.lua
    
    local ok, err = cartridge.cfg({
        auth_backend_name = 'auth',
        auth_enabled = true,
        ...
    })
    

    With the authentication enabled and the auth module implemented, the user will not be able to even bootstrap the cluster without logging in. After the successful login and bootstrap, the authentication can be enabled and disabled cluster-wide in the web interface and the auth_enabled parameter is ignored.

Application versioning

Tarantool Cartridge understands semantic versioning as described at semver.org. When developing an application, create new Git branches and tag them appropriately. These tags are used to calculate version increments for subsequent packing.

For example, if your application has version 1.2.1, tag your current branch with 1.2.1 (annotated or not).

To retrieve the current version from Git, say:

$ git describe --long --tags
1.2.1-12-g74864f2

This output shows that we are 12 commits after the version 1.2.1. If we are to package the application at this point, it will have a full version of 1.2.1-12 and its package will be named <app_name>-1.2.1-12.rpm.

Non-semantic tags are prohibited. You will not be able to create a package from a branch with the latest tag being non-semantic.

Once you package your application, the version is saved in a VERSION file in the package root.

Using .cartridge.ignore files

You can add a .cartridge.ignore file to your application repository to exclude particular files and/or directories from package builds.

For the most part, the logic is similar to that of .gitignore files. The major difference is that in .cartridge.ignore files the order of exceptions relative to the rest of the templates does not matter, while in .gitignore files the order does matter.

.cartridge.ignore entry – what it ignores
target/ – every folder named target (due to the trailing /), recursively
target – every file or folder named target, recursively
/target – a file or folder named target in the top-most directory (due to the leading /)
/target/ – a folder named target in the top-most directory (leading and trailing /)
*.class – every file or folder ending with .class, recursively
#comment – nothing, this is a comment (the first character is a #)
\#comment – every file or folder named #comment (\ for escaping)
target/logs/ – every folder named logs which is a subdirectory of a folder named target
target/*/logs/ – every folder named logs two levels under a folder named target (* doesn’t include /)
target/**/logs/ – every folder named logs somewhere under a folder named target (** includes /)
*.py[co] – every file or folder ending in .pyc or .pyo; however, it doesn’t match .py!
*.py[!co] – every file or folder ending in anything other than c or o
*.file[0-9] – every file or folder ending in a digit
*.file[!0-9] – every file or folder ending in anything other than a digit
* – everything, recursively
/* – everything in the top-most directory (due to the leading /)
**/*.tar.gz – every *.tar.gz file or folder one or more levels under the starting folder
!file – nothing: a file or folder named file is not ignored even if it matches other patterns (an exception)

Failover architecture

An important concept in cluster topology is appointing a leader. A leader is an instance responsible for performing key operations. To keep things simple, you can think of a leader as the only writable master. Every replica set has its own leader, and there’s usually not more than one.

Which instance will become a leader depends on topology settings and failover configuration.

An important topology parameter is the failover priority within a replica set. This is an ordered list of instances. By default, the first instance in the list becomes a leader, but with the failover enabled it may be changed automatically if the first one is malfunctioning.

Instance configuration upon a leader change

When Cartridge configures roles, it takes into account the leadership map (consolidated in the failover.lua module). The leadership map is composed when the instance enters the ConfiguringRoles state for the first time. Later the map is updated according to the failover mode.

Every change in the leadership map is accompanied by instance re-configuration. When the map changes, Cartridge updates the read_only setting and calls the apply_config callback for every role. It also specifies the is_master flag (which actually means is_leader, but hasn’t been renamed yet due to historical reasons).

It’s important to say that we discuss a distributed system where every instance has its own opinion. Even if all opinions coincide, there still may be races between instances, and you (as an application developer) should take them into account when designing roles and their interaction.

Leader appointment rules

The logic behind leader election depends on the failover mode: disabled, eventual, or stateful.

Disabled mode

This is the simplest case. The leader is always the first instance in the failover priority. No automatic switching is performed. When it’s dead, it’s dead.

Eventual failover

In the eventual mode, the leader isn’t elected consistently. Instead, every instance in the cluster thinks that the leader is the first healthy instance in the failover priority list, while instance health is determined according to the membership status (the SWIM protocol).

Leader election is done as follows. Suppose there are two replica sets in the cluster:

  • a single router “R”,
  • two storages, “S1” and “S2”.

Then we can say: assuming S1 comes first in the failover priority and is healthy, all three instances (R, S1, S2) agree that S1 is the leader.

The SWIM protocol guarantees that eventually all instances will find a common ground, but it’s not guaranteed for every intermediate moment of time. So we may get a conflict.

For example, soon after S1 goes down, R is already informed and thinks that S2 is the leader, but S2 hasn’t received the gossip yet and still doesn’t consider itself the leader. This is a conflict.

Similarly, when S1 recovers and takes the leadership, S2 may be unaware of that yet. So, both S1 and S2 consider themselves as leaders.

Moreover, the SWIM protocol isn’t perfect and can still produce false-negative gossip (announcing an instance dead when it’s not).

Stateful failover

Similarly to the eventual mode, every instance composes its own leadership map, but now the map is fetched from an external state provider (that’s why this failover mode is called “stateful”). Two state providers are currently supported: etcd and the stateboard (a standalone Tarantool instance). The state provider serves as a domain-specific key-value storage (simply replicaset_uuid -> leader_uuid) and as a locking mechanism.

Changes in the leadership map are obtained from the state provider with the long polling technique.

All decisions are made by the coordinator – the one that holds the lock. The coordinator is implemented as a built-in Cartridge role. There may be many instances with the coordinator role enabled, but only one of them can acquire the lock at the same time. We call this coordinator the “active” one.

The lock is released automatically when the TCP connection is closed, or it may expire if the coordinator becomes unresponsive (in stateboard it’s set by the stateboard’s --lock_delay option, for etcd it’s a part of clusterwide configuration), so the coordinator renews the lock from time to time in order to be considered alive.

The coordinator makes a decision based on the SWIM data, but the decision algorithm is slightly different from that in case of eventual failover:

  • Right after acquiring the lock from the state provider, the coordinator fetches the leadership map.
  • If there is no leader appointed for the replica set, the coordinator appoints the first leader according to the failover priority, regardless of the SWIM status.
  • If a leader becomes degraded, the coordinator makes a decision. A new leader is the first healthy instance from the failover priority list. If an old leader recovers, no leader change is made until the current leader goes down. Changing the failover priority doesn’t affect this.
  • Every appointment (self-made or fetched) is immune for a while (controlled by the IMMUNITY_TIMEOUT option).
The case: external provider outage

In this case instances do nothing: the leader remains a leader, read-only instances remain read-only. If any instance restarts during an external state provider outage, it composes an empty leadership map: it doesn’t know who actually is a leader and thinks there is none.

The case: coordinator outage

An active coordinator may be absent in a cluster either because of a failure or due to disabling the role everywhere. Just like in the previous case, instances do nothing about it: they keep fetching the leadership map from the state provider. But it will remain the same until a coordinator appears.

Manual leader promotion

It differs a lot depending on the failover mode.

In the disabled and eventual modes, you can only promote a leader by changing the failover priority (and applying a new clusterwide configuration).

In the stateful mode, the failover priority doesn’t make much sense (except for the first appointment). Instead, you should use the promotion API (the Lua cartridge.failover_promote or the GraphQL mutation {cluster{failover_promote()}}) which pushes manual appointments to the state provider.

The stateful failover mode implies consistent promotion: before becoming writable, each instance performs the wait_lsn operation to sync up with the previous one.

Information about the previous leader (we call it a vclockkeeper) is also stored on the external storage. Even when the old leader is demoted, it remains the vclockkeeper until the new leader successfully awaits and persists its vclock on the external storage.

If replication is stuck and consistent promotion isn’t possible, a user has two options: to revert the promotion (re-promote the old leader) or to force it inconsistently (every variant of the failover_promote API has a force_inconsistency flag).
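
For example, a manual promotion in the stateful mode may look like this in Lua (a sketch; the UUIDs are placeholders, and the exact argument form is given in the cartridge.failover_promote API reference):

local cartridge = require('cartridge')

-- Appoint a new leader for one replica set; the appointment is pushed
-- to the external state provider (UUIDs are placeholders):
local ok, err = cartridge.failover_promote(
    {['<replicaset_uuid>'] = '<new_leader_instance_uuid>'},
    -- if consistent promotion is impossible (replication is stuck),
    -- it can be forced:
    {force_inconsistency = true}
)
if ok == nil then
    error(err)
end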

Failover configuration

These are clusterwide parameters:

  • mode: “disabled” / “eventual” / “stateful”.
  • state_provider: “tarantool” / “etcd”.
  • tarantool_params: {uri = "...", password = "..."}.
  • etcd2_params: {endpoints = {...}, prefix = "/", lock_delay = 10, username = "", password = ""}.
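
The same parameters can be set programmatically with cartridge.failover_set_params (see the API reference below). A minimal sketch enabling stateful failover backed by a stateboard follows; the URI and password are placeholders:

local cartridge = require('cartridge')

local ok, err = cartridge.failover_set_params({
    mode = 'stateful',
    state_provider = 'tarantool', -- i.e. the stateboard
    tarantool_params = {
        uri = 'localhost:4401',
        password = 'qwerty',
    },
})
if ok == nil then
    error(err)
end
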
GraphQL API

Use your favorite GraphQL client (e.g. Altair) for requests introspection:

  • query {cluster{failover_params{}}},
  • mutation {cluster{failover_params(){}}},
  • mutation {cluster{failover_promote()}}.
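
For example, a complete introspection query for the current failover parameters might look as follows (a sketch using the luatest GraphQL helper shown later in this document; the selected fields follow the FailoverParams description):

g.cluster.main_server:graphql({query = [[
    query {
        cluster {
            failover_params { mode state_provider }
        }
    }
]]})
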
Stateboard configuration

Like other Cartridge instances, the stateboard supports cartridge.argparse options:

  • listen
  • workdir
  • password
  • lock_delay

Similarly to other argparse options, they can be passed via command-line arguments or via environment variables, e.g.:

.rocks/bin/stateboard --workdir ./dev/stateboard --listen 4401 --password qwerty
Fine-tuning failover behavior

Besides failover priority and mode, there are some other private options that influence failover operation:

  • LONGPOLL_TIMEOUT (failover) – the long polling timeout (in seconds) to fetch new appointments (default: 30);
  • NETBOX_CALL_TIMEOUT (failover/coordinator) – stateboard client’s connection timeout (in seconds) applied to all communications (default: 1);
  • RECONNECT_PERIOD (coordinator) – time (in seconds) to reconnect to the state provider if it’s unreachable (default: 5);
  • IMMUNITY_TIMEOUT (coordinator) – minimal amount of time (in seconds) to wait before overriding an appointment (default: 15).

Configuring instances

Cartridge orchestrates a distributed system of Tarantool instances – a cluster. One of the core concepts is clusterwide configuration. Every instance in a cluster stores a copy of it.

Clusterwide configuration contains options that must be identical on every cluster node, such as the topology of the cluster, failover and vshard configuration, authentication parameters and ACLs, and user-defined configuration.

Clusterwide configuration doesn’t provide instance-specific parameters: ports, workdirs, memory settings, etc.

Configuration basics

Instance configuration includes two sets of parameters:

  • cartridge.cfg() parameters;
  • box.cfg() parameters.

You can set any of these parameters in:

  1. Command line arguments.
  2. Environment variables.
  3. YAML configuration file.
  4. init.lua file.

The order here indicates the priority: command-line arguments override environment variables, and so forth.

No matter how you start the instances, you need to set the following cartridge.cfg() parameters for each instance:

  • advertise_uri – either <HOST>:<PORT>, or <HOST>:, or <PORT>. Used by other instances to connect to the current one. DO NOT specify 0.0.0.0 – this must be an external IP address, not a socket bind.
  • http_port – port to open administrative web interface and API on. Defaults to 8081. To disable it, specify "http_enabled": False.
  • workdir – a directory where all data will be stored: snapshots, wal logs, and the cartridge configuration file. Defaults to . (the current directory).
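
For reference, a bare-bones init.lua that sets these parameters directly might look like this (a sketch; the values and the role list are placeholders, see the cartridge.cfg API reference for the full option list):

#!/usr/bin/env tarantool

local cartridge = require('cartridge')

local ok, err = cartridge.cfg({
    workdir = './tmp/db',                 -- snapshots, WALs, and config live here
    advertise_uri = 'localhost:3301',     -- an external address, not 0.0.0.0
    http_port = 8081,                     -- administrative web interface and API
    roles = {'cartridge.roles.vshard-router'}, -- placeholder role list
})
assert(ok, err)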

If you start instances using cartridge CLI or systemctl, save the configuration as a YAML file, for example:

my_app.router: {"advertise_uri": "localhost:3301", "http_port": 8080}
my_app.storage_A: {"advertise_uri": "localhost:3302", "http_enabled": False}
my_app.storage_B: {"advertise_uri": "localhost:3303", "http_enabled": False}

With cartridge CLI, you can pass the path to this file as the --cfg command-line argument to the cartridge start command – or specify the path in cartridge CLI configuration (in ./.cartridge.yml or ~/.cartridge.yml):

cfg: cartridge.yml
run_dir: tmp/run
apps_path: /usr/local/share/tarantool

With systemctl, save the YAML file to /etc/tarantool/conf.d/ (the default systemd path) or to a location set in the TARANTOOL_CFG environment variable.

If you start instances with tarantool init.lua, you need to pass other configuration options as command-line parameters and environment variables, for example:

$ tarantool init.lua --alias router --memtx-memory 100 --workdir "~/db/3301" --advertise_uri "localhost:3301" --http_port "8080"
Internal representation of clusterwide configuration

In the file system, clusterwide configuration is represented by a file tree. Inside workdir of any configured instance you can find the following directory:

config/
├── auth.yml
├── topology.yml
└── vshard_groups.yml

This is the clusterwide configuration with three default config sections: auth, topology, and vshard_groups.

Due to historical reasons clusterwide configuration has two appearances:

  • old-style single-file config.yml with all sections combined, and
  • modern multi-file representation mentioned above.

Before cartridge v2.0 it used to look as follows, and this representation is still used in HTTP API and luatest helpers.

# config.yml
---
auth: {...}
topology: {...}
vshard_groups: {...}
...

Beyond these essential sections, clusterwide configuration may be used for storing some other role-specific data. Clusterwide configuration supports YAML as well as plain text sections. It can also be organized in nested subdirectories.

In Lua it’s represented by the ClusterwideConfig object (a table with metamethods). Refer to the cartridge.clusterwide-config module documentation for more details.

Two-phase commit

Cartridge keeps the clusterwide configuration identical everywhere by using the two-phase commit algorithm implemented in the cartridge.twophase module. Changing the clusterwide configuration implies applying it on every instance in the cluster.

Almost every change in cluster parameters triggers a two-phase commit: joining/expelling a server, editing replica set roles, managing users, setting failover and vshard configuration.

Two-phase commit requires all instances to be alive and healthy, otherwise it returns an error.
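
For example, even a small role-specific patch goes through the same two-phase commit (a sketch; the section name and contents are placeholders):

local cartridge = require('cartridge')

-- The patch is validated and applied on every instance in the cluster,
-- or rolled back if any instance rejects it:
local ok, err = cartridge.config_patch_clusterwide({
    custom_section = {enabled = true},
})
if ok == nil then
    error(err)
end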

For more details, please, refer to the cartridge.config_patch_clusterwide API reference.

Managing role-specific data

Besides the system sections, clusterwide configuration may be used for storing other role-specific data. It supports YAML as well as plain text sections, and it can also be organized in nested subdirectories.

Role-specific sections are used by some third-party roles, e.g. sharded-queue and cartridge-extensions.

You can influence the clusterwide configuration in various ways: using the Lua, HTTP, or GraphQL API. There are also luatest helpers available.

HTTP API

It works with the old-style single-file representation only. It’s useful when only a few sections are needed.

Example:

cat > config.yml << CONFIG
---
custom_section: {}
...
CONFIG

Upload new config:

curl -v "localhost:8081/admin/config" -X PUT --data-binary @config.yml

Download it:

curl -v "localhost:8081/admin/config" -o config.yml

It’s suitable for role-specific sections only. System sections (topology, auth, vshard_groups, users_acl) can be neither uploaded nor downloaded.

If authorization is enabled, use the curl option --user username:password.

GraphQL API

GraphQL API, by contrast, is only suitable for managing plain-text sections in the modern multi-file appearance. It is mostly used by WebUI, but sometimes it’s also helpful in tests:

g.cluster.main_server:graphql({query = [[
    mutation($sections: [ConfigSectionInput!]) {
        cluster {
            config(sections: $sections) {
                filename
                content
            }
        }
    }]],
    variables = {sections = {
      {
        filename = 'custom_section.yml',
        content = '---\n{}\n...',
      }
    }}
})

Unlike HTTP API, GraphQL affects only the sections mentioned in the query. All the other sections remain unchanged.

Similarly to HTTP API, GraphQL cluster {config} query isn’t suitable for managing system sections.
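
Reading role-specific sections works in a similar way. For example (a sketch, assuming the cluster {config} query accepts a list of section filenames, as the mutation above suggests):

g.cluster.main_server:graphql({query = [[
    query($sections: [String!]) {
        cluster {
            config(sections: $sections) {
                filename
                content
            }
        }
    }]],
    variables = {sections = {'custom_section.yml'}}
})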

Lua API

It’s not the most convenient way to configure a third-party role, but it may be useful for role development. Please refer to the corresponding API reference:

  • cartridge.config_patch_clusterwide
  • cartridge.config_get_deepcopy
  • cartridge.config_get_readonly

Example (from sharded-queue, simplified):

local cartridge = require('cartridge')

local function create_tube(tube_name, tube_opts)
    local tubes = cartridge.config_get_deepcopy('tubes') or {}
    tubes[tube_name] = tube_opts or {}

    return cartridge.config_patch_clusterwide({tubes = tubes})
end

local function validate_config(conf)
    local tubes = conf.tubes or {}
    for tube_name, tube_opts in pairs(tubes) do
        -- validate tube_opts
    end
    return true
end

local function apply_config(conf, opts)
    if opts.is_master then
        local tubes = conf.tubes or {}
        -- create tubes according to the configuration
    end
    return true
end
Luatest helpers

Cartridge test helpers provide methods for configuration management:

  • cartridge.test-helpers.cluster:upload_config,
  • cartridge.test-helpers.cluster:download_config.

Internally they wrap the HTTP API.

Example:

g.before_all(function()
    g.cluster = helpers.Cluster.new(...)
    g.cluster:upload_config({some_section = 'some_value'})
    t.assert_equals(
        g.cluster:download_config(),
        {some_section = 'some_value'}
    )
end)

Deploying an application

After you’ve developed your application locally, you can deploy it to a test or production environment.

“Deploy” includes packing the application into a specific distribution format, installing to the target system, and running the application.

You have four options to deploy a Tarantool Cartridge application:

  • as an rpm package (for production);
  • as a deb package (for production);
  • as a tar+gz archive (for testing, or as a workaround for production if root access is unavailable);
  • from sources (for local testing only).
Deploying as an rpm or deb package

The choice between DEB and RPM depends on the package manager of the target OS. For example, DEB is native for Debian Linux, and RPM – for CentOS.

  1. Pack the application into a distributable:

    $ cartridge pack rpm APP_NAME
    # -- OR --
    $ cartridge pack deb APP_NAME
    

    This will create an RPM package (e.g. ./my_app-0.1.0-1.rpm) or a DEB package (e.g. ./my_app-0.1.0-1.deb).

  2. Upload the package to the target servers, which must support systemctl.

  3. Install:

    $ yum install APP_NAME-VERSION.rpm
    # -- OR --
    $ dpkg -i APP_NAME-VERSION.deb
    
  4. Configure the instance(s).

  5. Start Tarantool instances with the corresponding services. You can do it using systemctl, for example:

    # starts a single instance
    $ systemctl start my_app
    
    # starts multiple instances
    $ systemctl start my_app@router
    $ systemctl start my_app@storage_A
    $ systemctl start my_app@storage_B
    
  6. In case it is a cluster-aware application, proceed to deploying the cluster.

    Note

    If you’re migrating your application from local test environment to production, you can re-use your test configuration at this step:

    1. In the cluster web interface of the test environment, click Configuration files > Download to save the test configuration.
    2. In the cluster web interface of the production environment, click Configuration files > Upload to upload the saved configuration.
Deploying as a tar+gz archive
  1. Pack the application into a distributable:

    $ cartridge pack tgz APP_NAME
    

    This will create a tar+gz archive (e.g. ./my_app-0.1.0-1.tgz).

  2. Upload the archive to the target servers, which must have tarantool and (optionally) cartridge-cli installed.

  3. Extract the archive:

    $ tar -xzvf APP_NAME-VERSION.tgz
    
  4. Configure the instance(s).

  5. Start Tarantool instance(s). You can do it using:

    • tarantool, for example:

      $ tarantool init.lua # starts a single instance
      
    • or cartridge, for example:

      # in application directory
      $ cartridge start # starts all instances
      $ cartridge start .router_1 # starts a single instance
      
      # in multi-application environment
      $ cartridge start my_app # starts all instances of my_app
      $ cartridge start my_app.router # starts a single instance
      
  6. In case it is a cluster-aware application, proceed to deploying the cluster.

    Note

    If you’re migrating your application from local test environment to production, you can re-use your test configuration at this step:

    1. In the cluster web interface of the test environment, click Configuration files > Download to save the test configuration.
    2. In the cluster web interface of the production environment, click Configuration files > Upload to upload the saved configuration.
Deploying from sources

This deployment method is intended for local testing only.

  1. Pull all dependencies to the .rocks directory:

    $ tarantoolctl rocks make

  2. Configure the instance(s).

  3. Start Tarantool instance(s). You can do it using:

    • tarantool, for example:

      $ tarantool init.lua # starts a single instance
      
    • or cartridge, for example:

      # in application directory
      cartridge start # starts all instances
      cartridge start .router_1 # starts a single instance
      
      # in multi-application environment
      cartridge start my_app # starts all instances of my_app
      cartridge start my_app.router # starts a single instance
      
  4. In case it is a cluster-aware application, proceed to deploying the cluster.

    Note

    If you’re migrating your application from local test environment to production, you can re-use your test configuration at this step:

    1. In the cluster web interface of the test environment, click Configuration files > Download to save the test configuration.
    2. In the cluster web interface of the production environment, click Configuration files > Upload to upload the saved configuration.

Starting/stopping instances

Depending on your deployment method, you can start/stop the instances using tarantool, cartridge CLI, or systemctl.

Start/stop using tarantool

With tarantool, you can start only a single instance:

$ tarantool init.lua # the simplest command

You can also specify more options on the command line or in environment variables.

To stop the instance, use Ctrl+C.

Start/stop using cartridge CLI

With cartridge CLI, you can start one or multiple instances:

$ cartridge start [APP_NAME[.INSTANCE_NAME]] [options]

The options are:

--script FILE

Application’s entry point. Defaults to:

  • TARANTOOL_SCRIPT, or
  • ./init.lua when running from the app’s directory, or
  • :apps_path/:app_name/init.lua in a multi-app environment.
--apps_path PATH
Path to apps directory when running in a multi-app environment. Defaults to /usr/share/tarantool.
--run_dir DIR
Directory with pid and sock files. Defaults to TARANTOOL_RUN_DIR or /var/run/tarantool.
--cfg FILE
Cartridge instances YAML configuration file. Defaults to TARANTOOL_CFG or ./instances.yml.
--foreground
Do not daemonize.

For example:

cartridge start my_app --cfg demo.yml --run_dir ./tmp/run --foreground

It starts all Tarantool instances specified in the cfg file, in the foreground, with the corresponding environment variables set.

When APP_NAME is not provided, cartridge parses it from ./*.rockspec filename.

When INSTANCE_NAME is not provided, cartridge reads cfg file and starts all defined instances:

# in application directory
cartridge start # starts all instances
cartridge start .router_1 # start single instance

# in multi-application environment
cartridge start my_app # starts all instances of my_app
cartridge start my_app.router # start a single instance

To stop the instances, say:

$ cartridge stop [APP_NAME[.INSTANCE_NAME]] [options]

These options from the cartridge start command are supported:

  • --run_dir DIR
  • --cfg FILE
Start/stop using systemctl
  • To run a single instance:

    $ systemctl start APP_NAME
    

    This will start a systemd service that will listen to the port specified in instance configuration (http_port parameter).

  • To run multiple instances on one or multiple servers:

    $ systemctl start APP_NAME@INSTANCE_1
    $ systemctl start APP_NAME@INSTANCE_2
    ...
    $ systemctl start APP_NAME@INSTANCE_N
    

    where APP_NAME@INSTANCE_N is the instantiated service name for systemd with an incremental N: a number, unique for every instance, added to the port the instance will listen on (e.g. 3301, 3302, and so on).

  • To stop all services on a server, use the systemctl stop command and specify instance names one by one. For example:

    $ systemctl stop APP_NAME@INSTANCE_1 APP_NAME@INSTANCE_2 ... APP_NAME@INSTANCE_<N>
    

When running instances with systemctl, keep these practices in mind:

  • You can specify instance configuration in a YAML file.

    This file can contain cartridge.cfg() options; see the example above.

    Save this file to /etc/tarantool/conf.d/ (the default systemd path) or to a location set in the TARANTOOL_CFG environment variable (if you’ve edited the application’s systemd unit file). The file name doesn’t matter: it can be instances.yml or anything else you like.

    Here’s what systemd does next:

    • obtains app_name (and instance_name, if specified) from the name of the application’s systemd unit file (e.g. APP_NAME@default or APP_NAME@INSTANCE_1);
    • sets the default console socket (e.g. /var/run/tarantool/APP_NAME@INSTANCE_1.control), PID file (e.g. /var/run/tarantool/APP_NAME@INSTANCE_1.pid), and workdir (e.g. /var/lib/tarantool/<APP_NAME>.<INSTANCE_NAME>); the unit file sets these via lines like Environment=TARANTOOL_WORKDIR=${workdir}.%i.

    Finally, cartridge looks across all YAML files in /etc/tarantool/conf.d for a section with the appropriate name (e.g. app_name, which contains common configuration for all instances, and app_name.instance_1, which contains instance-specific configuration). As a result, the cartridge.cfg options workdir, console_sock, and pid_file in the YAML file become useless, because systemd overrides them.

  • The default tool for querying logs is journalctl. For example:

    # show log messages for a systemd unit named APP_NAME@INSTANCE_1
    $ journalctl -u APP_NAME@INSTANCE_1

    # show only the most recent messages and continuously print new ones
    $ journalctl -f -u APP_NAME@INSTANCE_1
    

    If really needed, you can change logging-related box.cfg options in the YAML configuration file: see log and other related options.

Error handling guidelines

Almost all errors in Cartridge follow the return nil, err style, where err is an error object produced by Tarantool’s errors module. Cartridge doesn’t raise errors except for bugs and function contract violations. New roles should follow these guidelines as well.

Error objects in Lua

Error classes help to locate the problem’s source. For this purpose, an error object contains its class, stack traceback, and a message.

local errors = require('errors')
local DangerousError = errors.new_class("DangerousError")

local function some_fancy_function()

    local something_bad_happens = true

    if something_bad_happens then
        return nil, DangerousError:new("Oh boy")
    end

    return "success" -- not reachable due to the error
end

print(some_fancy_function())
nil DangerousError: Oh boy
stack traceback:
    test.lua:9: in function 'some_fancy_function'
    test.lua:15: in main chunk

For uniform error handling, errors provides the :pcall API:

local ret, err = DangerousError:pcall(some_fancy_function)
print(ret, err)
nil DangerousError: Oh boy
stack traceback:
    test.lua:9: in function <test.lua:4>
    [C]: in function 'xpcall'
    .rocks/share/tarantool/errors.lua:139: in function 'pcall'
    test.lua:15: in main chunk

print(DangerousError:pcall(error, 'what could possibly go wrong?'))

nil DangerousError: what could possibly go wrong?
stack traceback:
    [C]: in function 'xpcall'
    .rocks/share/tarantool/errors.lua:139: in function 'pcall'
    test.lua:15: in main chunk

For errors.pcall there is no difference between the return nil, err and error() approaches.

Note that the errors.pcall API differs from the vanilla Lua pcall. Instead of true, it returns the values returned from the call. If there is an error, it returns nil instead of false, plus an error message.

Remote net.box calls keep no stack trace from the remote. In that case, errors.netbox_eval comes to the rescue. It will find a stack trace from local and remote hosts and restore metatables.

> conn = require('net.box').connect('localhost:3301')
> print( errors.netbox_eval(conn, 'return nil, DoSomethingError:new("oops")') )
nil     DoSomethingError: oops
stack traceback:
        eval:1: in main chunk
during net.box eval on localhost:3301
stack traceback:
        [string "return print( errors.netbox_eval("]:1: in main chunk
        [C]: in function 'pcall'

However, vshard doesn’t use the errors module; it implements its own error objects. Keep this in mind when working with vshard functions.

Data included in an error object (class name, message, traceback) may be easily converted to string using the tostring() function.
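
For example (a sketch reusing DangerousError and some_fancy_function from above):

local log = require('log')

local _, err = DangerousError:pcall(some_fancy_function)
-- yields "DangerousError: Oh boy" followed by the stack traceback
log.error(tostring(err))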

GraphQL

GraphQL implementation in Cartridge wraps the errors module, so a typical error response looks as follows:

{
    "errors":[{
        "message":"what could possibly go wrong?",
        "extensions":{
            "io.tarantool.errors.stack":"stack traceback: ...",
            "io.tarantool.errors.class_name":"DangerousError"
        }
    }]
}

Read more about errors in the GraphQL specification.

If you’re going to implement a GraphQL handler, you can add your own extension like this:

local err = DangerousError:new('I have extension')
err.graphql_extensions = {code = 403}

It will lead to the following response:

{
    "errors":[{
        "message":"I have extension",
        "extensions":{
            "io.tarantool.errors.stack":"stack traceback: ...",
            "io.tarantool.errors.class_name":"DangerousError",
            "code":403
        }
    }]
}
HTTP

In a nutshell, an error object is a table, which means it can be easily represented in JSON. Cartridge uses this approach to handle errors over HTTP:

local err = DangerousError:new('Who would have thought?')

local resp = req:render({
    status = 500,
    headers = {
        ['content-type'] = "application/json; charset=utf-8"
    },
    json = json.encode(err),
})
{
    "line":27,
    "class_name":"DangerousError",
    "err":"Who would have thought?",
    "file":".../app/roles/api.lua",
    "stack":"stack traceback:..."
}

Administrator’s guide

This guide explains how to deploy and manage a Tarantool cluster with Tarantool Cartridge.

Note

For more information on managing Tarantool instances, see the server administration section of the Tarantool manual.

Before deploying the cluster, familiarize yourself with the notion of cluster roles and deploy Tarantool instances according to the desired cluster topology.

Deploying the cluster

To deploy the cluster, first, configure your Tarantool instances according to the desired cluster topology, for example:

my_app.router: {"advertise_uri": "localhost:3301", "http_port": 3301, "workdir": "./tmp/router"}
my_app.storage_A_master: {"advertise_uri": "localhost:3302", "http_enabled": False, "workdir": "./tmp/storage-a-master"}
my_app.storage_A_replica: {"advertise_uri": "localhost:3303", "http_enabled": False, "workdir": "./tmp/storage-a-replica"}
my_app.storage_B_master: {"advertise_uri": "localhost:3304", "http_enabled": False, "workdir": "./tmp/storage-b-master"}
my_app.storage_B_replica: {"advertise_uri": "localhost:3305", "http_enabled": False, "workdir": "./tmp/storage-b-replica"}

Then start the instances, for example using cartridge CLI:

cartridge start my_app --cfg demo.yml --run_dir ./tmp/run --foreground

And bootstrap the cluster. You can do this via the Web interface which is available at http://<instance_hostname>:<instance_http_port> (in this example, http://localhost:3301).

In the web interface, do the following:

  1. Depending on the authentication state:

    • If enabled (in production), enter your credentials and click Login:

      _images/auth_creds-border-5px.png

       

    • If disabled (for easier testing), simply proceed to configuring the cluster.

  2. Click Configure next to the first unconfigured server to create the first replica set – solely for the router (intended for compute-intensive workloads).

    _images/unconfigured-router-border-5px.png

     

    In the pop-up window, check the vshard-router role – or any custom role that has vshard-router as a dependent role (in this example, this is a custom role named app.roles.api).

    (Optional) Specify a display name for the replica set, for example router.

    _images/create-router-border-5px.png

     

    Note

    As described in the built-in roles section, it is a good practice to enable workload-specific cluster roles on instances running on physical servers with workload-specific hardware.

    Click Create replica set and see the newly-created replica set in the web interface:

    _images/router-replica-set-border-5px.png

     

    Warning

    Be careful: after an instance joins a replica set, you CAN NOT revert this or make the instance join any other replica set.

  3. Create another replica set – for a master storage node (intended for transaction-intensive workloads).

    Check the vshard-storage role – or any custom role that has vshard-storage as a dependent role (in this example, this is a custom role named app.roles.storage).

    (Optional) Check a specific group, for example hot. Replica sets with vshard-storage roles can belong to different groups. In our example, these are hot or cold groups meant to process hot and cold data independently. These groups are specified in the cluster’s configuration file; by default, a cluster has no groups.

    (Optional) Specify a display name for the replica set, for example hot-storage.

    Click Create replica set.

    _images/create-storage-border-5px.png

     

  4. (Optional) If required by topology, populate the second replica set with more storage nodes:

    1. Click Configure next to another unconfigured server dedicated for transaction-intensive workloads.

    2. Click Join Replica Set tab.

    3. Select the second replica set, and click Join replica set to add the server to it.

      _images/join-storage-border-5px.png

       

  5. Depending on cluster topology:

    • add more instances to the first or second replica sets, or
    • create more replica sets and populate them with instances meant to handle a specific type of workload (compute or transactions).

    For example:

    _images/final-cluster-border-5px.png

     

  6. (Optional) By default, all new vshard-storage replica sets get a weight of 1 before the vshard bootstrap in the next step.

    Note

    In case you add a new replica set after vshard bootstrap, as described in the topology change section, it will get a weight of 0 by default.

    To make different replica sets store different numbers of buckets, click Edit next to a replica set, change its default weight, and click Save:

    _images/change-weight-border-5px.png

     

    For more information on buckets and replica set’s weights, see the vshard module documentation.

  7. Bootstrap vshard by clicking the corresponding button, or by saying cartridge.admin.bootstrap_vshard() over the administrative console.

    This command creates virtual buckets and distributes them among storages.

    From now on, all cluster configuration can be done via the web interface.

Updating the configuration

Cluster configuration is specified in a YAML configuration file. This file includes cluster topology and role descriptions.

All instances in a Tarantool cluster have the same configuration. To this end, every instance stores a copy of the configuration file, and the cluster keeps these copies in sync: as you submit an updated configuration via the web interface, the cluster validates it (and rejects inappropriate changes) and distributes it automatically across the cluster.

To update the configuration:

  1. Click Configuration files tab.

  2. (Optional) Click Download to get hold of the current configuration file.

  3. Update the configuration file.

    You can add/change/remove any sections except system ones: topology, vshard, and vshard_groups.

    To remove a section, simply remove it from the configuration file.

  4. Compress the configuration file as a .zip archive and click Upload configuration button to upload it.

    You will see a message in the lower part of the screen saying whether configuration was uploaded successfully, and an error description if the new configuration was not applied.

Managing the cluster

This chapter explains how to:

  • change the cluster topology,
  • enable automatic failover,
  • switch the replica set’s master manually,
  • deactivate replica sets, and
  • expel instances.
Changing the cluster topology

Upon adding a newly deployed instance to a new or existing replica set:

  1. The cluster validates the configuration update by checking if the new instance is available using the membership module.

    Note

    The membership module works over the UDP protocol and can operate before the box.cfg function is called.

    All the nodes in the cluster must be healthy for validation success.

  2. The new instance waits until another instance in the cluster receives the configuration update and discovers it, again, using the membership module. On this step, the new instance does not have a UUID yet.

  3. Once the instance realizes its presence is known to the cluster, it calls the box.cfg function and starts living its life.

An optimal strategy for connecting new nodes to the cluster is to deploy a new zero-weight replica set instance by instance, and then increase the weight. Once the weight is updated and all cluster nodes are notified of the configuration change, buckets start migrating to new nodes.

To populate the cluster with more nodes, do the following:

  1. Deploy new Tarantool instances as described in the deployment section.

    If new nodes do not appear in the Web interface, click Probe server and specify their URIs manually.

    _images/probe-server-border-5px.png

     

    If a node is accessible, it will appear in the list.

  2. In the Web interface:

    • Create a new replica set with one of the new instances: click Configure next to an unconfigured server, check the necessary roles, and click Create replica set:

      Note

      In case you are adding a new vshard-storage instance, remember that all such instances get a 0 weight by default after the vshard bootstrap which happened during the initial cluster deployment.

      _images/zero-border-5px.png

       

    • Or add the instances to existing replica sets: click Configure next to an unconfigured server, click Join replica set tab, select a replica set, and click Join replica set.

    If necessary, repeat this for more instances to reach the desired redundancy level.

  3. In case you are deploying a new vshard-storage replica set, populate it with data when you are ready: click Edit next to the replica set in question, increase its weight, and click Save to start data rebalancing.

As an alternative to the web interface, you can view and change cluster topology via GraphQL. The cluster’s endpoint for serving GraphQL queries is /admin/api. You can use any third-party GraphQL client like GraphiQL or Altair.

Examples:

  • listing all servers in the cluster:

    query {
        servers { alias uri uuid }
    }
    
  • listing all replica sets with their servers:

    query {
        replicasets {
            uuid
            roles
            servers { uri uuid }
        }
    }
    
  • joining a server to a new replica set with a storage role enabled:

    mutation {
        join_server(
            uri: "localhost:33003"
            roles: ["vshard-storage"]
        )
    }
    
Data rebalancing

Rebalancing (resharding) is initiated periodically and upon adding a new replica set with a non-zero weight to the cluster. For more information, see the rebalancing process section of the vshard module documentation.

The most convenient way to trace through the process of rebalancing is to monitor the number of active buckets on storage nodes. Initially, a newly added replica set has 0 active buckets. After a few minutes, the background rebalancing process begins to transfer buckets from other replica sets to the new one. Rebalancing continues until the data is distributed evenly among all replica sets.

To monitor the current number of buckets, connect to any Tarantool instance over the administrative console, and say:

tarantool> vshard.storage.info().bucket
---
- receiving: 0
  active: 1000
  total: 1000
  garbage: 0
  sending: 0
...

The number of buckets may be increasing or decreasing depending on whether the rebalancer is migrating buckets to or from the storage node.
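
For example, a simple helper waiting for rebalancing to settle could poll this counter (a sketch; the polling interval and the stop condition are arbitrary):

local fiber = require('fiber')
local vshard = require('vshard')

local prev = -1
while true do
    local active = vshard.storage.info().bucket.active
    if active == prev then
        break -- the number of active buckets stopped changing
    end
    prev = active
    fiber.sleep(10) -- check again in 10 seconds
end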

For more information on the monitoring parameters, see the monitoring storages section.

Deactivating replica sets

To deactivate an entire replica set (e.g., to perform maintenance on it) means to move all of its buckets to other sets.

To deactivate a set, do the following:

  1. Click Edit next to the set in question.

  2. Set its weight to 0 and click Save:

    _images/zero-weight-border-5px.png

     

  3. Wait for the rebalancing process to finish migrating all the set’s buckets away. You can monitor the current bucket number as described in the data rebalancing section.

Expelling instances

Once an instance is expelled, it can never participate in the cluster again as every instance will reject it.

To expel an instance, click next to it, then click Expel server and Expel:

_images/expelling-instance-border-5px.png

 

Note

There are two restrictions:

  • You can’t expel a leader if it has a replica. Switch leadership first.
  • You can’t expel a vshard-storage if it has buckets. Set the weight to zero and wait until rebalancing is completed.
Enabling automatic failover

In a master-replica cluster configuration with automatic failover enabled, if the user-specified master of any replica set fails, the cluster automatically chooses the next replica from the priority list and grants it the active master role (read/write). When the failed master comes back online, its role is restored and the active master, again, becomes a replica (read-only). This works for any roles.

To set the priority in a replica set:

  1. Click Edit next to the replica set in question.

  2. Scroll to the bottom of the Edit replica set box to see the list of servers.

  3. Drag replicas to their place in the priority list, and click Save:

    _images/failover-priority-border-5px.png

     

The failover is disabled by default. To enable it:

  1. Click Failover:

    _images/failover-border-5px.png

     

  2. In the Failover control box, click Enable:

    _images/failover-control-border-5px.png

     

The failover status will change to enabled:

_images/enabled-failover-border-5px.png

 

For more information, see the replication section of the Tarantool manual.

Switching the replica set’s master

To manually switch the master in a replica set:

  1. Click the Edit button next to the replica set in question:

    _images/edit-replica-set-border-5px.png

     

  2. Scroll to the bottom of the Edit replica set box to see the list of servers. The server on the top is the master.

    _images/switch-master-border-5px.png

     

  3. Drag a required server to the top position and click Save.

The new master will automatically enter the read/write mode, while the ex-master will become read-only. This works for any roles.

Managing users

On the Users tab, you can enable/disable authentication as well as add, remove, edit, and view existing users who can access the web interface.

_images/users-tab-border-5px.png

 

Notice that the Users tab is available only if authorization in the web interface is implemented.

Also, some features (like deleting users) can be disabled in the cluster configuration; this is regulated by the auth_backend_name option passed to cartridge.cfg().

Resolving conflicts

Tarantool has an embedded mechanism for asynchronous replication. As a consequence, records are distributed among the replicas with a delay, so conflicts can arise.

To prevent conflicts, the special trigger space.before_replace is used. It is executed every time before making changes to the table for which it was configured. The trigger function is implemented in the Lua programming language. This function takes the original and new values of the tuple to be modified as its arguments. The returned value of the function is used to change the result of the operation: this will be the new value of the modified tuple.

For insert operations, the old value is absent, so nil is passed as the first argument.

For delete operations, the new value is absent, so nil is passed as the second argument. The trigger function can also return nil, thus turning this operation into delete.

This example shows how to use the space.before_replace trigger to prevent replication conflicts. Suppose we have a box.space.test table that is modified in multiple replicas at the same time. We store one payload field in this table. To ensure consistency, we also store the last modification time in each tuple of this table and set the space.before_replace trigger, which gives preference to newer tuples. Below is the code in Lua:

fiber = require('fiber')
-- define a function that will modify the tuple
function test_replace(tuple)
        -- add a timestamp to each tuple in the space
        tuple = box.tuple.new(tuple):update{{'!', 2, fiber.time()}}
        box.space.test:replace(tuple)
end
box.cfg{ } -- restore from the local directory
-- set the trigger to avoid conflicts
box.space.test:before_replace(function(old, new)
        if old ~= nil and new ~= nil and new[2] < old[2] then
                return old -- ignore the request
        end
        -- otherwise apply as is
end)
box.cfg{ replication = {...} } -- subscribe

Monitoring a cluster via CLI

This section describes parameters you can monitor over the administrative console.

Connecting to nodes via CLI

Each Tarantool node (router/storage) provides an administrative console (Command Line Interface) for debugging, monitoring, and troubleshooting. The console acts as a Lua interpreter and displays the result in the human-readable YAML format. To connect to a Tarantool instance via the console, say:

$ tarantoolctl connect <instance_hostname>:<port>

where the <instance_hostname>:<port> is the instance’s URI.

Monitoring storages

Use vshard.storage.info() to obtain information on storage nodes.

Output example
tarantool> vshard.storage.info()
---
- replicasets:
    <replicaset_2>:
    uuid: <replicaset_2>
    master:
        uri: storage:storage@127.0.0.1:3303
    <replicaset_1>:
    uuid: <replicaset_1>
    master:
        uri: storage:storage@127.0.0.1:3301
  bucket: # buckets status
    receiving: 0 # buckets in the RECEIVING state
    active: 2 # buckets in the ACTIVE state
    garbage: 0 # buckets in the GARBAGE state (are to be deleted)
    total: 2 # total number of buckets
    sending: 0 # buckets in the SENDING state
  status: 1 # the status of the replica set
  replication:
    status: disconnected # the status of the replication
    idle: <idle>
  alerts:
  - ['MASTER_IS_UNREACHABLE', 'Master is unreachable: disconnected']
Status list

  • 0 (Green) – A replica set works in a regular way.
  • 1 (Yellow) – There are some issues, but they don’t affect the replica set’s efficiency (worth noticing, but they don’t require immediate intervention).
  • 2 (Orange) – A replica set is in a degraded state.
  • 3 (Red) – A replica set is disabled.
Potential issues
  • MISSING_MASTER — No master node in the replica set configuration.

    Critical level: Orange.

    Cluster condition: Service is degraded for data-change requests to the replica set.

    Solution: Set the master node for the replica set in the configuration using API.

  • UNREACHABLE_MASTER — No connection between the master and the replica.

    Critical level:

    • If idle value doesn’t exceed T1 threshold (1 s.) — Yellow,
    • If idle value doesn’t exceed T2 threshold (5 s.) — Orange,
    • If idle value exceeds T3 threshold (10 s.) — Red.

    Cluster condition: For read requests to replica, the data may be obsolete compared with the data on master.

    Solution: Reconnect to the master: fix the network issues, reset the current master, switch to another master.

  • LOW_REDUNDANCY — Master has access to a single replica only.

    Critical level: Yellow.

    Cluster condition: The data storage redundancy factor is equal to 2. It is lower than the minimal recommended value for production usage.

    Solution: Check cluster configuration:

    • If only one master and one replica are specified in the configuration, it is recommended to add at least one more replica to reach the redundancy factor of 3.
    • If three or more replicas are specified in the configuration, consider checking the replicas’ states and network connection among the replicas.
  • INVALID_REBALANCING — Rebalancing invariant was violated. During migration, a storage node can either send or receive buckets. So it shouldn’t be the case that a replica set sends buckets to one replica set and receives buckets from another replica set at the same time.

    Critical level: Yellow.

    Cluster condition: Rebalancing is on hold.

    Solution: There are two possible reasons for invariant violation:

    • The rebalancer has crashed.
    • Bucket states were changed manually.

    Either way, please contact Tarantool support.

  • HIGH_REPLICATION_LAG — Replica’s lag exceeds T1 threshold (1 sec.).

    Critical level:

    • If the lag doesn’t exceed T1 threshold (1 sec.) — Yellow;
    • If the lag exceeds T2 threshold (5 sec.) — Orange.

    Cluster condition: For read-only requests to the replica, the data may be obsolete compared with the data on the master.

    Solution: Check the replication status of the replica. Further instructions are given in the Tarantool troubleshooting guide.

  • OUT_OF_SYNC — Mal-synchronization occurred. The lag exceeds the T3 threshold (10 sec.).

    Critical level: Red.

    Cluster condition: For read-only requests to the replica, the data may be obsolete compared with the data on the master.

    Solution: Check the replication status of the replica. Further instructions are given in the Tarantool troubleshooting guide.

  • UNREACHABLE_REPLICA — One or multiple replicas are unreachable.

    Critical level: Yellow.

    Cluster condition: Data storage redundancy factor for the given replica set is less than the configured factor. If the replica is next in the queue for rebalancing (in accordance with the weight configuration), the requests are forwarded to the replica that is still next in the queue.

    Solution: Check the error message and find out which replica is unreachable. If a replica is disabled, enable it. If this doesn’t help, consider checking the network.

  • UNREACHABLE_REPLICASET — All replicas except for the current one are unreachable.

    Critical level: Red.

    Cluster condition: The replica stores obsolete data.

    Solution: Check if the other replicas are enabled. If all replicas are enabled, consider checking network issues on the master. If the replicas are disabled, check them first: the master might be working properly.

Monitoring routers

Use vshard.router.info() to obtain information on the router.

Output example
tarantool> vshard.router.info()
---
- replicasets:
    <replica set UUID>:
      master:
        status: <available / unreachable / missing>
        uri: # URI of master
        uuid: # UUID of instance
      replica:
        status: <available / unreachable / missing>
        uri: # URI of replica used for slave requests
        uuid: # UUID of instance
      uuid: # UUID of replica set
    <replica set UUID>: ...
    ...
  status: # status of router
  bucket:
    known: # number of buckets with the known destination
    unknown: # number of other buckets
  alerts: [<alert code>, <alert description>], ...
Status list

  • 0 (Green) – The router works in a regular way.
  • 1 (Yellow) – Some replicas are unreachable (affects the speed of executing read requests).
  • 2 (Orange) – Service is degraded for changing data.
  • 3 (Red) – Service is degraded for reading data.
Potential issues

Note

Depending on the nature of the issue, use either the UUID of a replica, or the UUID of a replica set.

  • MISSING_MASTER — The master in one or multiple replica sets is not specified in the configuration.

    Critical level: Orange.

    Cluster condition: Partial degrade for data-change requests.

    Solution: Specify the master in the configuration.

  • UNREACHABLE_MASTER — The router lost connection with the master of one or multiple replica sets.

    Critical level: Orange.

    Cluster condition: Partial degrade for data-change requests.

    Solution: Restore connection with the master. First, check if the master is enabled. If it is, consider checking the network.

  • SUBOPTIMAL_REPLICA — There is a replica for read-only requests, but this replica is not optimal according to the configured weights. This means that the optimal replica is unreachable.

    Critical level: Yellow.

    Cluster condition: Read-only requests are forwarded to a backup replica.

    Solution: Check the status of the optimal replica and its network connection.

  • UNREACHABLE_REPLICASET — A replica set is unreachable for both read-only and data-change requests.

    Critical Level: Red.

    Cluster condition: Partial degrade for read-only and data-change requests.

    Solution: The replica set has an unreachable master and replica. Check the error message to detect this replica set. Then fix the issue in the same way as for UNREACHABLE_REPLICA.

Disaster recovery

Please see the disaster recovery section in the Tarantool manual.

Backups

Please see the backups section in the Tarantool manual.

Troubleshooting

First of all, see the similar guide in the Tarantool manual. Below are some other Cartridge-specific problems.

Editing clusterwide configuration in WebUI returns an error

Examples:

  • NetboxConnectError: "localhost:3302": Connection refused;
  • Prepare2pcError: Instance state is OperationError, can't apply config in this state.

The root problem: all cluster instances are equal, and all of them store a copy of clusterwide configuration, which must be the same. If an instance degrades (can’t accept new configuration) – the quorum is lost. This prevents further configuration modifications to avoid inconsistency.

But sometimes inconsistency is needed to repair the system, at least partially and temporarily. It can be achieved by disabling degraded instances.

Solution:

  1. Connect to the console of the alive instance.

    tarantoolctl connect user:password@instance_advertise_uri
    
  2. Inspect what’s going on.

    cartridge = require('cartridge')
    report = {}
    for _, srv in pairs(cartridge.admin_get_servers()) do
        report[srv.uuid] = {uri = srv.uri, status = srv.status, message = srv.message}
    end
    return report
    
  3. If you’re ready to proceed, run the following snippet. It’ll disable all instances which are not healthy. After that, you can use the WebUI as usual.

    disable_list = {}
    for uuid, srv in pairs(report) do
        if srv.status ~= 'healthy' then
           table.insert(disable_list, uuid)
        end
    end
    return cartridge.admin_disable_servers(disable_list)
    
  4. When it’s necessary to bring disabled instances back, re-enable them in a similar manner:

    cartridge = require('cartridge')
    enable_list = {}
    for _, srv in pairs(cartridge.admin_get_servers()) do
        if srv.disabled then
           table.insert(enable_list, srv.uuid)
        end
    end
    return cartridge.admin_enable_servers(enable_list)
    

An instance is stuck in the ConnectingFullmesh state upon restart

Example:

_images/stuck-connecting-fullmesh.png

The root problem: after restart, the instance tries to connect to all its replicas and remains in the ConnectingFullmesh state until it succeeds. If it can’t (due to replica URI unavailability or for any other reason) – it’s stuck forever.

Solution:

Set the replication_connect_quorum option to zero. It may be accomplished in two ways:

  • By restarting it with the corresponding option set (in environment variables or in the instance configuration file);

  • Or without restart – by running the following one-liner:

    echo "box.cfg({replication_connect_quorum = 0})" | tarantoolctl connect <advertise_uri>
    

I want to run an instance with a new advertise_uri

The root problem: advertise_uri parameter is persisted in the clusterwide configuration. Even if it changes upon restart, the rest of the cluster keeps using the old one, and the cluster may behave in an odd way.

Solution:

The clusterwide configuration should be updated.

  1. Make sure all instances are running and not stuck in the ConnectingFullmesh state (see above).

  2. Make sure all instances have discovered each other (i.e. they look healthy in the WebUI).

  3. Run the following snippet in the Tarantool console. It’ll prepare a patch for the clusterwide configuration.

    cartridge = require('cartridge')
    members = require('membership').members()
    
    edit_list = {}
    changelog = {}
    for _, srv in pairs(cartridge.admin_get_servers()) do
        for _, m in pairs(members) do
            if m.status == 'alive'
            and m.payload.uuid == srv.uuid
            and m.uri ~= srv.uri
            then
                table.insert(edit_list, {uuid = srv.uuid, uri = m.uri})
                table.insert(changelog, string.format('%s -> %s (%s)', srv.uri, m.uri, m.payload.alias))
                break
            end
        end
    end
    return changelog
    

    As a result you’ll see a brief summary like the following one:

    localhost:3301> return changelog
    ---
    - - localhost:13301 -> localhost:3301 (srv-1)
      - localhost:13302 -> localhost:3302 (srv-2)
      - localhost:13303 -> localhost:3303 (srv-3)
      - localhost:13304 -> localhost:3304 (srv-4)
      - localhost:13305 -> localhost:3305 (srv-5)
    ...
    
  4. Finally, apply the patch:

    cartridge.admin_edit_topology({servers = edit_list})
    

The cluster is doomed, I’ve edited the config manually. How do I reload it?

Warning

Please be aware that this is quite risky; make sure you know what you’re doing. There’s some useful information about clusterwide configuration anatomy and the “normal” management API in the sections above.

But if you’re still determined to reload the configuration manually, you can do (in the Tarantool console):

-- load config from filesystem
clusterwide_config = require('cartridge.clusterwide-config')
cfg = clusterwide_config.load('./config')
cfg:lock()

confapplier = require('cartridge.confapplier')
confapplier.apply_config(cfg)

This snippet reloads the configuration on a single instance. All other instances continue operating as before.

Note

If further configuration modifications are made with a two-phase commit (e.g. via the WebUI or with the Lua API), the active configuration of an active instance will be spread across the cluster.

Table of contents

Module cartridge

Tarantool framework for distributed applications development.

Cartridge provides a simple way to manage the operation of distributed applications. The cluster consists of several Tarantool instances acting in concert. Cartridge does not care about how the instances are started; it only cares about the configuration of already running processes.

Cartridge automates vshard and replication configuration, simplifies custom configuration and administrative tasks.

Functions
cfg (opts, box_opts)

Initialize the cartridge module.

After this call, you can operate the instance via Tarantool console. Notice that this call does not initialize the database - box.cfg is not called yet. Do not try to call box.cfg yourself: cartridge will do it when it is time.

Both cartridge.cfg and box.cfg options can be configured with command-line arguments or environment variables.

Parameters:

  • opts: Available options are:
    • workdir: (optional string) a directory where all data will be stored: snapshots, wal logs, and the cartridge config file. (default: ".", overridden by env TARANTOOL_WORKDIR, args --workdir)
    • advertise_uri: (optional string) either “<HOST>:<PORT>”, or “<HOST>:”, or “<PORT>”. Used by other instances to connect to the current one. When <HOST> isn’t specified, it’s detected as the only non-local IP address. If there is more than one IP address available, it defaults to “localhost”. When <PORT> isn’t specified, it’s derived as follows: if TARANTOOL_INSTANCE_NAME has a numeric suffix _<N>, then <PORT> = 3300+<N>; otherwise the default <PORT> = 3301 is used.
    • cluster_cookie: (optional string) secret used to separate unrelated applications, which prevents them from seeing each other during broadcasts. Also used as the admin password in HTTP and binary connections and for encrypting internal communications. Allowed symbols are [a-zA-Z0-9_.~-]. (default: “secret-cluster-cookie”, overridden by env TARANTOOL_CLUSTER_COOKIE, args --cluster-cookie)
    • bucket_count: (optional number) bucket count for the vshard cluster. See the vshard doc for more details. (default: 30000, overridden by env TARANTOOL_BUCKET_COUNT, args --bucket-count)
    • vshard_groups: (optional {[string]=VshardGroup,…}) vshard storage groups; table keys are used as names
    • http_enabled: (optional boolean) whether the http server should be started (default: true, overridden by env TARANTOOL_HTTP_ENABLED, args --http-enabled)
    • http_port: (string or number) port to open the administrative UI and API on (default: 8081, derived from TARANTOOL_INSTANCE_NAME, overridden by env TARANTOOL_HTTP_PORT, args --http-port)
    • alias: (optional string) human-readable instance name that will be available in the administrative UI (default: argparse instance name, overridden by env TARANTOOL_ALIAS, args --alias)
    • roles: (table) list of user-defined roles that will be available to enable on the instance_uuid
    • auth_enabled: (optional boolean) toggle authentication in the administrative UI and API (default: false)
    • auth_backend_name: (optional string) user-provided set of callbacks related to authentication
    • console_sock: (optional string) socket to start console listening on. (default: nil, overridden by env TARANTOOL_CONSOLE_SOCK, args --console-sock)
    • webui_blacklist: (optional {string,…}) list of pages to be hidden in WebUI. (Added in v2.0.1-54, default: {})
    • upgrade_schema: (optional boolean) run schema upgrade on the leader instance. (Added in v2.0.2-3, default: false, overridden by env TARANTOOL_UPGRADE_SCHEMA, args --upgrade-schema)
  • box_opts: (optional table) extra tarantool box.cfg options (e.g. memtx_memory) that may require additional tuning

Returns:

true

Or

(nil)

(table) Error description

is_healthy ()

Check the cluster health. It is healthy if all instances are healthy.

The function is designed mostly for testing purposes.

Returns:

(boolean) true / false

Tables
VshardGroup

Vshard storage group configuration.

Every vshard storage must be assigned to a group.

Fields:

  • bucket_count: (number) Bucket count for the storage group.
Global functions
_G.cartridge_get_schema ()

Get clusterwide DDL schema.

(Added in v1.2.0-28)

Returns:

(string) Schema in YAML format

Or

(nil)

(table) Error description

_G.cartridge_set_schema (schema)

Apply clusterwide DDL schema.

(Added in v1.2.0-28)

Parameters:

  • schema: (string) in YAML format

Returns:

(string) The same new schema

Or

(nil)

(table) Error description
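
For example, the two functions can be combined into a no-op round-trip check (a sketch):

-- Fetch the current clusterwide DDL schema (a YAML string) and
-- re-apply it unchanged; both calls return nil and an error on failure.
local schema = assert(_G.cartridge_get_schema())
assert(_G.cartridge_set_schema(schema))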

Clusterwide DDL schema
get_schema ()

Get the clusterwide DDL schema. It’s like _G.cartridge_get_schema, but isn’t a global variable.

(Added in v2.0.1-54)

Returns:

(string) Schema in YAML format

Or

(nil)

(table) Error description

set_schema (schema)

Apply the clusterwide DDL schema. It’s like _G.cartridge_set_schema, but isn’t a global variable.

(Added in v2.0.1-54)

Parameters:

  • schema: (string) in YAML format

Returns:

(string) The same new schema

Or

(nil)

(table) Error description

Cluster administration
ServerInfo

Instance general information.

Fields:

  • alias: (string) Human-readable instance name.
  • uri: (string)
  • uuid: (string)
  • disabled: (boolean)
  • status: (string) Instance health.
  • message: (string) Auxiliary health status.
  • replicaset: (ReplicasetInfo) Circular reference to a replicaset.
  • priority: (number) Leadership priority for automatic failover.
  • clock_delta: (number) Difference between the remote clock and the local one (in seconds), obtained from the membership module (SWIM protocol). Positive values mean the remote clock is ahead of the local one, and vice versa.
ReplicasetInfo

Replicaset general information.

Fields:

  • uuid: (string) The replicaset UUID.
  • roles: ({string,…}) Roles enabled on the replicaset.
  • status: (string) Replicaset health.
  • master: (ServerInfo) Replicaset leader according to configuration.
  • active_master: (ServerInfo) Active leader.
  • weight: (number) Vshard replicaset weight. Matters only if vshard-storage role is enabled.
  • vshard_group: (string) Name of vshard group the replicaset belongs to.
  • all_rw: (boolean) A flag indicating that all servers in the replicaset should be read-write.
  • alias: (string) Human-readable replicaset name.
  • servers: ({ServerInfo,…}) Circular reference to all instances in the replicaset.
admin_get_servers ([uuid])

Get servers list. Optionally filter out the server with the given uuid.

Parameters:

Returns:

({ServerInfo,…})

Or

(nil)

(table) Error description

admin_get_replicasets ([uuid])

Get replicasets list. Optionally filter out the replicaset with given uuid.

Parameters:

Returns:

({ReplicasetInfo,…})

Or

(nil)

(table) Error description

admin_probe_server (uri)

Discover an instance.

Parameters:

admin_enable_servers (uuids)

Enable nodes after they were disabled.

Parameters:

Returns:

({ServerInfo,…})

Or

(nil)

(table) Error description

admin_disable_servers (uuids)

Temporarily disable nodes.

Parameters:

Returns:

({ServerInfo,…})

Or

(nil)

(table) Error description

admin_bootstrap_vshard ()

Call vshard.router.bootstrap() . This function distributes all buckets across the replica sets.

Returns:

(boolean) true

Or

(nil)

(table) Error description

Automatic failover management
FailoverParams

Failover parameters.

(Added in v2.0.2-2)

Fields:

  • mode: (string) Supported modes are “disabled”, “eventual” and “stateful”
  • state_provider: (optional string) Supported state providers are “tarantool” and “etcd2”.
  • tarantool_params: (added in v2.0.2-2)
  • etcd2_params: (added in v2.1.2-26)
    • endpoints: (optional table) URIs that are used to discover and to access etcd cluster instances. (default: {'http://localhost:2379', 'http://localhost:4001'})
    • username: (optional string) (default: “”)
    • password: (optional string) (default: “”)
  • lock_delay: (optional number) Timeout (in seconds), determines lock’s time-to-live (default: 10)
failover_get_params ()

Get failover configuration.

(Added in v2.0.2-2)

Returns:

(FailoverParams)

failover_set_params (opts)

Configure automatic failover.

(Added in v2.0.2-2)

Parameters:

  • opts:
    • mode: (optional string)
    • state_provider: (optional string)
    • tarantool_params: (optional table)
    • etcd2_params: (optional table) (added in v2.1.2-26)

Returns:

(boolean) true if config applied successfully

Or

(nil)

(table) Error description
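
As an illustration, a sketch of switching the cluster to stateful failover with an etcd2 state provider (the endpoint is a placeholder):

local cartridge = require('cartridge')

local ok, err = cartridge.failover_set_params({
    mode = 'stateful',
    state_provider = 'etcd2',
    etcd2_params = {
        endpoints = {'http://localhost:2379'},
    },
})
assert(ok, tostring(err))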

failover_promote (replicaset_uuid[, opts])

Promote leaders in replicasets.

Parameters:

  • replicaset_uuid: (table) { [replicaset_uuid] = leader_uuid }
  • opts:
    • force_inconsistency: (optional boolean) (default: false)

Returns:

(boolean) true on success

Or

(nil)

(table) Error description

admin_get_failover ()

Get current failover state.

(Deprecated since v2.0.2-2)

admin_enable_failover ()

Enable failover. (Deprecated since v2.0.1-95 in favor of cartridge.failover_set_params)

admin_disable_failover ()

Disable failover. (Deprecated since v2.0.1-95 in favor of cartridge.failover_set_params)

Managing cluster topology
admin_edit_topology (args)

Edit cluster topology. This function can be used for:

  • bootstrapping cluster from scratch
  • joining a server to an existing replicaset
  • creating new replicaset with one or more servers
  • editing uri/labels of servers
  • disabling and expelling servers

(Added in v1.0.0-17)

Parameters:

EditReplicasetParams

Replicaset modifications.

Fields:

EditServerParams

Servers modifications.

Fields:

  • uri: (optional string)
  • uuid: (string)
  • labels: (optional table)
  • disabled: (optional boolean)
  • expelled: (optional boolean) Expelling an instance is permanent and can’t be undone. It’s suitable for situations when the hardware is destroyed, snapshots are lost and there is no hope to bring it back to life.
JoinServerParams

Parameters required for joining a new server.

Fields:

Clusterwide configuration
config_get_readonly ([section_name])

Get a read-only view on the clusterwide configuration.

Returns either conf[section_name] or entire conf . Any attempt to modify the section or its children will raise an error.

Parameters:

  • section_name: (string) (optional)

Returns:

(table)

config_get_deepcopy ([section_name])

Get a read-write deep copy of the clusterwide configuration.

Returns either conf[section_name] or entire conf . Changing it has no effect unless it’s used to patch clusterwide configuration.

Parameters:

  • section_name: (string) (optional)

Returns:

(table)

config_patch_clusterwide (patch)

Edit the clusterwide configuration. Top-level keys are merged with the current configuration. To remove a top-level section, use patch_clusterwide{key = box.NULL} .

The function uses a two-phase commit algorithm with the following steps:

  1. Patches the current configuration.
  2. Validates topology on the current server.
  3. Executes the preparation phase ( prepare_2pc ) on every server excluding expelled and disabled servers.
  4. If any server reports an error, executes the abort phase ( abort_2pc ). All servers prepared so far are rolled back and unlocked.
  5. Performs the commit phase ( commit_2pc ). If this phase fails, an automatic rollback is impossible and the cluster has to be repaired manually.

Parameters:

Returns:

(boolean) true

Or

(nil)

(table) Error description
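
For illustration, a sketch of adding and then removing a custom top-level section; the section name my_section and its content are placeholders:

local cartridge = require('cartridge')

-- add or replace a top-level section
local ok, err = cartridge.config_patch_clusterwide({
    my_section = {enabled = true},
})
assert(ok, tostring(err))

-- remove the same section
ok, err = cartridge.config_patch_clusterwide({
    my_section = box.NULL,
})
assert(ok, tostring(err))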

Inter-role interaction
service_get (module_name)

Get a module from registry.

Parameters:

Returns:

(nil)

Or

(table) instance

service_set (module_name, instance)

Put a module into registry or drop it. This function typically doesn’t need to be called explicitly, the cluster automatically sets all the initialized roles.

Parameters:

Returns:

(nil)

Cross-instance calls
rpc_call (role_name, fn_name[, args[, opts]])

Perform a remote procedure call. Find a suitable healthy instance with an enabled role and perform a net.box conn:call (https://tarantool.io/en/doc/latest/reference/reference_lua/net_box/#net-box-call) on it.

Parameters:

  • role_name: (string)
  • fn_name: (string)
  • args: (table) (optional)
  • opts:
    • prefer_local: (optional boolean) Don’t perform a remote call if possible. When the role is enabled locally and the current instance is healthy, the remote netbox call is substituted with a local Lua function call. When the option is disabled, it never tries to perform the call locally and always uses a netbox connection, even to connect to self. (default: true)
    • leader_only: (optional boolean) Perform a call only on the replica set leaders. (default: false)
    • uri: (optional string) Force a call to be performed on this particular uri. Disregards member status and opts.prefer_local. Conflicts with opts.leader_only = true. (added in v1.2.0-63)
    • remote_only: (deprecated) Use prefer_local instead.
    • timeout: passed to net.box conn:call options.
    • buffer: passed to net.box conn:call options.
    • on_push: passed to net.box conn:call options.
    • on_push_ctx: passed to net.box conn:call options.

Returns:

conn:call() result

Or

(nil)

(table) Error description
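
For example, a sketch of calling a function exposed by a role on a replica set leader; the role name myrole and the function get_state are hypothetical:

local cartridge = require('cartridge')

local result, err = cartridge.rpc_call(
    'myrole',                         -- role name (hypothetical)
    'get_state',                      -- function exposed by the role (hypothetical)
    {'some-key'},                     -- args passed to the remote function
    {leader_only = true, timeout = 1}
)
if err ~= nil then
    error(err)
end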

rpc_get_candidates (role_name[, opts])

List instances suitable for performing a remote call.

Parameters:

  • role_name: (string)
  • opts:
    • leader_only: (optional boolean) Filter instances which are leaders now. (default: false)
    • healthy_only: (optional boolean) Filter instances which have membership status healthy. (added in v1.1.0-11, default: true)

Returns:

({string,…}) URIs

Authentication and authorization
http_authorize_request (request)

Authorize an HTTP request.

Get username from cookies or basic HTTP authentication.

(Added in v1.1.0-4)

Parameters:

Returns:

(boolean) Access granted

http_render_response (response)

Render HTTP response.

Inject set-cookie headers into response in order to renew or reset the cookie.

(Added in v1.1.0-4)

Parameters:

Returns:

(table) The same response with cookies injected

http_get_username ()

Get username for the current HTTP session.

(Added in v1.1.0-4)

Returns:

(string or nil)
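
These helpers are typically combined inside an HTTP handler. A sketch, assuming an httpd route is registered elsewhere:

local cartridge = require('cartridge')

local function protected_handler(req)
    if not cartridge.http_authorize_request(req) then
        return {status = 401, body = 'Unauthorized'}
    end
    local username = cartridge.http_get_username() or 'anonymous'
    -- http_render_response injects the set-cookie headers
    return cartridge.http_render_response({
        status = 200,
        body = 'Hello, ' .. username,
    })
end
-- e.g. httpd:route({method = 'GET', path = '/hello'}, protected_handler)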

Deprecated functions
admin_edit_replicaset (args)

Edit replicaset parameters (deprecated).

(Deprecated since v1.0.0-17 in favor of cartridge.admin_edit_topology)

Parameters:

Returns:

(boolean) true

Or

(nil)

(table) Error description

admin_edit_server (args)

Edit an instance (deprecated).

(Deprecated since v1.0.0-17 in favor of cartridge.admin_edit_topology)

Parameters:

Returns:

(boolean) true

Or

(nil)

(table) Error description

admin_join_server (args)

Join an instance to the cluster (deprecated).

(Deprecated since v1.0.0-17 in favor of cartridge.admin_edit_topology)

Parameters:

Returns:

(boolean) true

Or

(nil)

(table) Error description

admin_expel_server (uuid)

Expel an instance (deprecated). Forever.

(Deprecated since v1.0.0-17 in favor of cartridge.admin_edit_topology)

Parameters:

Returns:

(boolean) true

Or

(nil)

(table) Error description

Module cartridge.auth

Administrators authentication and authorization.

Local Functions
set_enabled (enabled)

Allow or deny unauthenticated access to the administrator’s page. (Changed in v0.11)

This function affects only the current instance. It can’t be used after the cluster was bootstrapped. To modify clusterwide config use set_params instead.

Parameters:

  • enabled: (boolean)

Returns:

(boolean) true

Or

(nil)

(table) Error description

get_enabled ()

Check if unauthenticated access is forbidden. (Added in v0.7)

Returns:

(boolean) enabled

init ()

Initialize the authentication HTTP API.

Set up login and logout HTTP endpoints.

set_callbacks (callbacks)

Set authentication callbacks.

Parameters:

  • callbacks:
    • add_user: (function)
    • get_user: (function)
    • edit_user: (function)
    • list_users: (function)
    • remove_user: (function)
    • check_password: (function)

Returns:

(boolean) true

get_callbacks ()

Get authentication callbacks.

Returns:

(table) callbacks

Configuration
set_params (opts)

Modify authentication params. (Changed in v0.11)

Can’t be used before the bootstrap. Affects all cluster instances. Triggers cluster.config_patch_clusterwide .

Parameters:

  • opts:
    • enabled: (optional boolean) (Added in v0.11)
    • cookie_max_age: (optional number)
    • cookie_renew_age: (optional number) (Added in v0.11)

Returns:

(boolean) true

Or

(nil)

(table) Error description
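
A sketch of tuning authentication clusterwide; the durations are illustrative:

local auth = require('cartridge.auth')

local ok, err = auth.set_params({
    enabled = true,
    cookie_max_age = 30 * 24 * 3600,   -- expire cookies after 30 days
    cookie_renew_age = 24 * 3600,      -- renew cookies older than one day
})
assert(ok, tostring(err))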

get_params ()

Retrieve authentication params.

Returns:

(AuthParams)

AuthParams

Authentication params.

Fields:

  • enabled: (boolean) Whether unauthenticated access is forbidden
  • cookie_max_age: (number) Number of seconds until the authentication cookie expires
  • cookie_renew_age: (number) Update the provided cookie if it’s older than this age (in seconds)
Authorization
get_session_username ()

Get username for the current HTTP session.

(Added in v1.1.0-4)

Returns:

(string or nil)

authorize_request (request)

Authorize an HTTP request.

Get username from cookies or basic HTTP authentication.

(Added in v1.1.0-4)

Parameters:

Returns:

(boolean) Access granted

render_response (response)

Render HTTP response.

Inject set-cookie headers into response in order to renew or reset the cookie.

(Added in v1.1.0-4)

Parameters:

Returns:

(table) The same response with cookies injected

User management
UserInfo

User information.

Fields:

  • username: (string)
  • fullname: (optional string)
  • email: (optional string)
  • version: (optional number)
add_user (username, password, fullname, email)

Trigger registered add_user callback.

The callback is triggered with the same arguments and must return a table with fields conforming to UserInfo . Unknown fields are ignored.

Parameters:

Returns:

(UserInfo)

Or

(nil)

(table) Error description

get_user (username)

Trigger registered get_user callback.

The callback is triggered with the same arguments and must return a table with fields conforming to UserInfo . Unknown fields are ignored.

Parameters:

Returns:

(UserInfo)

Or

(nil)

(table) Error description

edit_user (username, password, fullname, email)

Trigger registered edit_user callback.

The callback is triggered with the same arguments and must return a table with fields conforming to UserInfo . Unknown fields are ignored.

Parameters:

Returns:

(UserInfo)

Or

(nil)

(table) Error description

list_users ()

Trigger registered list_users callback.

The callback is triggered without any arguments. It must return an array of UserInfo objects.

Returns:

({UserInfo,…})

Or

(nil)

(table) Error description

remove_user (username)

Trigger registered remove_user callback.

The callback is triggered with the same arguments and must return a table with fields conforming to UserInfo , which was removed. Unknown fields are ignored.

Parameters:

Returns:

(UserInfo)

Or

(nil)

(table) Error description

Module cartridge.roles

Role management (internal module).

The module consolidates all the role management functions: register_role , some getters, validate_config and apply_config .

The module is almost stateless; its only state is a collection of registered roles.

(Added in v1.2.0-20)

Local Functions
get_all_roles ()

List all registered roles.

Hidden and permanent roles are listed too.

Returns:

({string,..})

get_known_roles ()

List registered role names.

Neither hidden nor permanent roles are listed.

Returns:

({string,..})

get_enabled_roles (roles)

Roles to be enabled on the server. This function returns all roles that will be enabled, including their dependencies (both hidden and not) and permanent roles.

Parameters:

Returns:

({[string]=boolean,…})

get_role_dependencies (role_name)

List role dependencies. Including sub-dependencies.

Parameters:

Returns:

({string,..})

validate_config (conf_new, conf_old)

Validate configuration by all roles.

Parameters:

Returns:

(boolean) true

Or

(nil)

(table) Error description

apply_config (conf, opts, is_master)

Apply the role configuration.

Parameters:

  • conf: (table)
  • opts: (table)
  • is_master: (boolean)

Returns:

(boolean) true

Or

(nil)

(table) Error description

Module cartridge.issues

Monitor issues across cluster instances.

Cartridge detects the following problems:

Replication:

  • “Replication from … to … isn’t running” - when box.info.replication.upstream == nil ;
  • “Replication from … to … is stopped/orphan/etc. (…)”;
  • “Replication from … to …: high lag” - when upstream.lag > box.cfg.replication_sync_lag ;
  • “Replication from … to …: long idle” - when upstream.idle > box.cfg.replication_timeout ;

Failover:

  • “Can’t obtain failover coordinator (…)”;
  • “There is no active failover coordinator”;
  • “Failover is stuck on …: Error fetching appointments (…)”;
  • “Failover is stuck on …: Failover fiber is dead” - this is likely a bug;
Tables
limits

Thresholds for issuing warnings. All settings are local, not clusterwide. They can be changed with corresponding environment variables ( TARANTOOL_* ) or command-line arguments. See cartridge.argparse module for details.

Fields:

  • fragmentation_threshold_critical: (number) default: 0.9.
  • fragmentation_threshold_warning: (number) default: 0.6.
  • clock_delta_threshold_warning: (number) default: 5.

Module cartridge.argparse

Gather configuration options.

The module tries to read configuration options from multiple sources and then merge them together according to the priority of the source:

  • --<VARNAME> command line arguments
  • TARANTOOL_<VARNAME> environment variables
  • configuration files

You can specify a configuration file using the --cfg <CONFIG_FILE> option or the TARANTOOL_CFG=<CONFIG_FILE> environment variable.

Configuration files are yaml files, divided into sections like the following:

default:
  memtx_memory: 10000000
  some_option: "default value"
myapp.router:
  memtx_memory: 1024000000
  some_option: "router specific value"

Within the configuration file, argparse looks for multiple matching sections:

  • The section named <APP_NAME>.<INSTANCE_NAME> is parsed first. The application name is derived automatically from the rockspec filename in the project directory, or it can be specified manually with the --app-name command line argument or the TARANTOOL_APP_NAME environment variable. The instance name can be specified the same way, either as --instance-name or TARANTOOL_INSTANCE_NAME.
  • The common <APP_NAME> section is parsed next.
  • Finally, the section [default] with global configuration is parsed with the lowest priority.
Functions
parse ()

Parse command line arguments, environment variables, and configuration files.

Returns:

({argname=value,…})

get_opts (filter)

Filter the results of parsing and cast variables to a given type.

From all configuration options gathered by parse , select only those specified in the filter.

For example, running an application as following:

./init.lua --alias router --memtx-memory 100

results in:

parse()            -> {memtx_memory = "100", alias = "router"}
get_cluster_opts() -> {alias = "router"} -- a string
get_box_opts()     -> {memtx_memory = 100} -- a number

Parameters:

  • filter: ({argname=type,…})

Returns:

({argname=value,…})
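
A sketch of a custom filter; the option names follow the {argname=type} convention described above:

local argparse = require('cartridge.argparse')

local opts = argparse.get_opts({
    alias = 'string',
    memtx_memory = 'number',
    http_enabled = 'boolean',
})
-- opts.memtx_memory is cast to a number, opts.alias stays a string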

get_box_opts ()

Shorthand for get_opts(box_opts) .

get_cluster_opts ()

Shorthand for get_opts(cluster_opts) .

Tables
cluster_opts

Common cartridge.cfg options.

Options which are not listed below (like roles ) can’t be modified with argparse and should be configured in code.

Fields:

  • alias: string
  • workdir: string
  • http_port: number
  • http_enabled: boolean
  • advertise_uri: string
  • cluster_cookie: string
  • console_sock: string
  • auth_enabled: boolean
  • bucket_count: number
  • upgrade_schema: boolean
box_opts

Common [box.cfg](https://www.tarantool.io/en/doc/latest/reference/configuration/) tuning options.

Fields:

  • listen: string
  • memtx_memory: number
  • strip_core: boolean
  • memtx_min_tuple_size: number
  • memtx_max_tuple_size: number
  • slab_alloc_factor: number
  • work_dir: string (deprecated)
  • memtx_dir: string
  • wal_dir: string
  • vinyl_dir: string
  • vinyl_memory: number
  • vinyl_cache: number
  • vinyl_max_tuple_size: number
  • vinyl_read_threads: number
  • vinyl_write_threads: number
  • vinyl_timeout: number
  • vinyl_run_count_per_level: number
  • vinyl_run_size_ratio: number
  • vinyl_range_size: number
  • vinyl_page_size: number
  • vinyl_bloom_fpr: number
  • log: string
  • log_nonblock: boolean
  • log_level: number
  • log_format: string
  • io_collect_interval: number
  • readahead: number
  • snap_io_rate_limit: number
  • too_long_threshold: number
  • wal_mode: string
  • rows_per_wal: number
  • wal_max_size: number
  • wal_dir_rescan_delay: number
  • force_recovery: boolean
  • replication: string
  • instance_uuid: string
  • replicaset_uuid: string
  • custom_proc_title: string
  • pid_file: string
  • background: boolean
  • username: string
  • coredump: boolean
  • checkpoint_interval: number
  • checkpoint_wal_threshold: number
  • checkpoint_count: number
  • read_only: boolean
  • hot_standby: boolean
  • worker_pool_threads: number
  • replication_timeout: number
  • replication_sync_lag: number
  • replication_sync_timeout: number
  • replication_connect_timeout: number
  • replication_connect_quorum: number
  • replication_skip_conflict: boolean
  • feedback_enabled: boolean
  • feedback_host: string
  • feedback_interval: number
  • net_msg_max: number

Module cartridge.twophase

Clusterwide configuration propagation two-phase algorithm.

(Added in v1.2.0-19)

Functions
patch_clusterwide (patch)

Edit the clusterwide configuration. Top-level keys are merged with the current configuration. To remove a top-level section, use patch_clusterwide{key = box.NULL} .

The function uses a two-phase commit algorithm with the following steps:

  1. Patches the current configuration.
  2. Validates topology on the current server.
  3. Executes the preparation phase ( prepare_2pc ) on every server excluding expelled and disabled servers.
  4. If any server reports an error, executes the abort phase ( abort_2pc ). All servers prepared so far are rolled back and unlocked.
  5. Performs the commit phase ( commit_2pc ). If this phase fails, an automatic rollback is impossible and the cluster has to be repaired manually.

Parameters:

Returns:

(boolean) true

Or

(nil)

(table) Error description

get_schema ()

Get clusterwide DDL schema.

(Added in v1.2.0-28)

Returns:

(string) Schema in YAML format

Or

(nil)

(table) Error description

set_schema (schema)

Apply clusterwide DDL schema.

(Added in v1.2.0-28)

Parameters:

  • schema: (string) in YAML format

Returns:

(string) The same new schema

Or

(nil)

(table) Error description

on_patch (trigger_new, trigger_old)

Set up a trigger for patch_clusterwide.

It will be executed before the new config is applied.

If the parameters are (nil, old_trigger) , then the old trigger is deleted.

The trigger function is called with two arguments: conf_new ( ClusterwideConfig ) and conf_old ( ClusterwideConfig ).

It is allowed to modify conf_new , but not conf_old . Return values are ignored. If calling a trigger raises an error, patch_clusterwide returns it as nil, err .

(Added in v2.1.0-4)

Parameters:

  • trigger_new: (function)
  • trigger_old: (function)
Usage:
local yaml = require('yaml')

local function inject_data(conf_new, _)
    local data_yml = yaml.encode({foo = 'bar'})
    conf_new:set_plaintext('data.yml', data_yml)
end

twophase.on_patch(inject_data) -- set custom patch modifier trigger
twophase.on_patch(nil, inject_data) -- drop trigger
Local Functions
prepare_2pc (data)

Two-phase commit - preparation stage.

Validate the configuration and acquire a lock by setting a local variable and writing the “config.prepare.yml” file. If the validation fails, the lock isn’t acquired and doesn’t have to be aborted.

Parameters:

  • data: (table) clusterwide config content

Returns:

(boolean) true

Or

(nil)

(table) Error description

commit_2pc ()

Two-phase commit - commit stage.

Back up the active configuration, commit changes to the filesystem by renaming the prepared file, release the lock, and configure roles. If any errors occur, the configuration is not rolled back automatically. Any problem encountered during this call has to be solved manually.

Returns:

(boolean) true

Or

(nil)

(table) Error description

abort_2pc ()

Two-phase commit - abort stage.

Release the lock for further commit attempts.

Returns:

(boolean) true

Module cartridge.failover

Gather information regarding instances leadership.

Failover can operate in two modes:

  • In disabled mode the leader is the first server configured in topology.replicasets[].master array.
  • In eventual mode the leader isn’t elected consistently. Instead, every instance in the cluster thinks the leader is the first healthy server in the replicaset, while instance health is determined according to membership status (the SWIM protocol).
  • In stateful mode leader appointments are polled from the external storage. (Added in v2.0.2-2)

This module behavior depends on the instance state.

From the very beginning it reports is_rw() == false, is_leader() == false , get_active_leaders() == {} .

The module is configured when the instance enters ConfiguringRoles state for the first time. From that moment it reports actual values according to the mode set in clusterwide config.

(Added in v1.2.0-17)

Functions
get_coordinator ()

Get the current stateful failover coordinator.

Returns:

(table) coordinator

Or

(nil)

(table) Error description

Local Functions
_get_appointments_disabled_mode ()

Generate appointments according to clusterwide configuration. Used in ‘disabled’ failover mode.

_get_appointments_eventual_mode ()

Generate appointments according to membership status. Used in ‘eventual’ failover mode.

_get_appointments_stateful_mode ()

Get appointments from external storage. Used in ‘stateful’ failover mode.

accept_appointments (replicaset_uuid)

Accept new appointments.

Get appointments wherever they come from and put them into the cache. The cached active_leaders table is never modified, but is overridden by its modified copy (if necessary).

Parameters:

Returns:

(boolean) Whether leadership map has changed

failover_loop ()

Repeatedly fetch new appointments and reconfigure roles.

cfg ()

Initialize the failover module.

get_active_leaders ()

Get map of replicaset leaders.

Returns:

{[replicaset_uuid] = instance_uuid,…}

is_leader ()

Check current instance leadership.

Returns:

(boolean) true / false

is_rw ()

Check current instance writability.

Returns:

(boolean) true / false

is_vclockkeeper ()

Check if the current instance has persisted its vclock.

Returns:

(boolean) true / false

consistency_needed ()

Check if current configuration implies consistent switchover.

Returns:

(boolean) true / false

force_inconsistency (replicaset_uuid)

Force inconsistent leader switching. Do it by resetting vclockkeepers in the state provider.

Parameters:

Returns:

(boolean) true

Or

(nil)

(table) Error description

wait_consistency (replicaset_uuid)

Wait until promoted instances become vclockkeepers.

Parameters:

Returns:

(boolean) true

Or

(nil)

(table) Error description

Module cartridge.topology

Topology validation and filtering.

Functions
cluster_is_healthy ()

Check the cluster health. It is healthy if all instances are healthy.

The function is designed mostly for testing purposes.

Returns:

(boolean) true / false

Local Functions
get_leaders_orded (topology_cfg, replicaset_uuid, new_order)

Get full list of replicaset leaders.

Full list is composed of:

  • New order array
  • Initial order from topology_cfg (with no repetitions)
  • All other servers in the replicaset, sorted by uuid, ascending

Neither topology_cfg nor new_order tables are modified. New order validity is ignored too.

Parameters:

Returns:

({string,…}) array of leaders uuids

validate (topology_new, topology_old)

Validate topology configuration.

Parameters:

Returns:

(boolean) true

Or

(nil)

(table) Error description

find_server_by_uri (topology_cfg, uri)

Find the server in topology config.

(Added in v1.2.0-17)

Parameters:

Returns:

(nil or string) instance_uuid found

refine_servers_uri (topology_cfg)

Merge servers’ URIs from topology_cfg with fresh membership status.

This function sustains cartridge operability in case of an advertise_uri change. The URI map is composed based on topology_cfg, but if some of those URIs turn out to be dead, the membership is searched for a member with the corresponding payload.uuid instead.

(Added in v2.3.0-7)

Parameters:

Returns:

({[uuid] = uri}) with all servers except expelled ones.

probe_missing_members (servers)

Send UDP ping to servers missing from membership table.

Parameters:

Returns:

(boolean) true

Or

(nil)

(table) Error description

get_fullmesh_replication (topology_cfg, replicaset_uuid)

Get replication config to set up full mesh.

(Added in v1.2.0-17)

Parameters:

Returns:

(table)

Module cartridge.clusterwide-config

The abstraction, representing clusterwide configuration.

Clusterwide configuration is more than just a lua table. It’s an object in terms of OOP paradigm.

On filesystem clusterwide config is represented by a file tree.

In Lua it’s represented as an object which holds both plaintext files content and unmarshalled lua tables. Unmarshalling is implicit and performed automatically for the sections with .yml file extension.

To access plaintext content there are two functions: get_plaintext and set_plaintext .

Unmarshalled lua tables are accessed without .yml extension by get_readonly and get_deepcopy . Plaintext serves for accessing unmarshalled representation of corresponding sections.

To avoid ambiguity it’s prohibited to keep both <FILENAME> and <FILENAME>.yml in the configuration. An attempt to do so would result in return nil, err from new() and load(), and an attempt to call get_readonly/deepcopy would raise an error. Nevertheless one can keep any other extensions because they aren’t unmarshalled implicitly.

(Added in v1.2.0-17)

Usage:
tarantool> cfg = ClusterwideConfig.new({
         >     -- two files
         >     ['forex.yml'] = '{EURRUB_TOM: 70.33, USDRUB_TOM: 63.18}',
         >     ['text'] = 'Lorem ipsum dolor sit amet',
         > })
---
...

tarantool> cfg:get_plaintext()
---
- text: Lorem ipsum dolor sit amet
  forex.yml: '{EURRUB_TOM: 70.33, USDRUB_TOM: 63.18}'
...

tarantool> cfg:get_readonly()
---
- forex.yml: '{EURRUB_TOM: 70.33, USDRUB_TOM: 63.18}'
  forex:
    EURRUB_TOM: 70.33
    USDRUB_TOM: 63.18
  text: Lorem ipsum dolor sit amet
...
Functions
new ([data])

Create new object.

Parameters:

Returns:

(ClusterwideConfig)

Or

(nil)

(table) Error description

save (clusterwide_config, filename)

Write configuration to filesystem.

Write atomicity is achieved by splitting the operation into two phases: first, the configuration is saved with a random filename in the same directory; then the temporary file is renamed to the destination.

Parameters:

  • clusterwide_config: (ClusterwideConfig)
  • filename: (string)

Returns:

(boolean) true

Or

(nil)

(table) Error description

load (filename)

Load object from filesystem.

This function handles both old-style single YAML and new-style directory with a file tree.

Parameters:

Returns:

(ClusterwideConfig)

Or

(nil)

(table) Error description

Local Functions
load_from_file (filename)

Load old-style config from YAML file.

Parameters:

  • filename: (string) Filename to load.

Returns:

(ClusterwideConfig)

Or

(nil)

(table) Error description

load_from_dir (path)

Load new-style config from a directory.

Parameters:

  • path: (string) Path to the config.

Returns:

(ClusterwideConfig)

Or

(nil)

(table) Error description

remove (string)

Remove config from filesystem atomically.

The atomicity is achieved by splitting the operation into two phases: first, the configuration is saved with a random filename in the same directory; then the temporary file is renamed to the destination.

Parameters:

  • string: (path) Directory path to remove.

Returns:

(boolean) true

Or

(nil)

(table) Error description

Module cartridge.rpc

Remote procedure calls between cluster instances.

Functions
get_candidates (role_name[, opts])

List instances suitable for performing a remote call.

Parameters:

  • role_name: (string)
  • opts:
    • leader_only: (optional boolean) Filter instances which are leaders now. (default: false)
    • healthy_only: (optional boolean) Filter instances which have membership status healthy. (added in v1.1.0-11, default: true)

Returns:

({string,…}) URIs

call (role_name, fn_name[, args[, opts]])

Perform a remote procedure call. Find a suitable healthy instance with an enabled role and perform a net.box conn:call (https://tarantool.io/en/doc/latest/reference/reference_lua/net_box/#net-box-call) on it.

Parameters:

  • role_name: (string)
  • fn_name: (string)
  • args: (table) (optional)
  • opts:
    • prefer_local: (optional boolean) Don’t perform a remote call if possible. When the role is enabled locally and the current instance is healthy, the remote netbox call is substituted with a local Lua function call. When the option is disabled, it never tries to perform the call locally and always uses a netbox connection, even to connect to self. (default: true)
    • leader_only: (optional boolean) Perform a call only on the replica set leaders. (default: false)
    • uri: (optional string) Force a call to be performed on this particular uri. Disregards member status and opts.prefer_local. Conflicts with opts.leader_only = true. (added in v1.2.0-63)
    • remote_only: (deprecated) Use prefer_local instead.
    • timeout: passed to net.box conn:call options.
    • buffer: passed to net.box conn:call options.
    • on_push: passed to net.box conn:call options.
    • on_push_ctx: passed to net.box conn:call options.

Returns:

conn:call() result

Or

(nil)

(table) Error description

Local Functions
get_connection (role_name[, opts])

Connect to an instance with an enabled role.

Parameters:

  • role_name: (string)
  • opts:
    • prefer_local: (optional boolean)
    • leader_only: (optional boolean)

Returns:

net.box connection

Or

(nil)

(table) Error description

Module cartridge.tar

Handle basic tar format.

<http://www.gnu.org/software/tar/manual/html_node/Standard.html>

While an archive may contain many files, the archive itself is a single ordinary file. Physically, an archive consists of a series of file entries terminated by an end-of-archive entry, which consists of two 512 blocks of zero bytes. A file entry usually describes one of the files in the archive (an archive member), and consists of a file header and the contents of the file. File headers contain file names and statistics, checksum information which tar uses to detect file corruption, and information about file types.

A tar archive file contains a series of blocks. Each block contains exactly 512 (BLOCKSIZE) bytes:

+---------+-------+-------+-------+---------+-------+-----
| header1 | file1 |  ...  |  ...  | header2 | file2 | ...
+---------+-------+-------+-------+---------+-------+-----

All characters in header blocks are represented by using 8-bit characters in the local variant of ASCII. Each field within the structure is contiguous; that is, there is no padding used within the structure. Each character on the archive medium is stored contiguously. Bytes representing the contents of files (after the header block of each file) are not translated in any way and are not constrained to represent characters in any character set. The tar format does not distinguish text files from binary files, and no translation of file contents is performed.

Functions
pack (files)

Create TAR archive.

Parameters:

Returns:

(string) The archive

Or

(nil)

(table) Error description

unpack (tar)

Parse TAR archive.

Only regular files are extracted, directories are omitted.

Parameters:

Returns:

({string=string}) Extracted files (their names and content)

Or

(nil)

(table) Error description
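
A minimal round-trip sketch, assuming pack accepts the same {filename = content} mapping that unpack returns:

local tar = require('cartridge.tar')

-- assumed input format: a map of file names to file contents
local archive, err = tar.pack({['greeting.txt'] = 'hello, tar'})
assert(archive, tostring(err))

local files = tar.unpack(archive)
-- files['greeting.txt'] == 'hello, tar'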

Module cartridge.pool

Connection pool.

Reuse tarantool net.box connections with ease.

Functions
connect (uri[, opts])

Connect a remote or get cached connection. Connection is established using net.box.connect() .

Parameters:

  • uri: (string)
  • opts:
    • wait_connected: (boolean or number) by default, connection creation is blocked until the connection is established, but passing wait_connected=false makes it return immediately. Also, passing a timeout makes it wait before returning (e.g. wait_connected=1.5 makes it wait at most 1.5 seconds).
    • connect_timeout: (optional number) (deprecated) Use wait_connected instead
    • user: (deprecated) don’t use it
    • password: (deprecated) don’t use it
    • reconnect_after: (deprecated) don’t use it

Returns:

net.box connection

Or

(nil)

(table) Error description
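
A sketch of obtaining a pooled connection (the URI is a placeholder):

local pool = require('cartridge.pool')

local conn, err = pool.connect('localhost:3302', {wait_connected = 1.5})
if conn == nil then
    error(err)
end
-- conn is a regular net.box connection
local info = conn:call('box.info')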

Local Functions
format_uri (uri)

Enrich URI with credentials. Suitable to connect other cluster instances.

Parameters:

Returns:

(string) username:password@host:port

map_call (fn_name[, args[, opts]])

Perform a remote call to multiple URIs and map results.

(Added in v1.2.0-17)

Parameters:

  • fn_name: (string)
  • args: (table) function arguments (optional)
  • opts:
    • uri_list: ({string,…}) array of URIs for performing remote call
    • timeout: (optional number) passed to net.box conn:call()

Returns:

({URI=value,…}) Call results mapping for every URI.

(table) United error object, gathering errors for every URI that failed.

Module cartridge.confapplier

Configuration management primitives.

Implements the internal state machine which helps to manage cluster operation and protects from invalid state transitions.

Functions
get_active_config ()

Get current ClusterwideConfig object of instance

Returns:

cartridge.clusterwide-config, or nil if the instance isn’t bootstrapped.

get_readonly ([section_name])

Get a read-only view on the clusterwide configuration.

Returns either conf[section_name] or entire conf . Any attempt to modify the section or its children will raise an error.

Parameters:

  • section_name: (string) (optional)

Returns:

(table)

get_deepcopy ([section_name])

Get a read-write deep copy of the clusterwide configuration.

Returns either conf[section_name] or entire conf . Changing it has no effect unless it’s used to patch clusterwide configuration.

Parameters:

  • section_name: (string) (optional)

Returns:

(table)

Local Functions
set_state (state[, err])

Perform state transition.

Parameters:

  • state: (string) New state
  • err: (optional)

Returns:

(nil)

wish_state (state[, timeout])

Make a wish for meeting desired state.

Parameters:

  • state: (string) Desired state.
  • timeout: (number) (optional)

Returns:

(string) Final state, may differ from desired.

validate_config (clusterwide_config_new)

Validate configuration by all roles.

Parameters:

  • clusterwide_config_new: (table)

Returns:

(boolean) true

Or

(nil)

(table) Error description

apply_config (clusterwide_config)

Apply the role configuration.

Parameters:

  • clusterwide_config: (table)

Returns:

(boolean) true

Or

(nil)

(table) Error description

Module cartridge.test-helpers

Helpers for integration testing.

This module extends luatest.helpers with cartridge-specific classes and helpers.

Fields
Server

Extended luatest.server class to run tarantool instance.

See also:

  • cartridge.test-helpers.server
Cluster

Class to run and manage multiple tarantool instances.

See also:

  • cartridge.test-helpers.cluster
Etcd

Class to run and manage etcd node.

See also:

  • cartridge.test-helpers.etcd

Module cartridge.remote-control

Tarantool remote control server.

Allows controlling an instance over TCP by net.box call and eval . The server is designed as a partial replacement for the iproto protocol. It’s most useful when box.cfg wasn’t configured yet.

Other net.box features aren’t supported and will never be.

(Added in v0.10.0-2)

Local Functions
bind (host, port)

Init remote control server.

Bind the port but don’t start serving connections yet.

Parameters:

Returns:

(boolean) true

Or

(nil)

(table) Error description

accept (credentials)

Start remote control server. To connect the server use regular net.box connection.

Access is restricted to the user with specified credentials, which can be passed as net_box.connect('username:password@host:port') .

Parameters:

unbind ()

Stop the server.

It doesn’t interrupt any existing connections.

drop_connections ()

Explicitly drop all established connections.

Module cartridge.service-registry

Inter-role interaction.

These functions make different roles interact with each other.

The registry stores initialized modules and accesses them within the one and only current instance. For cross-instance access, use the cartridge.rpc module.

Functions
set (module_name, instance)

Put a module into registry or drop it. This function typically doesn’t need to be called explicitly, the cluster automatically sets all the initialized roles.

Parameters:

Returns:

(nil)

get (module_name)

Get a module from registry.

Parameters:

Returns:

(nil)

Or

(table) instance

Module custom-role

User-defined role API.

If you want to implement your own role, it must conform to this API.

Functions
init (opts)

Role initialization callback. Called when the role is enabled on an instance, triggered either by editing the topology or by an instance restart.

Parameters:

  • opts:
    • is_master: (boolean)
stop (opts)

Role shutdown callback. Called when role is disabled on an instance.

Parameters:

  • opts:
    • is_master: (boolean)
validate_config (conf_new, conf_old)

Validate clusterwide configuration callback.

Parameters:

apply_config (conf, opts)

Apply clusterwide configuration callback.

Parameters:

  • conf: (table) Clusterwide configuration
  • opts:
    • is_master: (boolean)
Fields
role_name

Displayed role name. When absent, module name is used instead.

hidden

Hidden role flag. Hidden roles aren’t listed in cartridge.admin_get_replicasets().roles and therefore in the WebUI. A hidden role is supposed to be a dependency for another role.

  • hidden: (boolean)
permanent

Permanent role flag. Permanent roles will be enabled on every instance in cluster. Implies hidden = true .

  • permanent: (boolean)
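
Putting the callbacks and fields together, a skeleton of a user-defined role module might look as follows; the role body is intentionally left empty:

-- app/roles/custom.lua
local function init(opts)
    if opts.is_master then
        -- master-specific initialization, e.g. creating spaces
    end
    return true
end

local function stop(opts)
    return true
end

local function validate_config(conf_new, conf_old)
    return true
end

local function apply_config(conf, opts)
    return true
end

return {
    role_name = 'custom',
    hidden = false,
    permanent = false,
    init = init,
    stop = stop,
    validate_config = validate_config,
    apply_config = apply_config,
}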

Module cartridge.lua-api.stat

Administration functions ( box.slab.info related).

Local Functions
get_stat (uri)

Retrieve box.slab.info of a remote server.

Parameters:

Returns:

(table)

Or

(nil)

(table) Error description

Module cartridge.lua-api.boxinfo

Administration functions ( box.info related).

Local Functions
get_info (uri)

Retrieve box.cfg and box.info of a remote server.

Parameters:

Returns:

(table)

Or

(nil)

(table) Error description

Module cartridge.lua-api.get-topology

Administration functions ( get-topology implementation).

Tables
ReplicasetInfo

Replicaset general information.

Fields:

  • uuid: (string) The replicaset UUID.
  • roles: ({string,…}) Roles enabled on the replicaset.
  • status: (string) Replicaset health.
  • master: (ServerInfo) Replicaset leader according to configuration.
  • active_master: (ServerInfo) Active leader.
  • weight: (number) Vshard replicaset weight. Matters only if vshard-storage role is enabled.
  • vshard_group: (string) Name of vshard group the replicaset belongs to.
  • all_rw: (boolean) A flag indicating that all servers in the replicaset should be read-write.
  • alias: (string) Human-readable replicaset name.
  • servers: ({ServerInfo,…}) Circular reference to all instances in the replicaset.
ServerInfo

Instance general information.

Fields:

  • alias: (string) Human-readable instance name.
  • uri: (string)
  • uuid: (string)
  • disabled: (boolean)
  • status: (string) Instance health.
  • message: (string) Auxiliary health status.
  • replicaset: (ReplicasetInfo) Circular reference to a replicaset.
  • priority: (number) Leadership priority for automatic failover.
  • clock_delta: (number) Difference between the remote clock and the local one (in seconds), obtained from the membership module (SWIM protocol). Positive values mean the remote clock is ahead of the local one, and vice versa.
Local Functions
get_topology ()

Get servers and replicasets lists.

Returns:

({servers={ServerInfo,…},replicasets={ReplicasetInfo,…}})

Or

(nil)

(table) Error description

Module cartridge.lua-api.edit-topology

Administration functions ( edit-topology implementation).

Editing topology
edit_topology (args)

Edit cluster topology. This function can be used for:

  • bootstrapping cluster from scratch
  • joining a server to an existing replicaset
  • creating new replicaset with one or more servers
  • editing uri/labels of servers
  • disabling and expelling servers

(Added in v1.0.0-17)

Parameters:

EditReplicasetParams

Replicaset modifications.

Fields:

JoinServerParams

Parameters required for joining a new server.

Fields:

EditServerParams

Servers modifications.

Fields:

  • uri: (optional string)
  • uuid: (string)
  • labels: (optional table)
  • disabled: (optional boolean)
  • expelled: (optional boolean) Expelling an instance is permanent and can’t be undone. It’s suitable for situations when the hardware is destroyed, snapshots are lost and there is no hope to bring it back to life.

Module cartridge.lua-api.topology

Administration functions (topology related).

Functions
get_servers ([uuid])

Get servers list. Optionally filter out the server with the given uuid.

Parameters:

Returns:

({ServerInfo,…})

Or

(nil)

(table) Error description

get_replicasets ([uuid])

Get replicasets list. Optionally filter out the replicaset with given uuid.

Parameters:

Returns:

({ReplicasetInfo,…})

Or

(nil)

(table) Error description

probe_server (uri)

Discover an instance.

Parameters:

enable_servers (uuids)

Enable nodes after they were disabled.

Parameters:

Returns:

({ServerInfo,…})

Or

(nil)

(table) Error description

disable_servers (uuids)

Temporarily disable nodes.

Parameters:

Returns:

({ServerInfo,…})

Or

(nil)

(table) Error description

Local Functions
get_self ()

Get alias, uri and uuid of current instance.

Returns:

(table)

Module cartridge.lua-api.failover

Administration functions (failover related).

Functions
get_params ()

Get failover configuration.

(Added in v2.0.2-2)

Returns:

(FailoverParams)

set_params (opts)

Configure automatic failover.

(Added in v2.0.2-2)

Parameters:

  • opts:
    • mode: (optional string)
    • state_provider: (optional string)
    • tarantool_params: (optional table)
    • etcd2_params: (optional table) (added in v2.1.2-26)

Returns:

(boolean) true if config applied successfully

Or

(nil)

(table) Error description

get_failover_enabled ()

Get current failover state.

(Deprecated since v2.0.2-2)

set_failover_enabled (enabled)

Enable or disable automatic failover.

(Deprecated since v2.0.2-2)

Parameters:

  • enabled: (boolean)

Returns:

(boolean) New failover state

Or

(nil)

(table) Error description

promote (replicaset_uuid[, opts])

Promote leaders in replicasets.

Parameters:

  • replicaset_uuid: (table) { [replicaset_uuid] = leader_uuid }
  • opts:
    • force_inconsistency: (optional boolean) (default: false)

Returns:

(boolean) true on success

Or

(nil)

(table) Error description

Tables
FailoverParams

Failover parameters.

(Added in v2.0.2-2)

Fields:

  • mode: (string) Supported modes are “disabled”, “eventual” and “stateful”
  • state_provider: (optional string) Supported state providers are “tarantool” and “etcd2”.
  • tarantool_params: (added in v2.0.2-2)
  • etcd2_params: (added in v2.1.2-26)
    • endpoints: (optional table) URIs that are used to discover and to access etcd cluster instances. (default: {'http://localhost:2379', 'http://localhost:4001'})
    • username: (optional string) (default: “”)
    • password: (optional string) (default: “”)
  • lock_delay: (optional number) Timeout (in seconds), determines lock’s time-to-live (default: 10)

Module cartridge.lua-api.vshard

Administration functions (vshard related).

Functions
bootstrap_vshard ()

Call vshard.router.bootstrap() . This function distributes all buckets across the replica sets.

Returns:

(boolean) true

Or

(nil)

(table) Error description

Module cartridge.lua-api.deprecated

Administration functions (deprecated).

Deprecated functions
join_server (args)

Join an instance to the cluster (deprecated).

(Deprecated since v1.0.0-17 in favor of cartridge.admin_edit_topology)

Parameters:

Returns:

(boolean) true

Or

(nil)

(table) Error description

edit_server (args)

Edit an instance (deprecated).

(Deprecated since v1.0.0-17 in favor of cartridge.admin_edit_topology)

Parameters:

Returns:

(boolean) true

Or

(nil)

(table) Error description

expel_server (uuid)

Expel an instance (deprecated). Forever.

(Deprecated since v1.0.0-17 in favor of cartridge.admin_edit_topology)

Parameters:

Returns:

(boolean) true

Or

(nil)

(table) Error description

edit_replicaset (args)

Edit replicaset parameters (deprecated).

(Deprecated since v1.0.0-17 in favor of cartridge.admin_edit_topology)

Parameters:

Returns:

(boolean) true

Or

(nil)

(table) Error description

Class cartridge.test-helpers.cluster

Class to run and manage multiple tarantool instances.

Functions
Cluster:new (object)

Build cluster object.

Parameters:

  • object:
    • datadir: (string) Data directory for all cluster servers.
    • server_command: (string) Command to run server.
    • cookie: (string) Cluster cookie.
    • base_http_port: (int) Value to calculate server’s http_port. (optional)
    • base_advertise_port: (int) Value to calculate server’s advertise_port. (optional)
    • use_vshard: (bool) bootstrap vshard after server is started. (optional)
    • replicasets: (tab) Replicasets configuration. List of replicaset_config

Returns:

object
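
A sketch of a luatest suite that builds and starts such a cluster; the server_command path and the replicaset layout are assumptions:

local fio = require('fio')
local t = require('luatest')
local helpers = require('cartridge.test-helpers')

local g = t.group('integration')

g.before_all(function()
    g.cluster = helpers.Cluster:new({
        datadir = fio.tempdir(),
        server_command = fio.pathjoin(fio.cwd(), 'init.lua'),  -- assumed entry point
        cookie = 'test-cluster-cookie',
        use_vshard = false,
        replicasets = {{
            alias = 'main',
            roles = {},
            servers = 1,
        }},
    })
    g.cluster:start()
end)

g.after_all(function()
    g.cluster:stop()
end)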

Cluster:server (alias)

Find server by alias.

Parameters:

Returns:

cartridge.test-helpers.server

Cluster:apply_topology ()

Execute an edit_topology GraphQL request to set up replicasets, apply roles, and join servers to replicasets.

Cluster:start ()

Bootstraps cluster if it wasn’t bootstrapped before. Otherwise starts servers.

Cluster:stop ()

Stop all servers.

Cluster:join_server (server)

Register running server in the cluster.

Parameters:

  • server: (Server) Server to be registered.
Cluster:wait_until_healthy (server)

Blocks fiber until cartridge.is_healthy() returns true on main_server.

Parameters:

  • server:
Cluster:upload_config (config)

Upload application config, shortcut for cluster.main_server:upload_config(config) .

Parameters:

  • config:

See also:

  • cartridge.test-helpers.server.Server:upload_config
Cluster:download_config ()

Download application config, shortcut for cluster.main_server:download_config() .

See also:

  • cartridge.test-helpers.server.Server:download_config
Cluster:retrying (config, fn[, …])

Keeps calling fn until it returns without error. Throws the last error if config.timeout has elapsed.

Parameters:

  • config: (tab) Options for luatest.helpers.retrying .
  • fn: (func) Function to call
  • …: Args to run fn with. (optional)
Tables
cartridge.test-helpers.cluster.replicaset_config

Replicaset config.

Fields:

  • alias: (string) Prefix to generate server alias automatically. (optional)
  • uuid: (string) Replicaset uuid. (optional)
  • roles: ({string}) List of roles for servers in the replicaset.
  • vshard_group: (optional string) Name of vshard group.
  • all_rw: (optional boolean) Make all replicas writable.
  • servers: (table or number) List of objects to build Server s with, or the number of servers in the replicaset.

Class cartridge.test-helpers.server

Extended luatest.Server class to run tarantool instance.

Functions
Server:build_env ()

Generates environment to run process with. The result is merged into os.environ().

Returns:

map

Server:start ()

Start the server.

Server:stop ()

Stop server process.

Server:graphql (request, http_options)

Perform GraphQL request.

Parameters:

  • request:
    • query: (string) GraphQL query
    • variables: (optional table) variables for the GraphQL query
    • raise: (optional boolean) raise if the response contains an error (default: true)
  • http_options: (table) passed to http_request options. (optional)

Returns:

(table) parsed response JSON.

Raises:

  • HTTPRequest error
  • GraphQL error
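
For instance, a query for the servers list might look like this, assuming server is an instance of this class (e.g. cluster.main_server); the selected fields are an assumption about the cluster GraphQL schema:

local response = server:graphql({
    query = [[
        { servers { uri alias status } }
    ]],
})
-- response.data.servers is a list of tables with uri, alias and status fields
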
Server:join_cluster (main_server[, options])

Advertise this server to the cluster.

Parameters:

  • main_server: Server to perform GraphQL request on.
  • options:
    • timeout: request timeout
Server:setup_replicaset (config)

Update server’s replicaset config.

Parameters:

  • config:
    • uuid: replicaset uuid
    • roles: list of roles
    • master:
    • weight:
Server:upload_config (config)

Upload application config.

Parameters:

  • config: (string or table) A table will be encoded as YAML and posted to /admin/config.
Server:download_config ()

Download application config.

Methods
cartridge.test-helpers.server:new (object)

Build server object.

Parameters:

  • object:
    • command: (string) Command to start server process.
    • workdir: (string) Value to be passed in TARANTOOL_WORKDIR .
    • chdir: (bool) Path to cwd before starting a process. (optional)
    • env: (tab) Table to pass as env variables to process. (optional)
    • args: (tab) Args to run command with. (optional)
    • http_port: (int) Value to be passed in TARANTOOL_HTTP_PORT and used to perform HTTP requests. (optional)
    • advertise_port: (int) Value to generate TARANTOOL_ADVERTISE_URI and used for net_box connection.
    • net_box_port: (int) Alias for advertise_port . (optional)
    • net_box_credentials: (tab) Override default net_box credentials. (optional)
    • alias: (string) Instance alias.
    • cluster_cookie: (string) Value to be passed in TARANTOOL_CLUSTER_COOKIE and used as default net_box password.
    • instance_uuid: (string) Server identifier. (optional)
    • replicaset_uuid: (string) Replicaset identifier. (optional)

Returns:

input object

Class cartridge.test-helpers.etcd

Class to run and manage etcd node.

Functions
Etcd:new (object)

Build etcd node object.

Parameters:

  • object:
    • name: (string) Human-readable node name.
    • workdir: (string) Path to the data directory.
    • etcd_path: (string) Path to the etcd executable.
    • peer_url: (string) URL to listen on for peer traffic.
    • client_url: (string) URL to listen on for client traffic.
    • env: (tab) Environment variables passed to the process. (optional)
    • args: (tab) Command-line arguments passed to the process. (optional)

Returns:

object

Etcd:start ()

Start the node.

Etcd:stop ()

Stop the node.

Cartridge Command Line Interface

Installation

  1. Install third-party software:

  2. Install Tarantool 1.10 or higher.

    You can:

  3. [On all platforms except MacOS X] If you built Tarantool from sources, you need to manually set up the Tarantool packages repository:

    curl -L https://tarantool.io/installer.sh | sudo -E bash -s -- --repo-only
    
  4. Install the cartridge-cli package:

    • for CentOS, Fedora, ALT Linux (RPM package):

      sudo yum install cartridge-cli
      
    • for Debian, Ubuntu (DEB package):

      sudo apt-get install cartridge-cli
      
    • for MacOS X (Homebrew formula):

      brew install cartridge-cli
      
  5. Check the installation:

    cartridge version
    

Now you can create and start your first application!

Quick start

To create your first application:

cartridge create --name myapp

Let’s go inside:

cd myapp

Now build the application and start it:

cartridge build
cartridge start

That’s it! Now you can visit http://localhost:8081 and see your application’s Admin Web UI.

You can find more details in this README document or you can start with the getting started guide.

Command-line completion

Linux

RPM and DEB cartridge-cli packages contain the /etc/bash_completion.d/cartridge Bash completion script. To enable completion after installing cartridge-cli, start a new shell or source the /etc/bash_completion.d/cartridge completion file. Make sure that you have bash completion installed.

To install Zsh completion, say

cartridge gen completion --skip-bash --zsh="${fpath[1]}/_cartridge"

To enable shell completion:

echo "autoload -U compinit; compinit" >> ~/.zshrc
OS X

If you install cartridge-cli from brew, it automatically installs both Bash and Zsh completions.

Usage

For more details, say:

cartridge --help

The following commands are supported:

  • create — create a new application from template;
  • build — build the application for local development and testing;
  • start — start a Tarantool instance(s);
  • stop — stop a Tarantool instance(s);
  • status — get current instance(s) status;
  • log — get logs of instance(s);
  • clean - clean instance(s) files;
  • pack — pack the application into a distributable bundle;
  • repair — patch cluster configuration files.

The following global flags are supported:

  • verbose — verbose mode;
  • debug — debug mode (the same as verbose, but temporary files and directories aren’t removed);
  • quiet — the mode that hides log details during the build process.
An application lifecycle

In a nutshell:

  1. Create an application (e.g. myapp) from template:

    cartridge create --name myapp
    cd ./myapp
    
  2. Build the application for local development and testing:

    cartridge build
    
  3. Run instances locally:

    cartridge start
    cartridge stop
    
  4. Pack the application into a distributable (e.g. into an RPM package):

    cartridge pack rpm
    
Creating an application from template

To create an application from the Cartridge template, say this in any directory:

cartridge create [PATH] [flags]

The following options ([flags]) are supported:

  • --name string is the application name.
  • --from DIR is a path to the application template (see details below).
  • --template string is the name of the application template to be used. Currently only the cartridge template is supported.

The application is created in the <path>/<app-name>/ directory.

By default, the cartridge template is used. It contains a simple Cartridge application with:

  • one custom role with an HTTP endpoint;
  • sample tests and basic test helpers;
  • files required for development (like .luacheckrc).

If you have git installed, this will also set up a Git repository with the initial commit, tag it with version 0.1.0, and add a .gitignore file to the project root.

Let’s take a closer look at the files inside the <app_name>/ directory:

  • application files:

    • app/roles/custom-role.lua is a sample custom role with a simple HTTP API; it can be enabled as app.roles.custom
    • <app_name>-scm-1.rockspec file where you can specify application dependencies
    • init.lua file which is the entry point for your application
    • stateboard.init.lua file which is the entry point for the application stateboard
  • special files (used to build and pack the application):

    • cartridge.pre-build
    • cartridge.post-build
    • Dockerfile.build.cartridge
    • Dockerfile.cartridge
  • development files:

    • deps.sh script that resolves the dependencies from the .rockspec file and installs test dependencies (like luatest)
    • instances.yml file with instances configuration (used by cartridge start)
    • .cartridge.yml file with Cartridge configuration (used by cartridge start)
    • tmp directory for temporary files (used as a run dir, see .cartridge.yml)
    • .git file necessary for a Git repository
    • .gitignore file where you can specify the files for Git to ignore
    • env.lua file that sets common rock paths so that the application can be started from any directory.
  • test files (with sample tests):

    test
    ├── helper
    │   ├── integration.lua
    │   └── unit.lua
    ├── helper.lua
    ├── integration
    │   └── api_test.lua
    └── unit
        └── sample_test.lua
    
  • configuration files:

    • .luacheckrc
    • .luacov
    • .editorconfig

You can create your own application template and use it with cartridge create via the --from flag.

If the template directory is a git repository, the .git/ files are ignored when the template is instantiated. A new git repository is initialized in the created application.

The template application shouldn’t contain a .rocks directory. To specify application dependencies, use the rockspec and cartridge.pre-build files.

Filenames and content can contain text templates.

Available variables are:

  • Name — the application name;
  • StateboardName — the application stateboard name (<app-name>-stateboard);
  • Path - an absolute path to the application.

For example:

my-template
├── {{ .Name }}-scm-1.rockspec
├── init.lua
├── stateboard.init.lua
└── test
    └── sample_test.lua

init.lua:

print("Hi, I am {{ .Name }} application")
print("I also have a stateboard named {{ .StateboardName }}")
Building an application

To build your application locally (for local testing), say this in any directory:

cartridge build [PATH] [flags]

This command accepts one optional argument — the path to your application directory (i.e. the build source). The default path is . (the current directory).

This command runs:

  1. cartridge.pre-build if the pre-build file exists. This builds the application in the [PATH] directory.
  2. tarantoolctl rocks make if the rockspec file exists. This installs all Lua rocks to the [PATH] directory.

During step 1 of the cartridge build command, cartridge builds the application inside the application directory – unlike when building the application as part of the cartridge pack command, when the application is built in a temporary build directory and no build artifacts remain in the application directory.

During step 2 – the key step here – cartridge installs all dependencies specified in the rockspec file (you can find this file within the application directory created from template).

(An advanced alternative would be to specify the build logic in the rockspec as cmake commands, as we do for cartridge.)

If your application depends on closed-source rocks, or if the build should contain rocks from a project added as a submodule, then you need to install all these dependencies before calling tarantoolctl rocks make. You can do it using the file cartridge.pre-build in your application root (again, you can find this file within the application directory created from template). In this file, you can specify all rocks to build (e.g. tarantoolctl rocks make --chdir ./third_party/proj). For details, see special files.
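
A minimal cartridge.pre-build sketch, assuming a dependency vendored as a submodule in ./third_party/proj (the path is taken from the example above):

#!/bin/sh
# cartridge.pre-build: install rocks that tarantoolctl rocks make
# can't fetch from a rocks server
tarantoolctl rocks make --chdir ./third_party/proj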

As a result, in the application’s .rocks directory you will get a fully built application that you can start locally from the application’s directory.

Starting/stopping an application locally
start

Now, after the application is built, you can run it locally:

cartridge start [INSTANCE_NAME...] [flags]

where [INSTANCE_NAME...] means that several instances can be specified.

If no INSTANCE_NAME is provided, all the instances from the Cartridge instances configuration file are taken as arguments (see the --cfg option below).

We also need an application name (APP_NAME) to pass to the instances on start and to define paths to the instance files (for example, <run-dir>/<APP_NAME>.<INSTANCE_NAME>.pid). By default, APP_NAME is taken from the application rockspec in the current directory, but it can also be defined explicitly via the --name option (see the description below).
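
For example, a minimal instances configuration file (./instances.yml; the application and instance names, ports, and URIs are illustrative) could look like this:

# instances.yml: a sketch of a local instances configuration
myapp.instance-1:
  advertise_uri: localhost:3301
  http_port: 8081

myapp.instance-2:
  advertise_uri: localhost:3302
  http_port: 8082

With such a file in place, cartridge start with no arguments launches both instance-1 and instance-2.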

Options

The following options ([flags]) are supported:

  • --script FILE is the application’s entry point. It should be a relative path to the entry point in the project directory or an absolute path. Defaults to init.lua (or to the value of the “script” parameter in the Cartridge configuration file).
  • --run-dir DIR is the directory where PID and socket files are stored. Defaults to ./tmp/run (or to the value of the “run-dir” parameter in the Cartridge configuration file).
  • --data-dir DIR is the directory where instances’ data is stored. Each instance’s working directory is <data-dir>/<app-name>.<instance-name>. Defaults to ./tmp/data (or to the value of the “data-dir” parameter in the Cartridge configuration file).
  • --log-dir DIR is the directory to store instances logs when running in background. Defaults to ./tmp/log (or to the value of the “log-dir” parameter in the Cartridge configuration file).
  • --cfg FILE is the configuration file for Cartridge instances. Defaults to ./instances.yml (or to the value of the “cfg” parameter in the Cartridge configuration file).
  • --daemonize, -d starts the instance in background. With this option, Tarantool also waits until the application’s main script is finished. For example, it is useful if the init.lua requires time-consuming startup from snapshot, and Tarantool waits for the startup to complete. This is also useful if the application’s main script generates errors, and Tarantool can handle them.
  • --stateboard starts the application stateboard as well as instances. Ignored if --stateboard-only is specified.
  • --stateboard-only starts only the application stateboard. If specified, INSTANCE_NAME... are ignored.
  • --name string defines the application name. By default, it is taken from the application rockspec.
  • --timeout string is the time to wait for instance(s) to start in the background. It can be specified in seconds or in the duration form (72h3m0.5s). The timeout can’t be negative; a timeout of 0 means waiting for instance(s) to start forever. The default timeout is 60 seconds (1m0s).
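
For example, to start all instances from the instances configuration file in the background, together with the stateboard, and wait up to two minutes for them to come up (flags as described above):

cartridge start -d --stateboard --timeout 2m
# check that the instances and the stateboard are running
cartridge status --stateboard
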
Environment variables

The cartridge start command starts a Tarantool instance with enforced environment variables:

TARANTOOL_APP_NAME="<app-name>"
TARANTOOL_INSTANCE_NAME="<instance-name>"
TARANTOOL_CFG="<cfg>"
TARANTOOL_PID_FILE="<run-dir>/<app-name>.<instance-name>.pid"
TARANTOOL_CONSOLE_SOCK="<run-dir>/<app-name>.<instance-name>.control"
TARANTOOL_WORKDIR="<data-dir>/<app-name>.<instance-name>"

When started in background, a notify socket path is passed additionally:

NOTIFY_SOCKET="<data-dir>/<app-name>.<instance-name>.notify"

cartridge.cfg() uses TARANTOOL_APP_NAME and TARANTOOL_INSTANCE_NAME to read the instance’s configuration from the file provided in TARANTOOL_CFG.

Overriding default options

You can override default options for the cartridge command in the ./.cartridge.yml configuration file.

Here is an example of .cartridge.yml:

run-dir: my-run-dir
cfg: my-instances.yml
script: my-init.lua

stop

To stop one or more running instances, say:

cartridge stop [INSTANCE_NAME...] [flags]

By default, SIGTERM is sent to instances.

The following options ([flags]) are supported:

  • -f, --force indicates if instance(s) stop should be forced (sends SIGKILL).

The following options from the start command are supported:

  • --run-dir DIR
  • --cfg FILE
  • --stateboard
  • --stateboard-only

Note

run-dir should be exactly the same as used in the cartridge start command. PID files stored there are used to stop the running instances.
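
For example, if the instances were started with a custom run directory, pass the same directory when stopping them (the path is illustrative):

# the run directory must match the one used by cartridge start
cartridge stop --run-dir ./my-run-dir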

status

To check the current instance status, use the status command:

cartridge status [INSTANCE_NAME...] [flags]

The following options from the start command are supported:

  • --run-dir DIR
  • --cfg FILE
  • --stateboard
  • --stateboard-only

log

To get logs of the instance running in background, use the log command:

cartridge log [INSTANCE_NAME...] [flags]

The following options ([flags]) are supported:

  • -f, --follow outputs appended data as the log grows.
  • -n, --lines int is the number of lines to output (from the end). Defaults to 15.

The following options from the start command are supported:

  • --log-dir DIR
  • --run-dir DIR
  • --cfg FILE
  • --stateboard
  • --stateboard-only

clean

To remove instance(s) files (log, workdir, console socket, PID-file and notify socket), use the clean command:

cartridge clean [INSTANCE_NAME...] [flags]

cartridge clean for running instance(s) causes an error.

The following options from the start command are supported:

  • --log-dir DIR
  • --data-dir DIR
  • --run-dir DIR
  • --cfg FILE
  • --stateboard
  • --stateboard-only

Packing an application

To pack your application, say this in any directory:

cartridge pack TYPE [PATH] [flags]

where:

  • TYPE (required) is the distribution type. Supported types: rpm, tgz, deb, docker.
  • PATH (optional) is the path to the application directory to pack. Defaults to . (the current directory).

The options ([flags]) are as follows:

  • --name string (common for all distribution types) is the application name. It coincides with the package name and the systemd-service name. The default name comes from the package field in the rockspec file.
  • --version string (common for all distribution types) is the application’s package version. The expected pattern is major.minor.patch[-count][-commit]: if you specify major.minor.patch, it is normalized to major.minor.patch-count. The default version is determined as the result of git describe --tags --long. If the application is not a git repository, you need to set the --version option explicitly.
  • --suffix string (common for all distribution types) is the result file (or image) name suffix.
  • --unit-template string (used for rpm and deb) is the path to the template for the systemd unit file.
  • --instantiated-unit-template string (used for rpm and deb) is the path to the template for the systemd instantiated unit file.
  • --stateboard-unit-template string (used for rpm and deb) is the path to the template for the stateboard systemd unit file.
  • --use-docker (enforced for docker) forces building the application in Docker.
  • --tag strings (used for docker) is the tag(s) of the Docker image that results from pack docker.
  • --from string (used for docker) is the path to the base Dockerfile of the runtime image. Defaults to Dockerfile.cartridge in the application root.
  • --build-from string (common for all distribution types, used for building in Docker) is the path to the base Dockerfile of the build image. Defaults to Dockerfile.build.cartridge in the application root.
  • --no-cache creates build and runtime images with the --no-cache docker flag.
  • --cache-from strings is a list of images to consider as cache sources for both build and runtime images. See the --cache-from flag of the docker build command.
  • --sdk-path string (common for all distribution types, used for building in Docker) is the path to the SDK to be delivered in the result artifact. Alternatively, you can pass the path via the TARANTOOL_SDK_PATH environment variable (this variable is of lower priority).
  • --sdk-local (common for all distribution types, used for building in Docker) is a flag that indicates if the SDK from the local machine should be delivered in the result artifact.

For Tarantool Enterprise, you must specify one (and only one) of the --sdk-local and --sdk-path options.

For rpm, deb, and tgz, we also deliver rocks modules and executables specific to the system where the cartridge pack command is running.

For docker, the resulting runtime image will contain rocks modules and executables specific to the base image (centos:8).
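
For example, to pack the application as an RPM with an explicit version, or as a Docker image with a custom tag (the version and tag values are illustrative):

cartridge pack rpm --version 1.2.3 ./myapp
cartridge pack docker --tag myapp:1.2.3 ./myapp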

Next, we dive deeper into the packaging process.

Build directory

The first step of the packaging process is to build the application.

By default, application build is done in a temporary directory in ~/.cartridge/tmp/, so the packaging process doesn’t affect the contents of your application directory.

You can specify a custom build directory for your application in the CARTRIDGE_TEMPDIR environment variable. If this directory doesn’t exist, it will be created, used for building the application, and then removed.

If you specify an existing directory in the CARTRIDGE_TEMPDIR environment variable, the CARTRIDGE_TEMPDIR/cartridge.tmp directory will be used for the build and then removed. This directory will be cleaned up before building the application.
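
For example, to run the packaging with an explicitly chosen build directory (the path is illustrative):

# if /tmp/myapp-build already exists, /tmp/myapp-build/cartridge.tmp is used and removed afterwards
CARTRIDGE_TEMPDIR=/tmp/myapp-build cartridge pack rpm ./myapp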

Distribution directory

For each distribution type, a temporary directory with application source files is created (referred to below as the application directory). This process includes three stages.

Stage 1. Cleaning up the application directory

At this stage, some files are filtered out of the application directory:

  • First, git clean -X -d -f removes all ignored files and directories (it works for submodules, too).
  • After that, .rocks and .git directories are removed.

File permissions are preserved, and the owner of the code files is set to root:root in the resulting package.

All application files should have at least a+r permissions (a+rx for directories). Otherwise, the cartridge pack command raises an error.

Stage 2. Building the application

At this stage, cartridge builds the application in the cleaned-up application directory.

Stage 3. Cleaning up the files before packing

At this stage, cartridge runs cartridge.post-build (if it exists) to remove junk files (like node_modules) generated during the application build.

See an example in special files.
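
A minimal cartridge.post-build sketch, assuming the build generates a node_modules directory (as in the example above):

#!/bin/sh
# cartridge.post-build: remove junk files generated during the application build
# so that they don't end up in the package
rm -rf node_modules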

Repairing a cluster

To repair a running application, you can use the cartridge repair command.

There are several simple rules you need to know before using this command:

  • Rule #1 of repair is: you do not use it if you aren’t sure that it’s exactly what you need.
  • Rule #2: always use --dry-run before running repair.
  • Rule #3: do not hesitate to use the --verbose option.
  • Rule #4: do not use the --force option if you aren’t sure that it’s exactly what you need.

Please pay attention to the troubleshooting documentation before using repair.

What does repair actually do?

It patches the cluster-wide configuration files of application instances placed on the local machine. Note that patching the files alone is not enough: the instance must also reload the new configuration.

repair was created to be used in production (but it can still be used for local development). So, it requires the application name option --name. Also, remember that the default data directory is /var/lib/tarantool and the default run directory is /var/run/tarantool (both can be overridden via options).

In the default mode, repair walks through all cluster-wide configurations placed in the <data-dir>/<app-name>.* directories and patches all the configuration files it finds.

If the --dry-run flag is specified, files aren’t patched, and only a computed configuration diff is shown.

If the configuration files have diverged between instances on the local machine, repair raises an error. You can, however, specify the --force option to patch the different versions of the configuration independently.

repair can also reload the configuration for all instances if the --reload flag is specified (only if the application uses cartridge >= 2.0.0). The configuration is reloaded, via the console sockets in the run directory, for all instances listed in the new configuration. Make sure that you specify the right run directory when using the --reload flag.

cartridge repair [command]

The following repair commands are available (see details below):

  • list-topology - shows the current topology summary;
  • remove-instance - removes an instance from the cluster;
  • set-leader - changes a replica set leader;
  • set-uri - changes an instance’s advertise URI.

All repair commands have these flags:

  • --name (required) is an application name.
  • --data-dir is a directory where the instances’ data is stored (defaults to /var/lib/tarantool).

All commands, except list-topology, have these flags:

  • --run-dir is a directory where PID and socket files are stored (defaults to /var/run/tarantool).
  • --dry-run runs the repair command in the dry-run mode (shows changes but doesn’t apply them).
  • --reload is a flag that enables reloading configuration on instances after the patch.
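
For example, to inspect the locally stored topology and preview the removal of an instance without applying the change (the application name, data directory, and UUID are illustrative):

cartridge repair list-topology --name myapp --data-dir ./tmp/data
cartridge repair remove-instance --name myapp --data-dir ./tmp/data --dry-run aaaaaaaa-0000-4000-b000-000000000000
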
Repair commands
Topology summary
cartridge repair list-topology [flags]

Takes no arguments. Prints the current topology summary.

Remove instance
cartridge repair remove-instance UUID [flags]

Removes the instance with the specified UUID from the cluster. If the specified instance isn’t found, raises an error.

Set leader
cartridge repair set-leader REPLICASET-UUID INSTANCE-UUID [flags]

Sets the specified instance as the leader of the specified replica set. Raises an error if:

  • a replica set or instance with the specified UUID doesn’t exist;
  • the specified instance doesn’t belong to the specified replica set;
  • the specified instance is disabled or expelled.
Set advertise URI
cartridge repair set-uri INSTANCE-UUID URI-TO [flags]

Rewrites the advertise URI for the specified instance. If the specified instance isn’t found or is expelled, raises an error.

TGZ

cartridge pack tgz ./myapp creates a .tgz archive. It contains all files from the distribution directory (i.e. the application source code and rocks modules described in the application rockspec).

The result artifact name is <name>-<version>[-<suffix>].tar.gz.

RPM and DEB

cartridge pack rpm|deb ./myapp creates an RPM or DEB package.

The result artifact name is <name>-<version>[-<suffix>].{rpm,deb}.

Usage example

After installing the package, you need to specify the configuration of the instances to start.

For example, if your application is named myapp and you want to start two instances, put the myapp.yml file into the /etc/tarantool/conf.d directory.

myapp:
  cluster_cookie: secret-cookie

myapp.instance-1:
  http_port: 8081
  advertise_uri: localhost:3301

myapp.instance-2:
  http_port: 8082
  advertise_uri: localhost:3302

For more details about instance configuration, see the documentation.

Now, start the configured instances:

systemctl start myapp@instance-1
systemctl start myapp@instance-2

If you use stateful failover, you need to start the application stateboard.

(Remember that your application should contain stateboard.init.lua in its root.)

Add the myapp-stateboard section to /etc/tarantool/conf.d/myapp.yml:

myapp-stateboard:
  listen: localhost:3310
  password: passwd

Then, start the stateboard service:

systemctl start myapp-stateboard

Package details

The installed package name will be <name> no matter what the artifact name is.

It contains meta information: the package name (which is the application name), and the package version.

If you use an open-source version of Tarantool, the package has a tarantool dependency (version >= <major>.<minor> and < <major+1>, where <major>.<minor> is the version of Tarantool used for packing the application). You should enable the Tarantool repository so that your package manager can install this dependency correctly:

  • for both RPM and DEB:

    curl -L https://tarantool.io/installer.sh | VER=${TARANTOOL_VERSION} bash
    

The package contents are as follows:

  • the contents of the distribution directory, placed in the /usr/share/tarantool/<app-name> directory (for Tarantool Enterprise, this directory also contains tarantool and tarantoolctl binaries);
  • unit files for running the application as a systemd service: /etc/systemd/system/<app-name>.service and /etc/systemd/system/<app-name>@.service;
  • application stateboard unit file: /etc/systemd/system/<app-name>-stateboard.service (will be packed only if the application contains stateboard.init.lua in its root);
  • the file /usr/lib/tmpfiles.d/<app-name>.conf that allows the instance to restart after server restart.

The following directories are created:

  • /etc/tarantool/conf.d/ — directory for instances configuration;
  • /var/lib/tarantool/ — directory to store instances snapshots;
  • /var/run/tarantool/ — directory to store PID-files and console sockets.

See the documentation for details about deploying a Tarantool Cartridge application.

To start the instance-1 instance of the myapp service, say:

systemctl start myapp@instance-1

To start the application stateboard service, say:

systemctl start myapp-stateboard

This instance will look for its configuration across all sections of the YAML file(s) stored in /etc/tarantool/conf.d/*.

Use the options --unit-template, --instantiated-unit-template and --stateboard-unit-template to customize standard unit files.

You may need this first of all for DEB packages if your build platform differs from the deployment platform. In that case, ExecStartPre may contain an incorrect path to mkdir. As a workaround, we suggest customizing the unit files.
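
For example, to pack a DEB package with customized unit files (the template paths are illustrative):

cartridge pack deb ./myapp \
  --unit-template ./systemd/unit.tmpl \
  --instantiated-unit-template ./systemd/instantiated-unit.tmpl \
  --stateboard-unit-template ./systemd/stateboard-unit.tmpl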

Example of an instantiated unit file:

[Unit]
Description=Tarantool Cartridge app {{ .Name }}@%i
After=network.target

[Service]
Type=simple
ExecStartPre=/bin/sh -c 'mkdir -p {{ .InstanceWorkDir }}'
ExecStart={{ .Tarantool }} {{ .AppEntrypointPath }}
Restart=on-failure
RestartSec=2
User=tarantool
Group=tarantool

Environment=TARANTOOL_APP_NAME={{ .Name }}
Environment=TARANTOOL_WORKDIR={{ .InstanceWorkDir }}
Environment=TARANTOOL_CFG={{ .ConfPath }}
Environment=TARANTOOL_PID_FILE={{ .InstancePidFile }}
Environment=TARANTOOL_CONSOLE_SOCK={{ .InstanceConsoleSock }}
Environment=TARANTOOL_INSTANCE_NAME=%i

LimitCORE=infinity
# Disable OOM killer
OOMScoreAdjust=-1000
# Increase fd limit for Vinyl
LimitNOFILE=65535

# Systemd waits until all xlogs are recovered
TimeoutStartSec=86400s
# Give a reasonable amount of time to close xlogs
TimeoutStopSec=10s

[Install]
WantedBy=multi-user.target
Alias={{ .Name }}.%i

Supported variables:

  • Name — the application name;
  • StateboardName — the application stateboard name (<app-name>-stateboard);
  • DefaultWorkDir — default instance working directory (/var/lib/tarantool/<app-name>.default);
  • InstanceWorkDir — application instance working directory (/var/lib/tarantool/<app-name>.<instance-name>);
  • StateboardWorkDir — stateboard working directory (/var/lib/tarantool/<app-name>-stateboard);
  • DefaultPidFile — default instance pid file (/var/run/tarantool/<app-name>.default.pid);
  • InstancePidFile — application instance pid file (/var/run/tarantool/<app-name>.<instance-name>.pid);
  • StateboardPidFile — stateboard pid file (/var/run/tarantool/<app-name>-stateboard.pid);
  • DefaultConsoleSock — default instance console socket (/var/run/tarantool/<app-name>.default.control);
  • InstanceConsoleSock — application instance console socket (/var/run/tarantool/<app-name>.<instance-name>.control);
  • StateboardConsoleSock — stateboard console socket (/var/run/tarantool/<app-name>-stateboard.control);
  • ConfPath — path to the application instances config (