Measuring ArangoDB insert performance

ArangoDB offers a few APIs to insert documents. First, there is an HTTP API for bulk document imports. This API was already covered in another post. In general, the bulk import API should always be used when the task is to insert many documents into a collection at once, as fast as possible. This works well if the documents are known in advance, e.g. when importing data from a file.

More often than not, web applications will insert just one or a few documents at a time (e.g. creating a single user document). ArangoDB provides a separate HTTP API for single document inserts. Applications can use this API to insert a single document and then continue. In a real-world setup, there will often be multiple clients trying to insert documents via this API in parallel.
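For orientation, a single-document insert costs one full HTTP request and response per document. The following is a minimal sketch, assuming a server on localhost:8529, a collection named test, and the single-document endpoint POST /_api/document?collection=<name>. The function name insert_document and the ARANGO_URL and CURL variables are hypothetical; setting CURL=echo prints the command instead of sending it.

```shell
#!/usr/bin/env bash
# Sketch: insert one document via the single-document HTTP API.
# insert_document, ARANGO_URL and CURL are hypothetical names for this example;
# set CURL=echo to print the command instead of sending it.
ARANGO_URL="${ARANGO_URL:-http://localhost:8529}"

insert_document() {
  local collection="$1" doc="$2"
  # One HTTP request (and one server round trip) per document.
  ${CURL:-curl} -s -X POST \
    --data-binary "$doc" \
    "$ARANGO_URL/_api/document?collection=$collection"
}

# Usage (requires a running server):
#   insert_document test '{"name":"some user"}'
```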

In this post, we’ll describe several ways to increase insert performance by modifying parameters on the client side.

Test setup

This post covers a specific use case: inserting 500K documents into the same ArangoDB collection. We’ll start with just one client and use these values as our baseline figures. After that, we’ll play a bit with client parallelism and other parameters.

For conducting the tests we’ll be using arangob, a binary for running load tests against ArangoDB. There might be other, more efficient HTTP testing tools around, but arangob is still a good choice because it is easy to use and is shipped with ArangoDB. This allows anyone else to reproduce the tests locally.

arangob can execute different test cases, and the one we’ll use is named “document”. This will create documents like this and send them to the server for insertion:

{
  "test1":"some test value",
  "test2":"some test value",
  "test3":"some test value",
  ...
}

How many attributes it creates depends on the --complexity parameter. A value of 10 was used in all the following tests, so each document had ten string attributes. The test collection was re-created and empty at the start of each run.
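To get a feel for the payload size, here is a small sketch that builds a document of the same shape, one "testN" attribute per unit of complexity. It only imitates the shape shown above; it is not arangob's own generator, and make_doc is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: build a JSON document shaped like the ones in the example above
# (attributes test1..testN, all with the same string value).
# This imitates the shape only; it is not arangob's own code.
make_doc() {
  local complexity="$1" i sep=""
  printf '{'
  for ((i = 1; i <= complexity; i++)); do
    printf '%s"test%d":"some test value"' "$sep" "$i"
    sep=","
  done
  printf '}\n'
}

# With --complexity 10, each document has ten string attributes:
#   make_doc 10
```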

The arangob client was located on the same physical host as the ArangoDB server. The server was an 8-way 64-bit Linux machine. A TCP/IP connection was used, and HTTP keep-alive was turned on for the tests (this is also the default in ArangoDB). The ArangoDB server was version 1.4.0-beta2, started with the 1.4 default configuration.

Test results

Baseline figures (serial insertion)

Let’s first establish some baseline figures by inserting documents serially without any parallelism. This test will issue 500K individual HTTP requests in a row. The client will not continue while the server is processing a single request.

unix> arangob                 \
        --concurrency 1       \
        --requests 500000     \
        --complexity 10       \
        --collection test     \
        --test-case document
...
Operations per second rate: 8337.457596
Elapsed time since start: 59.970320 s

This is as bad as it could be. In this test, the insertions are sent serially from the client to the server, meaning 500K HTTP requests. Serial execution on the client also means that ArangoDB’s server-side parallelism cannot be exploited. ArangoDB is multi-threaded and able to handle multiple incoming requests in parallel.
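The rate arangob reports is simply the number of requests divided by the elapsed time, which makes the figures easy to sanity-check. A tiny sketch using awk (ops_per_sec is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Sketch: operations per second = number of requests / elapsed seconds.
ops_per_sec() {
  awk -v n="$1" -v t="$2" 'BEGIN { printf "%.2f\n", n / t }'
}

ops_per_sec 500000 59.970320   # baseline run → 8337.46
```

Read the other way round, a single serial client achieving about 8,337 requests per second means each request/response round trip took roughly 120 µs on average, and the client spent all of that time waiting.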

Parallelism

Let’s check what happens if we can parallelise the client. If we try with 8 parallel clients, the results change to:

unix> arangob                 \
        --concurrency 8       \
        --requests 500000     \
        --complexity 10       \
        --collection test     \
        --test-case document
...
Operations per second rate: 34039.341036
Elapsed time since start: 14.688886 s

So we already got a 75 % reduction in execution time. Still, this doesn’t look too impressive. How can we do better while still inserting the same documents?
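On the client side, parallelism just means keeping several requests in flight at once. A minimal bash sketch of that idea, assuming a server at localhost:8529 and the single-document endpoint POST /_api/document?collection=<name>. The function names and the ARANGO_URL and CURL variables are hypothetical (CURL=echo gives a dry run), and this is an illustration only, not how arangob implements --concurrency.

```shell
#!/usr/bin/env bash
# Sketch: spread single-document inserts over N concurrent workers.
# Function names and the ARANGO_URL/CURL variables are hypothetical.
ARANGO_URL="${ARANGO_URL:-http://localhost:8529}"

insert_document() {
  ${CURL:-curl} -s -X POST --data-binary "$2" \
    "$ARANGO_URL/_api/document?collection=$1"
}

# parallel_insert COLLECTION TOTAL CONCURRENCY
parallel_insert() {
  local collection="$1" total="$2" concurrency="$3" w i
  for ((w = 0; w < concurrency; w++)); do
    (
      # Worker w handles documents w, w+concurrency, w+2*concurrency, ...
      for ((i = w; i < total; i += concurrency)); do
        insert_document "$collection" "{\"value\":$i}"
      done
    ) &
  done
  wait   # block until all workers have finished
}

# Usage (requires a running server):
#   parallel_insert test 1000 8
```

Each worker still waits for its own responses, but the server now sees up to N requests at a time and can use its own threads to process them in parallel.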

Well, the main problem is that we are still sending 500K individual HTTP requests to the server. We could probably increase throughput by using the bulk import API. This would allow us to send thousands of documents in a single HTTP request, greatly reducing the number of requests. But bulk imports are a different use case and applications often work on one or a few items at a time.

However, it is often the case that at least a few operations can be bundled. If the use case allows sending at least a few items at a time, we might still save lots of HTTP requests.

Enter batch requests

If the use case allows sending two documents at a time instead of just one, we can use batch requests. We first use batches without concurrency, so the following figures should be compared to the baseline:

unix> arangob                 \
        --batch-size 2        \
        --requests 500000     \
        --complexity 10       \
        --collection test     \
        --test-case document
...
Operations per second rate: 10293.982554
Elapsed time since start: 48.572066 s

So we got a 19 % reduction in execution time by sending two documents instead of one. It is reasonable to assume that a lot of use cases allow sending two documents at once. We can improve even further if the use case allows sending slightly bigger batches. Sending four documents at a time will result in a 33 % reduction in execution time:

unix> arangob                 \
        --batch-size 4        \
        --requests 500000     \
        --complexity 10       \
        --collection test     \
        --test-case document
...
Operations per second rate: 12504.588559
Elapsed time since start: 39.985322 s

The results get better with bigger batches, but at some point we would be leaving the scope of the use case, so we’ll not examine this any further.
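Under the hood, a batch request packs several complete HTTP requests into one multipart/form-data message sent to /_api/batch. The sketch below assumes the format described in ArangoDB's batch request documentation (parts of type application/x-arango-batchpart, each carrying one raw HTTP request). The function names, the ARANGO_URL and CURL variables, and the boundary string are arbitrary choices for this example.

```shell
#!/usr/bin/env bash
# Sketch: bundle several document inserts into one batch request.
# Assumes ArangoDB's multipart batch format; names here are hypothetical.
ARANGO_URL="${ARANGO_URL:-http://localhost:8529}"
BOUNDARY="XXXsubpartXXX"   # must not occur inside any document

# make_batch_body COLLECTION DOC...  -> multipart body on stdout
make_batch_body() {
  local collection="$1" doc
  shift
  for doc in "$@"; do
    # Each part wraps one complete HTTP request.
    printf -- '--%s\r\n' "$BOUNDARY"
    printf 'Content-Type: application/x-arango-batchpart\r\n\r\n'
    printf 'POST /_api/document?collection=%s HTTP/1.1\r\n\r\n' "$collection"
    printf '%s\r\n' "$doc"
  done
  # The closing delimiter ends the multipart body.
  printf -- '--%s--\r\n' "$BOUNDARY"
}

# send_batch COLLECTION DOC...  -> one HTTP request for all documents
send_batch() {
  make_batch_body "$@" |
    ${CURL:-curl} -s -X POST \
      -H "Content-Type: multipart/form-data; boundary=$BOUNDARY" \
      --data-binary @- "$ARANGO_URL/_api/batch"
}

# Usage (requires a running server):
#   send_batch test '{"test1":"a"}' '{"test1":"b"}'
```

The server still performs one insert per part; what is saved is the per-request HTTP overhead and the client-side waiting for each individual round trip.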

Asynchronous execution

Another parameter that influences throughput is whether the client needs to wait for the server’s response in order to go on. If the client application can continue without waiting for the server’s response (i.e. it does not depend on the response), we can exploit that fact and use asynchronous requests. The server will then directly answer the incoming request with an acknowledgement and process the actual document insertion asynchronously later. This allows the client to go on while the server handles the request in the background.
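Asynchronous execution is requested per call via the x-arango-async HTTP header. With the value true, the server is documented to acknowledge with HTTP 202 Accepted and to discard the job's result afterwards. A minimal sketch, assuming a server at localhost:8529; insert_document_async and the ARANGO_URL and CURL variables are hypothetical (CURL=echo gives a dry run).

```shell
#!/usr/bin/env bash
# Sketch: fire-and-forget insert using the x-arango-async header.
# insert_document_async, ARANGO_URL and CURL are hypothetical names;
# set CURL=echo to print the command instead of sending it.
ARANGO_URL="${ARANGO_URL:-http://localhost:8529}"

insert_document_async() {
  local collection="$1" doc="$2"
  # -w prints only the HTTP status code. A 202 means "accepted and queued",
  # not "inserted": the actual insert happens later on the server.
  ${CURL:-curl} -s -o /dev/null -w '%{http_code}\n' -X POST \
    -H 'x-arango-async: true' \
    --data-binary "$doc" \
    "$ARANGO_URL/_api/document?collection=$collection"
}

# Usage (requires a running server; should print 202 if the job is queued):
#   insert_document_async test '{"test1":"some test value"}'
```

Because the response carries no insert result, this only fits cases where the client genuinely does not need the new document's key or an error status.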

If we send requests serially (as in baseline) but use asynchronous execution, the results are:

unix> arangob                 \
        --async true          \
        --requests 500000     \
        --complexity 10       \
        --collection test     \
        --test-case document
...
Operations per second rate: 14118.421878
Elapsed time since start: 35.414723 s

A 40 % reduction in execution time. If your use case allows continuing without waiting for the response to an insert, you should definitely check this out!

Combining parallelism and batching

So far we always modified just one parameter at a time (concurrency, batching, asynchronous execution). Let’s now check what results can be achieved by varying multiple parameters at the same time.

We’ll start with concurrency and batching. We’ll use a client concurrency of 8 and use the smallest possible batch size (2):

unix> arangob                 \
        --batch-size 2        \
        --concurrency 8       \
        --requests 500000     \
        --complexity 10       \
        --collection test     \
        --test-case document
...
Operations per second rate: 49042.297118
Elapsed time since start: 10.195281 s

An 83 % reduction compared to the baseline! When sending four documents at a time, we can improve to:

unix> arangob                 \
        --batch-size 4        \
        --concurrency 8       \
        --requests 500000     \
        --complexity 10       \
        --collection test     \
        --test-case document
...
Operations per second rate: 81930.586113
Elapsed time since start: 6.102727 s

Almost 90 % reduction when compared to the baseline. We could even improve further by making the batches bigger, but that would go beyond the scope of the use case.
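The reduction figures in this post all compare a run's elapsed time with the 59.97 s serial baseline. A small awk sketch of that calculation, printing one decimal place (reduction is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Sketch: percentage reduction in elapsed time relative to the baseline.
# reduction BASELINE_SECONDS RUN_SECONDS
reduction() {
  awk -v b="$1" -v t="$2" 'BEGIN { printf "%.1f\n", (1 - t / b) * 100 }'
}

reduction 59.970320 6.102727    # batch size 4, concurrency 8 → 89.8
reduction 59.970320 14.688886   # concurrency 8 only         → 75.5
```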

Combining parallelism and asynchronous execution

Let’s try concurrent insertion and asynchronous execution together. Using a concurrency of 8:

unix> arangob                 \
        --async true          \
        --concurrency 8       \
        --requests 500000     \
        --complexity 10       \
        --collection test     \
        --test-case document
...
Operations per second rate: 50633.285694
Elapsed time since start: 9.874927 s

Compared to the baseline, this means an execution time reduction of 83 %. And the numbers are still an improvement over using parallelism alone (14.6 s) or asynchronous requests alone (35.4 s).

Finally, let’s introduce batch requests into the picture. Trying with a batch size of 4:

unix> arangob                 \
        --batch-size 4        \
        --async true          \
        --concurrency 8       \
        --requests 500000     \
        --complexity 10       \
        --collection test     \
        --test-case document
...
Operations per second rate: 56506.120347
Elapsed time since start: 8.848599 s

Only a small improvement here, so it seems we cannot gain much more with this combination.

Conclusion

We were able to achieve almost tenfold throughput in the above tests, just by varying client parameters. Nothing was changed on the server-side.
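The "almost tenfold" figure can be checked directly from the reported rates, taking the best run within the use case (batch size 4, concurrency 8) against the serial baseline. A small awk sketch (speedup is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Sketch: throughput speedup = tuned ops/s divided by baseline ops/s.
speedup() {
  awk -v base="$1" -v tuned="$2" 'BEGIN { printf "%.1f\n", tuned / base }'
}

speedup 8337.457596 81930.586113   # batch size 4, concurrency 8 → 9.8
```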

The obvious improvement for a single-threaded client is to make it multi-threaded. This alone already bought us a 75 % reduction in total execution time. If the client cannot be parallelised for whatever reason, there are additional options, such as using asynchronous execution and batch requests. Even if the client can be made parallel, it may still benefit further from batch requests or asynchronous execution.

It is always worth looking at the existing options. Just to show what kind of insertion rates can be achieved when playing with the above parameters, here’s the test with all the tricks maxed out (note: to run this test you may need to start the ArangoDB server with an adjusted queue size, e.g. --scheduler.maximal-queue-size 64000):

unix> arangob                 \
        --batch-size 128      \
        --async true          \
        --concurrency 8       \
        --requests 500000     \
        --complexity 10       \
        --collection test     \
        --test-case document
...
Operations per second rate: 1180110.883219
Elapsed time since start: 0.423689 s

Obviously this is beyond the use case described in this post. The server may also need a couple of additional seconds to actually complete the insertions from the asynchronous queue, but it shows that clients can put in 500K documents in under half a second if they can exploit all the server features.
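If a script needs to know when the queued inserts have actually been applied, one option is to poll the collection's document count. A minimal sketch, assuming the GET /_api/collection/<name>/count endpoint and a JSON response containing a "count" attribute. wait_for_count and the ARANGO_URL and CURL variables are hypothetical, and the JSON is picked apart with sed only to keep the example dependency-free.

```shell
#!/usr/bin/env bash
# Sketch: poll the collection's document count until all queued
# asynchronous inserts have been applied, or give up after MAX_TRIES.
# wait_for_count, ARANGO_URL and CURL are hypothetical names.
ARANGO_URL="${ARANGO_URL:-http://localhost:8529}"

# wait_for_count COLLECTION EXPECTED [MAX_TRIES]
wait_for_count() {
  local collection="$1" expected="$2" tries="${3:-60}" count
  while ((tries-- > 0)); do
    # Extract the number after "count": from the JSON response.
    count=$(${CURL:-curl} -s "$ARANGO_URL/_api/collection/$collection/count" |
      sed -n 's/.*"count":[[:space:]]*\([0-9][0-9]*\).*/\1/p')
    if [ "${count:-0}" -ge "$expected" ]; then
      echo "$count"
      return 0
    fi
    sleep 1   # not there yet: check again in a second
  done
  return 1
}

# Usage (requires a running server):
#   wait_for_count test 500000 && echo "all inserts applied"
```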

About Jan Steemann

Jan is a member of ArangoDB's core development team. He is an expert in data modelling with NoSQL and relational databases and in writing high-performance web applications. For ArangoDB, he wrote much of AQL (ArangoDB's query language).
  • Ivan Tugay

    Hi Jan, I’m interested in what was the hardware configuration for these tests. I would appreciate a detailed configuration.

    • jsteemann

      Hi Ivan,

      the configuration used for this test was my desktop PC:

      Linux Kernel 3.7.10-1.16, cfq scheduler
      8x Intel(R) Core(TM) i7 CPU, 2.67 GHz
      12 GB total RAM
      1000Mb/s, full-duplex network connection
      SATA II hard drive (7.200 RPM, 32 MB cache)

      without any optimisations.

      Best regards
      Jan

      • Ivan Tugay

        Thank you, good job! Do you plan to test the cluster?

      • Ivan Tugay

        I think many people would be interested in a performance comparison of ArangoDB vs RethinkDB vs Couchbase vs Aerospike vs Riak :)