Measuring ArangoDB insert performance
ArangoDB offers a few APIs to insert documents. First, there is an HTTP API for bulk document imports. This API was already covered in another post. In general, the bulk import API should always be used when the task is to insert many documents into a collection at once, as fast as possible. This works well if the documents are known in advance, e.g. when importing data from a file.
ArangoDB 1.1 will come with a new API for batch requests. This batch request API allows clients to send multiple requests to the ArangoDB server inside one multipart HTTP request. The server will then decompose the multipart request into its individual parts and process them as if they had been sent individually. The communication layer can sustain up to 800,000 requests/second, but absolute numbers strongly depend on the number of cores, the type of the requests, network connections and other factors. More important are the relative numbers: depending on your use case, you can reduce insert/update times by 80%.
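To make the idea of packing several requests into one multipart request more concrete, here is a minimal sketch in Python that only assembles such a body as a string and sends nothing. The part content type `application/x-arango-batchpart`, the `Content-Id` header, the boundary string, and the example paths are assumptions based on our reading of the batch API documentation, and the helper name `build_batch_body` is hypothetical. Please check the documentation of your ArangoDB version before relying on any of them.

```python
import json


def build_batch_body(requests, boundary="XXXsubpartXXX"):
    """Assemble a multipart body from (method, path, payload) tuples.

    Assumption: each part carries one complete HTTP request and is tagged
    with the content type "application/x-arango-batchpart".
    """
    parts = []
    for index, (method, path, payload) in enumerate(requests, start=1):
        # Requests without a payload (e.g. GET) get an empty body.
        body = json.dumps(payload) if payload is not None else ""
        parts.append(
            f"--{boundary}\r\n"
            "Content-Type: application/x-arango-batchpart\r\n"
            f"Content-Id: {index}\r\n"
            "\r\n"
            f"{method} {path} HTTP/1.1\r\n"
            "\r\n"
            f"{body}\r\n"
        )
    # The closing delimiter ends the multipart message.
    return "".join(parts) + f"--{boundary}--\r\n"


if __name__ == "__main__":
    # Hypothetical example: two document inserts packed into one batch.
    print(build_batch_body([
        ("POST", "/_api/document?collection=test", {"value": 1}),
        ("POST", "/_api/document?collection=test", {"value": 2}),
    ]))
```

The resulting string would be sent as the body of a single HTTP request whose `Content-Type` header names the same boundary. The saving comes from paying the network round trip once per batch instead of once per document.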
As promised in one of the previous posts, here are some performance results that show the effect of different journal sizes for insert, update, delete, and get operations in ArangoDB.
In the last couple of posts, we have been looking at ArangoDB’s insert performance when using individual document insert, delete, and update operations. This time we’ll be looking at batched inserts. To have some reference, we’ll compare the results of ArangoDB to what can be achieved with CouchDB and MongoDB.
To easily conduct bulk insert benchmarks with different NoSQL databases, we wrote a small benchmark tool in PHP. The tool can be used to measure the time it takes to bulk upload data into MongoDB, CouchDB, and ArangoDB using the databases’ bulk document APIs.
In a comment to the last post, there was a request to conduct some benchmarks with a mixed workload that does not test insert/delete/update/get operations in isolation but when they work together. To do this, I put together a quick benchmark that inserts 10,000 documents, and after each insert either directly fetches the inserted document (i.e. insert / get), updates the inserted document and retrieves it (i.e. insert / update / get), or deletes it (i.e. insert / delete). The three cases are alternated deterministically, meaning each case occurs with the same frequency and in the same order. It’s probably still not the best ever test case, but at least it reflects a mixed read and write workload. The document ids used in the test were monotonically increasing integers, starting from some base value. That means no random values were used. The test was repeated for 100,000 documents as well. The dataset still fully fits in RAM. The tests were run in the same environment as the previous tests so one can compare them. The results are in line with the results shown in the previous post. Here’s the chart with the results of the 10,000 documents benchmark: And here are the test results for the 100,000 documents benchmark:
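The deterministic alternation of the three cases described above can be sketched as a small Python generator. This is not the original benchmark code. The names `CASES` and `mixed_workload` and the base id of 1000 are made up for illustration, and the sketch only produces the sequence of operations without talking to any database.

```python
# The three cases from the text, in a fixed order.
CASES = (
    ("insert", "get"),
    ("insert", "update", "get"),
    ("insert", "delete"),
)


def mixed_workload(count, base_id=1000):
    """Yield (document_id, operations) pairs.

    Ids increase monotonically from base_id (an assumed value), and the
    cases rotate in a fixed order, so no random values are involved.
    """
    for offset in range(count):
        yield base_id + offset, CASES[offset % len(CASES)]


if __name__ == "__main__":
    for doc_id, operations in mixed_workload(6):
        print(doc_id, " / ".join(operations))
```

A real benchmark would replace the `print` call with the corresponding database requests and time the whole loop. Because the sequence is fully determined by `count` and `base_id`, repeated runs execute exactly the same operations in the same order.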
A side-effect of measuring the impact of different journal sizes was that we generated some performance test results for CouchDB, too. They weren’t included in the previous post because it was about journal sizes in ArangoDB, but now we think it’s time to share them.
A while ago we wrote a blog article that explained how ArangoDB uses disk space. That article compared the disk usage of ArangoDB, CouchDB, and MongoDB when loading particular datasets. In this post, we’ll show in more detail the disk usage of ArangoDB for insert, update, and delete operations. We’ll also compare it to CouchDB for reference.
In the previous post we published some performance results for ArangoDB’s HTTP and networking layer in comparison to that of some popular web servers. We did that benchmark to assess the general performance (and overhead) of the network and HTTP layer in ArangoDB.
Using ArangoDB as an application server
While HTTP is a good and (relatively) portable mechanism of shipping data between clients and servers, it is only a transport protocol. People will likely be using ArangoDB not only because it supports HTTP, but primarily because it is a database and an application server.
As a follow-up of Jan’s blog post we have extracted some central figures and created this infographic for your reference.