Performance with different journal sizes


As promised in one of the previous posts, here are some performance results that show the effect of different journal sizes for insert, update, delete, and get operations in ArangoDB.

Why journal size could matter

The journal size determines how large a single datafile in ArangoDB can be. The smaller that parameter is, the more datafiles need to be created, initially prefilled, closed, compacted, etc. These operations have some overhead per file, and they occur more often the more datafiles are in use.
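As a rough illustration of that relationship, the number of datafiles needed for a given amount of data can be estimated by dividing the data volume by the journal size. The Python sketch below is only a back-of-the-envelope estimate: it ignores per-document marker overhead, prefilling, and compaction, and the function name, the document size, and the list of journal sizes are made up for illustration rather than taken from the tests.

```python
import math

MB = 1024 * 1024

def estimated_datafiles(total_data_bytes, journal_size_bytes):
    """Rough lower bound on the number of datafiles needed for the data."""
    if journal_size_bytes <= 0:
        raise ValueError("journal size must be positive")
    # At least one journal exists once anything has been written.
    return max(1, math.ceil(total_data_bytes / journal_size_bytes))

# Hypothetical workload: 100,000 documents of about 1 KB each.
total_bytes = 100_000 * 1024

# Hypothetical journal sizes, in MB.
for size_mb in (1, 2, 4, 8, 16, 32):
    count = estimated_datafiles(total_bytes, size_mb * MB)
    print(f"{size_mb:>2} MB journal: about {count} datafile(s)")
```

The smaller the journal, the more often the per-file work described above has to be repeated for the same amount of data.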

The journal size can be configured at startup with the parameter “--database.maximal-journal-size”. It then affects newly created collections only. It can also be set on a per-collection level when a collection is created, and then affects only that particular collection.
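To make the per-collection variant more concrete, here is a minimal Python sketch that builds (but does not send) an HTTP request for creating a collection with its own journal size. It assumes that the collection-creation endpoint is `POST /_api/collection` and that it accepts a `journalSize` attribute given in bytes; please verify both against the documentation of your ArangoDB version. The server address, collection name, and helper function name are hypothetical.

```python
import json
import urllib.request

MB = 1024 * 1024

def build_create_collection_request(base_url, name, journal_size_bytes):
    """Build a request that creates a collection with its own journal size.

    Assumption: the server accepts a 'journalSize' attribute (in bytes)
    in the body of POST /_api/collection.
    """
    body = json.dumps({"name": name, "journalSize": journal_size_bytes})
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_api/collection",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical server address and collection name.
request = build_create_collection_request(
    "http://localhost:8529", "smallJournalCollection", 4 * MB
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To actually send it to a running server:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```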

The test setup

The results shown in the charts were generated by a test that started a new ArangoDB database instance with a specific journal size (using the --database.maximal-journal-size startup option), and then used the ArangoDB HTTP API to insert a number of documents from a file into a collection in that database. The inserts were done individually and not in batches, so inserting 10,000 documents meant executing 10,000 individual HTTP requests. These inserts were done with varying concurrency levels, from 1 (no concurrency) to 64 clients. The total test time on the client side was recorded.
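The measurement approach described above (one request per document, several concurrency levels, total client-side time) can be sketched in Python as follows. This is not the tool used for the tests, which was a modified httpress; it is only an illustration of the idea. The `fake_insert` function is a stand-in for a single HTTP request so that the sketch runs without a server, and the intermediate concurrency levels between 1 and 64 are assumed.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_timed(operation, items, concurrency):
    """Call operation(item) once per item with `concurrency` workers.

    Returns (total elapsed seconds on the client side, list of results).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(operation, items))
    elapsed = time.perf_counter() - start
    return elapsed, results

def fake_insert(document):
    # Stand-in for one individual HTTP insert request (hypothetical).
    return document["_key"]

# Hypothetical dataset: 10,000 tiny documents.
documents = [{"_key": str(i), "value": i} for i in range(10_000)]

# Assumed concurrency steps; only 1 and 64 are given as the range.
for concurrency in (1, 2, 4, 8, 16, 32, 64):
    elapsed, results = run_timed(fake_insert, documents, concurrency)
    print(f"concurrency {concurrency:>2}: {len(results)} operations in {elapsed:.3f} s")
```

Replacing `fake_insert` with a function that performs a real HTTP call would turn this into a simple benchmark, though a Python client adds noticeably more overhead per request than a dedicated load generator.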

After the data was inserted, the documents were retrieved individually, again using the HTTP API and with varying concurrency levels. That operation time was recorded as well. Next, all existing documents were updated individually via HTTP calls, and the total update time was recorded. Finally, document deletion time was measured by importing the documents again and then deleting them individually via the HTTP API. The deletion time was recorded, too.

The ArangoDB version used in these tests was 1.1-alpha. The waitForSync option was turned off so the data inserted was not forced to disk via msync after each insert, but flushed to disk asynchronously.

We used a slightly modified version of httpress as the HTTP test client. ArangoDB and the test client were running on the same physical server. The test machine had the following specs:

  • Linux Kernel 2.6.37.6-0.11, cfq scheduler
  • 64 bit OS
  • 8x Intel(R) Core(TM) i7 CPU, 2.67 GHz
  • 12 GB total RAM
  • SATA II hard drive (7,200 RPM, 32 MB cache)

Performance, 10,000 operations

When working with only 10,000 documents, the effects of using different journal sizes were minimal and well within the margin of error. There were also a few outliers (deviations of 0.1 seconds) at higher concurrency levels in some of the test runs, but they were not reliably reproducible. They shouldn’t be taken too seriously given the short overall duration of the tests and the test setup (client and server on the same machine, competing for the same resources).
Insert performance
Total execution times were well below one second for all tested concurrency levels when inserting 10,000 documents individually. Different journal sizes did not have any substantial effect here.


Delete performance
The same can be said for delete operations: with only 10,000 documents, the effect of different journal sizes is too small to be relevant. The total execution time is below 0.7 seconds in all cases, as delete is a relatively cheap operation.


Update performance
With just 10,000 update operations, the total execution time is again below 1 second in all cases, and there is no clear indication that journal size matters.


Get performance
When retrieving documents, the journal size should not matter at all, provided the OS has buffered the data in RAM already. And indeed, no substantial difference can be observed when executing 10,000 get operations with different journal sizes.

Performance, 100,000 operations

The picture changes slightly when increasing the number of operations. When working with 100,000 documents instead of just 10,000, it can be observed that smaller journal sizes lead to slightly longer execution times.
Insert performance
Bigger journal sizes lead to slightly better execution times, though the difference is not large. In some cases it might well be worth trading a little performance for some saved disk space.


Delete performance
When executing 100,000 delete operations, using the default journal size of 32 MB has a slight performance benefit over using smaller journals. The benefit is smaller than what was observed for insert operations, and this can be explained by the fact that delete is a relatively cheap operation that writes only very small amounts of data (basically just a deletion marker).


Update performance
When performing 100,000 individual update operations, we again see that journal size matters a bit, and that smaller journals lead to slightly longer execution times. This is about the same distribution as in the insert operation case.


Get performance
Looking finally at get operations, different journal sizes still have no effect on the overall execution time. This is expected, as get operations do not need to access the disk if the data is in RAM already.

Conclusion

Different journal sizes in ArangoDB don’t have an effect on data-retrieval operations (i.e. get), but they can have some performance impact on data-modification operations (i.e. insert, delete, update). You may use the journal size parameter to trade a little data-modification performance for some disk space savings.

Please note that the performance results shown above were measured with some particular datasets. As not all data is created equal, the performance impact may vary with the data. So please be sure to measure with your own datasets.

To conclude: whether or not you should adjust the journal size, and for which collections, depends on your workload, performance requirements, and available hardware. Please also be sure to check the previous post that shows the disk space usage. Note as well that adjusting the maximum journal size also changes the maximum size of documents that can be saved into a collection. Picking a very low journal size is obviously not a good idea if you plan on saving big documents. But as the journal size can be adjusted on a per-collection level, you can still fine-tune the settings according to your needs.

About Jan Steemann

Jan is a member of ArangoDB's core development team. He is an expert in data modelling with NoSQL and relational databases and in writing high-performance web applications. For ArangoDB, he wrote much of AQL (ArangoDB's query language).