FAQs

What is ArangoDB and what kind of applications is it designed for?

ArangoDB is a multi-model mostly-memory database with a flexible data model for documents and graphs. It is designed as a “general purpose database”, offering all the features you typically need for modern web applications.

ArangoDB is designed to grow with the application. A project may start as a simple single-server prototype, nothing you couldn’t do equally well with a relational database. After some time, geo-location features are needed and a shopping cart requires transactions. ArangoDB’s graph data model is useful for the recommendation system. The smartphone app needs a lean API to the back-end; this is where Foxx, ArangoDB’s integrated JavaScript application framework, comes into play.
The overall idea is: “We want to prevent a deadlock where the team is forced to switch the technology in the middle of the project because it doesn’t meet the requirements any longer.”

How does ArangoDB differ from other NoSQL databases like MongoDB, CouchDB and neo4j?

ArangoDB’s feature scope is driven by the idea of giving developers everything they need to master typical tasks in a web application, in a way that is both convenient and technically sophisticated.

From our point of view, it’s the combination of features and product quality that sets ArangoDB apart: ArangoDB handles not only documents but also graphs.

ArangoDB is extensible via JavaScript and MRuby. ArangoDB ships with “Foxx”, an integrated application framework ideal for lean back-ends and single-page JavaScript applications (SPAs).

Multi-collection transactions are useful not only for online banking and e-commerce; they become crucial in any web app with a distributed architecture. Here again, we offer developers a choice. If transactions are needed, developers can use them. If, on the other hand, the problem requires higher performance and less transactional safety, developers are free to skip multi-collection transactions and use the standard single-document transactions implemented by most NoSQL databases.

Another distinguishing feature is ArangoDB’s query language, AQL, which makes querying powerful and convenient. AQL lets you describe complex filter conditions and joins in a readable format, much like SQL.
For simple queries, we offer a query-by-example interface and specialized low-level APIs.

Is ArangoDB production ready?

Starting with version 1.0 (spring 2012), ArangoDB has been ready for production use. It is fully tested and documented.

For which use cases is ArangoDB not the perfect choice?

Although ArangoDB takes a universal approach, there are edge cases where we don’t recommend it. In particular, ArangoDB doesn’t compete with massively distributed systems like Cassandra that run on thousands of nodes and hold many terabytes of data.

What licence does ArangoDB have?

ArangoDB is published under the Apache 2.0 license. Essentially, this means that you can use it free of charge for both non-commercial and commercial purposes.

What languages can I use to work with ArangoDB?

For the list of programming language specific client libraries have a look at the APIs page.
Your language of choice is not supported? Use the HTTP API. We are always happy about new implementations, so if you decide to write one, please let us know and contribute!

Which data models does ArangoDB support?

You can model your data in several ways:

  • in key/value pairs
  • as collections of documents
  • as graphs with nodes, edges, and properties for both

ArangoDB as a key/value store

Have you ever used Memcache? Then you are already familiar with the concept of a key/value store: a unique key is assigned to a value, which in its simplest form is a string (or a string with some structure, like a JSON document; you get the idea).

ArangoDB as a document store

In a “document store” the data is encapsulated in text documents. You can roughly compare a document to a row in a table in a relational database though documents are not as rigid. The documents are not required to follow the same schema, e.g. your first document may have the attributes “name” and “hobbies” while your second document only has the “name” attribute. Nevertheless you can easily query all documents for “hobbies”.
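To make the “hobbies” example concrete, here is a minimal JavaScript sketch that uses a plain array as a stand-in for a collection. The `withAttribute` and `namesWithHobby` helpers and the sample documents are hypothetical; in ArangoDB itself you would express the same filter in a query rather than in client code.

```javascript
// In-memory stand-in for a "users" collection (conceptual sketch only).
// The documents do not share a schema: one has no "hobbies" attribute at all.
const profiles = [
  { name: "Anna", hobbies: ["chess", "climbing"] },
  { name: "Ben" },
  { name: "Cleo", hobbies: [] },
];

// Return every document that has the given attribute, whatever else it holds.
function withAttribute(docs, attribute) {
  return docs.filter((doc) => Object.prototype.hasOwnProperty.call(doc, attribute));
}

// Return the names of all documents whose "hobbies" list contains `hobby`.
function namesWithHobby(docs, hobby) {
  return withAttribute(docs, "hobbies")
    .filter((doc) => doc.hobbies.includes(hobby))
    .map((doc) => doc.name);
}

console.log(withAttribute(profiles, "hobbies").map((doc) => doc.name)); // [ 'Anna', 'Cleo' ]
console.log(namesWithHobby(profiles, "chess")); // [ 'Anna' ]
```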

Note from the name/hobbies example that you do not have to follow the rules of normalization: in a relational database you would probably create one table for “hobbies” and another for “users”, whereas in a document store you would most likely store both in the same document.

Being schema free does not mean chaos! You can organize your documents into collections: a collection consists of a number of documents, e.g. all documents with user data.

In ArangoDB, documents are encoded in JSON. You can also save binary data base64-encoded. Unlike many other NoSQL databases, ArangoDB allows you to query data across collections (similar to “joins” in SQL).

ArangoDB as a graph database

A graph database uses graph structures with nodes, edges and properties to represent and store data. This means that you can easily model even complex relationships between single documents.

Let’s say you want to implement a feature “people who like product X also like product Y”. You could do that in ArangoDB by creating collections for “people” and “products”, and an additional edges collection to store the relationships between them. Besides linking documents from the other collections, the edge documents can have any properties you like. What you’ll end up with is a so-called property graph, which you can then query.
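The following sketch assumes a hypothetical “likes” edge collection linking “people” to “products”. As in ArangoDB, each edge document records the handles of the documents it connects in `_from` and `_to`; the `alsoLiked` helper and the in-memory array are illustrative only. It answers “people who like product X also like ...” by walking from the product to the people who like it and back out to their other products.

```javascript
// Edge documents of a hypothetical "likes" edge collection. _from and _to hold
// document handles ("collection/key"); "since" is an ordinary edge property.
const likes = [
  { _from: "people/alice", _to: "products/x", since: 2012 },
  { _from: "people/alice", _to: "products/y", since: 2013 },
  { _from: "people/bob", _to: "products/x", since: 2013 },
  { _from: "people/bob", _to: "products/z", since: 2013 },
  { _from: "people/carol", _to: "products/x", since: 2011 },
  { _from: "people/carol", _to: "products/y", since: 2011 },
];

// product <- person -> other product: count how many fans of `product`
// also like each other product, most popular first.
function alsoLiked(edges, product) {
  const fans = new Set(edges.filter((e) => e._to === product).map((e) => e._from));
  const counts = new Map();
  for (const e of edges) {
    if (fans.has(e._from) && e._to !== product) {
      counts.set(e._to, (counts.get(e._to) || 0) + 1);
    }
  }
  // Sort by count descending; break ties by handle so the output is stable.
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .map(([handle, count]) => ({ product: handle, count }));
}

// products/y (liked by 2 fans of x), then products/z (1 fan)
console.log(alsoLiked(likes, "products/x"));
```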

For querying graphs, ArangoDB offers a few possibilities:

  • traversals in JavaScript, running on the server
  • using graph functions from inside ArangoDB’s query language, AQL
  • using the low level graph REST APIs to access node- or edge-specific data, or modify them
  • from Java: using Gremlin, as there is a Blueprints implementation for ArangoDB

What tools can be used to access data?

You can access data in ArangoDB

  • using the general HTTP REST API via curl/wget, or your browser
  • via the ArangoDB shell (“arangosh”)
  • using a programming language specific client library

ArangoDB comes with a web-based user interface and its own HTTP server. Open http://localhost:8529/_admin in your browser and, voilà, there it is.

The ArangoDB-Shell can be invoked after the server has been started with

arangosh

Without arguments, it will try to connect to the server on port 8529 on localhost.

For the list of programming language specific client libraries check out the API page.

Does ArangoDB support SQL?

ArangoDB does not support SQL. SQL is not well-suited to cover the different data models in ArangoDB. For example, think of nested list structures inside a document, graph traversals etc. There is no way to query such structures in standard SQL, and deviating from standard SQL does not make much sense.
ArangoDB brings its own declarative language called AQL (ArangoDB Query Language). If you are familiar with SQL, you will probably feel at home with AQL quickly. For syntax examples, see AQL query examples.

How do you query ArangoDB?

ArangoDB offers various options for getting data out of the database. It has a REST interface for CRUD operations and also allows “querying by example”. “Querying by example” means that you create a JSON document with the attributes you are looking for. The database returns all documents which look like the “example document”.
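Here is a minimal sketch of the query-by-example idea, assuming simple top-level attribute equality: a document matches when it contains every attribute of the example document with an equal value. `matchesExample` and `byExample` are hypothetical helpers, not ArangoDB’s implementation, whose handling of nested example values may differ.

```javascript
const { isDeepStrictEqual } = require("util");

// A document matches if it has every top-level attribute of the example with
// an equal value; attributes the example does not mention are ignored.
function matchesExample(doc, example) {
  return Object.keys(example).every(
    (key) =>
      Object.prototype.hasOwnProperty.call(doc, key) &&
      isDeepStrictEqual(doc[key], example[key])
  );
}

// Return all documents that "look like" the example document.
function byExample(docs, example) {
  return docs.filter((doc) => matchesExample(doc, example));
}

const users = [
  { name: "John", active: true, age: 25 },
  { name: "Jim", age: 19 },
  { name: "Lisa", active: true, likes: ["running"] },
];

console.log(byExample(users, { active: true }).map((u) => u.name)); // [ 'John', 'Lisa' ]
```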

Expressing complex queries as JSON documents can become tedious, and it’s almost impossible to support joins this way. We wanted a convenient, easy-to-learn way to execute even complex queries without any of the programming a map/reduce-based approach would require.

As ArangoDB supports multiple data models including graphs, it was neither sufficient to stick to SQL nor to simply implement UNQL (another query language idea that was around when ArangoDB came out). We ended up with the “ArangoDB query language” (AQL), a declarative language similar to SQL and JSONiq. AQL supports joins, graph queries, list iteration, results filtering, results projection, sorting, variables, grouping and aggregation.
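As an illustration, the sketch below builds (but does not send) the JSON body for running an AQL query through ArangoDB’s HTTP cursor API, `POST /_api/cursor`. The query filters, sorts, and projects documents of the “users” collection used elsewhere in this FAQ. `@minAge` is a bind parameter, so its value travels separately from the query text. `buildCursorRequest` is a hypothetical helper.

```javascript
// Build a request descriptor for the HTTP cursor API. Nothing is sent here.
function buildCursorRequest(query, bindVars = {}) {
  return {
    method: "POST",
    path: "/_api/cursor",
    body: JSON.stringify({ query, bindVars }),
  };
}

// Filtering, sorting and projection in one declarative statement.
const aql =
  "FOR u IN users " +
  "FILTER u.age >= @minAge " +
  "SORT u.name.last " +
  "RETURN { first: u.name.first, last: u.name.last, age: u.age }";

const request = buildCursorRequest(aql, { minAge: 21 });
console.log(request.method, request.path, request.body);
```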

Of course, ArangoDB also offers drivers for all major programming languages. The drivers wrap the mentioned query options following the paradigm of the programming language and/or frameworks like Ruby on Rails.

How fast is ArangoDB?

To quote Jan Lehnardt from CouchDB: “NoSQL is not about performance, scaling, dropping ACID or hating SQL; it is about choice.” As NoSQL databases differ considerably, it does not help much to compare them by throughput and choose the fastest one. Instead, users should carefully think about their overall requirements and weigh the different aspects. Massively scalable key/value stores or memory-only systems can achieve much higher benchmark numbers. But our aim is to provide a much more convenient system for a broader range of use cases, one that is fast enough for almost all of them.

That said, we have done a lot of performance tests and are more than happy with the results: ArangoDB 1.3 inserts up to 140,000 documents per second.

What are the server requirements for ArangoDB?

ArangoDB runs on Linux, OS X and Microsoft Windows.
It runs on 32-bit and 64-bit systems, though a 32-bit system will limit you to only approximately 2 to 3 GB of data, because ArangoDB memory-maps its data files.
We therefore strongly recommend running ArangoDB on a 64-bit system with SSDs.

ArangoDB is a “mostly memory” database, which means that it benefits greatly from RAM and performs best when it is not forced to swap data to disk.

So how much RAM do you need? This depends on the size and structure of your data. Your application will access one or many collections (think of collections as denormalized tables for the time being). Once you open a collection, its indexes are created in RAM and its data is loaded into RAM using memory-mapped files. If your collections are bigger than your RAM, the operating system will be forced to swap data in and out of the swap space.

What language is ArangoDB written in?

ArangoDB is mainly written in C and C++. It also uses Google’s V8 engine to run JavaScript code on the server-side.

Server actions can be written in JavaScript and Ruby (mruby).

Does ArangoDB support transactions?

Starting with version 1.3, ArangoDB provides support for user-definable transactions.

Transactions in ArangoDB are atomic, consistent, isolated, and durable (ACID).

These ACID properties provide the following guarantees:

  • The atomicity principle ensures that a transaction either completes in its entirety or has no effect at all.
  • The consistency principle ensures that no constraints or other invariants will be violated during or after any transaction.
  • The isolation property will hide the modifications of a transaction from other transactions until the transaction commits.
  • Finally, the durability property makes sure that operations from committed transactions are made persistent. The degree of transaction durability is configurable in ArangoDB, as is durability at the collection level.

ArangoDB transactions differ from transactions in SQL. In SQL, a transaction is started with an explicit `BEGIN` or `START TRANSACTION` command. Following any series of data retrieval or modification operations, an SQL transaction is finished with a `COMMIT` command, or rolled back with a `ROLLBACK` command. There may be client/server communication between the start and the commit/rollback of an SQL transaction.

In ArangoDB, a transaction is always a server-side operation, and is executed on the server in one go, without any client interaction. All operations to be executed inside a transaction need to be known by the server when the transaction is started. This is achieved by the user shipping the transaction declaration to the server (or having it stored there already if the transaction is going to be run many times) and executing it there.

Transactions in ArangoDB can span multiple operations, even on multiple collections.
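The sketch below shows what “shipping the transaction declaration to the server” can look like. The collections to be used are declared up front, and the action is sent as the source text of a JavaScript function together with its parameters, for example as the body of `POST /_api/transaction` (in arangosh, the same kind of declaration is passed to `db._executeTransaction`). The collections “accounts” and “transfers”, the `transferAction` and `buildTransaction` helpers, and the `require("internal").db` module path are assumptions for illustration; check the transaction documentation for your server version.

```javascript
// This function is never called on the client; only its source text is shipped
// and executed on the server. Collection names are hypothetical.
function transferAction(params) {
  var db = require("internal").db; // server-side module (assumed path for 1.3-era servers)
  var from = db.accounts.document(params.from); // accounts keyed by user name (assumption)
  var to = db.accounts.document(params.to);
  if (from.balance < params.amount) {
    // Throwing aborts the transaction: none of its writes become visible.
    throw new Error("insufficient funds");
  }
  db.accounts.update(from, { balance: from.balance - params.amount });
  db.accounts.update(to, { balance: to.balance + params.amount });
  db.transfers.save({ from: params.from, to: params.to, amount: params.amount });
  return "ok";
}

// Build the declaration: collections are known before the transaction starts,
// and the action travels as a string together with its parameters.
function buildTransaction(params) {
  return {
    collections: { write: ["accounts", "transfers"] },
    action: String(transferAction),
    params: params,
  };
}

const body = JSON.stringify(buildTransaction({ from: "alice", to: "bob", amount: 10 }));
console.log(body);
```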

What durability guarantees does ArangoDB offer?

ArangoDB stores all data in collections. Collections consist of memory-mapped datafiles, so all data will be saved to disk.
How the data is synchronized to disk is configurable, though: eventual or immediate.

The choice between eventual or immediate synchronization can be made on a per-collection level, and also on a per operation level:

  • by default, ArangoDB uses eventual synchronization: it accepts any data-modifying operation and returns to the caller once the operating system confirms that the write operation was successful. This does not guarantee immediate disk synchronization, though ArangoDB continuously synchronizes data to disk in a background thread. In this setting, data can be lost if a failure occurs between the write operation and the asynchronous synchronization.
  • optionally, collections and individual write operations can be configured to be synchronized immediately. They will only return to the caller after a successful disk synchronization. In this setting, there is full durability (at least, the operating system has confirmed that the data was synchronized to disk; as usual, there may be subtleties in filesystem and operating system configuration that are outside ArangoDB’s reach).

From the durability point of view, immediate synchronization is of course better, but it means performing an extra system call for each operation. On systems with slow sync/msync, this can be a big performance penalty, so ArangoDB leaves the choice to the user. Collections may also differ in importance: for example, a collection that works as a cache, holding data that can be recalculated when needed, can be configured with lower durability than collections with more important data. In the end, it’s all up to the user to decide.
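In ArangoDB, immediate synchronization is controlled by a `waitForSync` flag. The sketch below expresses both levels as HTTP request descriptors without sending anything: a collection created with `waitForSync` set, and a single document insert that requests immediate synchronization through the `waitForSync` URL parameter. The helper functions and collection names are hypothetical.

```javascript
// Per collection: create a collection whose writes are synced immediately
// (waitForSync: true) or eventually (waitForSync: false).
function createCollectionRequest(name, waitForSync) {
  return {
    method: "POST",
    path: "/_api/collection",
    body: JSON.stringify({ name, waitForSync }),
  };
}

// Per operation: request immediate synchronization for this one write only.
function insertDocumentRequest(collection, doc, waitForSync) {
  const query = new URLSearchParams({ collection, waitForSync: String(waitForSync) });
  return { method: "POST", path: `/_api/document?${query}`, body: JSON.stringify(doc) };
}

// Important data gets full durability; a recomputable cache can sync eventually.
const orders = createCollectionRequest("orders", true);
const cache = createCollectionRequest("cache", false);
// ...but a particular write to the cache can still ask for an immediate sync.
const urgent = insertDocumentRequest("cache", { value: 42 }, true);
console.log(urgent.path); // /_api/document?collection=cache&waitForSync=true
```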

How do shapes work in ArangoDB?

Documents that have similar structure (i.e., that have the same attribute names and attribute types) can share their structural information. The structure (called “shape”) is saved just once, and multiple documents can re-use it by storing just a pointer to their “shape”.
In practice, documents in a collection are likely to be homogeneous, and sharing structural data between multiple documents can greatly reduce disk space and memory usage.
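Here is a conceptual sketch of shape sharing, assuming a shape is simply the sorted list of attribute names with their value types. This is not ArangoDB’s internal format. The `ShapeStore` class is hypothetical and only illustrates why documents with the same structure need to store their structural information just once.

```javascript
// Value type as recorded in a shape. Nested objects are not broken down
// further in this sketch; they count as a single "object" value.
function typeOf(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) return "list";
  return typeof value; // "string", "number", "boolean" or "object"
}

class ShapeStore {
  constructor() {
    this.shapeIds = new Map(); // shape signature -> shape id
    this.shapes = []; // shape id -> [[attribute, type], ...]
  }

  // Return the id of the document's shape, registering it on first sight.
  shapeIdFor(doc) {
    const shape = Object.keys(doc).sort().map((key) => [key, typeOf(doc[key])]);
    const signature = JSON.stringify(shape);
    if (!this.shapeIds.has(signature)) {
      this.shapeIds.set(signature, this.shapes.length);
      this.shapes.push(shape);
    }
    return this.shapeIds.get(signature);
  }

  // Store a document as a pointer to its shape plus its values in shape order.
  encode(doc) {
    const id = this.shapeIdFor(doc);
    return { shape: id, values: this.shapes[id].map(([key]) => doc[key]) };
  }
}

const store = new ShapeStore();
const a = store.encode({ name: "John", age: 25 });
const b = store.encode({ age: 19, name: "Jim" }); // same structure, reuses shape 0
const c = store.encode({ name: "Lisa", dob: "1981-04-09" }); // new shape 1
console.log(a.shape, b.shape, c.shape, store.shapes.length); // 0 0 1 2
```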

Cursors in ArangoDB vs. cursors in MongoDB

Both ArangoDB and MongoDB return data as a cursor after a successful find operation. Yet there is a significant difference between the two databases: Let us assume that you first fetch a large result set from a collection and remove some of the data from the collection afterwards, before you have fully iterated over the cursor.

ArangoDB will fill the cursor with the result of your query and won’t touch it, even if you remove data from the collection in the meantime. MongoDB seems to fetch data into the cursor incrementally, so the result set is affected by changes to the collection. Both approaches have their advantages and disadvantages; just make sure that you know how it works.
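The difference can be illustrated with two toy cursors over an in-memory array. One materializes the full result when the query runs, which is the ArangoDB behavior described above. The other reads documents lazily from the live collection, a simplified model of incremental fetching rather than MongoDB’s actual implementation. Both functions are hypothetical.

```javascript
// Snapshot cursor: the full result is copied when the query runs.
function snapshotCursor(collection, predicate) {
  const result = collection.filter(predicate); // materialized now
  return result[Symbol.iterator]();
}

// Incremental cursor: documents are read lazily from the live collection.
function* incrementalCursor(collection, predicate) {
  for (let i = 0; i < collection.length; i++) {
    if (predicate(collection[i])) yield collection[i];
  }
}

// Read one document, delete documents 4-6 from the collection, read the rest.
function readWhileDeleting(makeCursor) {
  const collection = [1, 2, 3, 4, 5, 6].map((n) => ({ n }));
  const cursor = makeCursor(collection, () => true);
  const seen = [cursor.next().value.n];
  collection.splice(3); // remove the last three documents
  for (const doc of cursor) seen.push(doc.n);
  return seen;
}

console.log(readWhileDeleting(snapshotCursor)); // [ 1, 2, 3, 4, 5, 6 ]
console.log(readWhileDeleting(incrementalCursor)); // [ 1, 2, 3 ]
```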

How does authentication work in ArangoDB?

Activating authentication for the server

The ArangoDB server can be configured to require authentication, or to not require it.

What mode you use the server in is up to you:

  • Running ArangoDB without authentication will allow everyone access to all collections and documents in the database, plus all API functions. This is convenient for development, but would be a security risk in production.
  • To run ArangoDB in production, you would enable the authentication feature of the server. The authentication feature will make the server require authentication for every incoming request. Only requests of authenticated users will then be allowed, and all other requests will be answered with an HTTP 401 error (Unauthorized) by the server.

The server authentication can be activated and deactivated using the option “server.disable-authentication”. The option can be passed to arangod on the command-line or be put in the server’s configuration file (arangod.conf).

For example, to start the server with authentication turned on, use:

--server.disable-authentication false
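Once authentication is enabled, clients authenticate against ArangoDB’s HTTP API with HTTP Basic authentication: the user name and password are joined with a colon, base64-encoded, and sent in an `Authorization` header. Here is a minimal Node.js sketch; the `basicAuthHeader` helper is hypothetical. Base64 is an encoding, not encryption, so keep such credentials on trusted networks or use an encrypted connection.

```javascript
// Build the HTTP Basic Authorization header for a request to ArangoDB.
function basicAuthHeader(username, password) {
  const token = Buffer.from(`${username}:${password}`, "utf8").toString("base64");
  return { Authorization: `Basic ${token}` };
}

// The default "root" user, which has an empty password.
console.log(basicAuthHeader("root", "").Authorization); // Basic cm9vdDo=
// A request without valid credentials would be answered with 401 Unauthorized.
```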

Managing users

By default, ArangoDB comes with a user “root” that has a password of “” (empty string). Before using ArangoDB in production you might want to either remove this user, change its password, or deactivate it.

You can do so in arangosh, the command-line shell that comes with ArangoDB. Please note that you need to use the arangosh binary and not the browser-based admin interface.

In arangosh, you can issue the following commands to manage users:

/* set up a helper variable for the users module */
var users = require("org/arangodb/users");

/* create a new, active user (third argument: active);
   this does not flush the authentication cache on the server */
users.save("myuser", "mypasswd", true);

/* flush the authentication cache on the server */
users.reload();

If you want to remove the root user, the commands would be:

/* remove the root user.
   before doing this, make sure that you can connect to the
   server with another user. Otherwise you might be locked out! */
users.remove("root");
users.reload();

All user-related commands are listed in detail here:

https://www.arangodb.org/manuals/current/DbaManualAuthentication.html

Using authentication with arangosh and arangoimp

arangosh will by default connect to the server using the “root” user. To use a different user, pass the --server.username option to arangosh, e.g.:

arangosh --server.username myuser

You will then be prompted to enter the user’s password. The same option is available for arangoimp, the import tool.

You can also specify the password directly on the command line with --server.password, though this can be a security risk (the password might end up in the shell history file!).

How can I import data from files into ArangoDB?

The most convenient method to import a lot of data into ArangoDB is to use the arangoimp command-line tool. arangoimp allows you to import data records from a file into an existing database collection.

Let’s assume you want to import user records into an existing collection named “users” on the server.

Importing JSON-encoded data

Let’s further assume the import data at hand is encoded in JSON. We’ll be using these example user records to import:

{ "name" : { "first" : "John", "last" : "Connor" }, "active" : true, "age" : 25, "likes" : [ "swimming"] }
{ "name" : { "first" : "Jim", "last" : "O'Brady" }, "age" : 19, "likes" : [ "hiking", "singing" ] }
{ "name" : { "first" : "Lisa", "last" : "Jones" }, "dob" : "1981-04-09", "likes" : [ "running" ] }

To import these records, all you need to do is to put them into a file (with one line for each record to import) and run the following command:

arangoimp --file "data.json" --type json --collection "users"

This will transfer the data to the server, import the records, and print a status summary.

As the import file already contains the data in JSON format, attribute names and data types are fully preserved. As the example data shows, not all data records need to have the same attribute names or types: records can be heterogeneous.
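If the records are produced by a program, writing them in the one-record-per-line layout takes only a few lines of Node.js. This sketch writes the example records to `data.json` in the system’s temporary directory, a location chosen for illustration; `writeJsonLines` is a hypothetical helper.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Write one JSON document per line. Records may have different attributes.
function writeJsonLines(file, records) {
  const text = records.map((record) => JSON.stringify(record)).join("\n") + "\n";
  fs.writeFileSync(file, text, "utf8");
}

const records = [
  { name: { first: "John", last: "Connor" }, active: true, age: 25, likes: ["swimming"] },
  { name: { first: "Jim", last: "O'Brady" }, age: 19, likes: ["hiking", "singing"] },
  { name: { first: "Lisa", last: "Jones" }, dob: "1981-04-09", likes: ["running"] },
];

const file = path.join(os.tmpdir(), "data.json");
writeJsonLines(file, records);
console.log(fs.readFileSync(file, "utf8").trim().split("\n").length); // 3
```

The resulting file can then be passed to arangoimp with --file and --type json, as shown above.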

Importing CSV data

arangoimp can also import data from CSV files. This comes in handy when the data at hand is already in CSV format and you don’t want to spend time converting it to JSON for the import.

To import data from a CSV file, make sure your file contains the attribute names in the first row. All the following lines in the file will be interpreted as data records and will be imported.

The CSV import requires the data to have a homogeneous structure: all records must have exactly the same number of columns as there are headers.

The cell values can have different data types, though. If a cell does not have a value, it can be left empty in the file. Empty values are not imported, so the corresponding attributes will not be present in the created document. Values enclosed in quotes are imported as strings, so to import numeric values, boolean values, or the null value, don’t enclose them in quotes in your file.

We’ll be using the following data for the CSV import:

"first","last","age","active","dob"
"John","Connor",25,true,
"Jim","O'Brady",19,,
"Lisa","Jones",,,"1981-04-09"

The command line to execute the import then is:

arangoimp --file "data.csv" --type csv --collection "users"
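The typing rules above can be summarized in a small sketch that converts CSV text into documents: quoted cells become strings; unquoted `true`, `false`, `null`, and numbers become the corresponding JSON values; and empty cells are left out. This is a simplified illustration, not arangoimp’s parser. How arangoimp treats other unquoted text is not covered here, so the sketch simply keeps such text as a string.

```javascript
// Split one CSV line into cells, remembering whether each cell was quoted.
// Handles commas and "" escapes inside quotes; not a complete CSV parser.
function splitCsvLine(line) {
  const cells = [];
  let i = 0;
  while (i <= line.length) {
    if (line[i] === '"') {
      let text = "";
      i++; // skip the opening quote
      while (i < line.length) {
        if (line[i] === '"' && line[i + 1] === '"') {
          text += '"'; // escaped quote
          i += 2;
        } else if (line[i] === '"') {
          i++; // skip the closing quote
          break;
        } else {
          text += line[i++];
        }
      }
      cells.push({ quoted: true, text });
      i++; // skip the comma that follows the cell
    } else {
      const comma = line.indexOf(",", i);
      const end = comma === -1 ? line.length : comma;
      cells.push({ quoted: false, text: line.slice(i, end) });
      i = end + 1;
    }
  }
  return cells;
}

// Quoted -> string; empty -> omitted (undefined); otherwise bool/null/number.
function convertCell(cell) {
  if (cell.quoted) return cell.text;
  if (cell.text === "") return undefined;
  if (cell.text === "true") return true;
  if (cell.text === "false") return false;
  if (cell.text === "null") return null;
  const n = Number(cell.text);
  return Number.isNaN(n) ? cell.text : n;
}

function csvToDocuments(text) {
  const lines = text.trim().split(/\r?\n/);
  const headers = splitCsvLine(lines[0]).map((cell) => cell.text);
  return lines.slice(1).map((line, row) => {
    const cells = splitCsvLine(line);
    if (cells.length !== headers.length) {
      throw new Error(`line ${row + 2}: expected ${headers.length} columns, got ${cells.length}`);
    }
    const doc = {};
    headers.forEach((header, j) => {
      const value = convertCell(cells[j]);
      if (value !== undefined) doc[header] = value;
    });
    return doc;
  });
}

const sampleCsv = [
  '"first","last","age","active","dob"',
  '"John","Connor",25,true,',
  '"Jim","O\'Brady",19,,',
  '"Lisa","Jones",,,"1981-04-09"',
].join("\n");

console.log(csvToDocuments(sampleCsv));
```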

Running the import programmatically

arangoimp uses ArangoDB’s HTTP API to perform the actual import, and so can you.

The HTTP API provides the import action at

/_api/import

You need to send an HTTP POST to this URL and put the import data into the request body. The target collection name needs to be specified in the “collection” URL parameter.
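Here is a sketch of such a request, built but not sent. The body carries one JSON document per line, and the target collection goes into the `collection` URL parameter as described above. The additional `type=documents` parameter that declares this body layout is an assumption to verify against the HTTP API documentation for your server version. `buildImportRequest` is a hypothetical helper.

```javascript
// Build (but do not send) an import request for the HTTP import API.
function buildImportRequest(collection, records) {
  // "type=documents" (one JSON document per line) is an assumption; check
  // the import API documentation for the parameters your server expects.
  const query = new URLSearchParams({ type: "documents", collection });
  return {
    method: "POST",
    path: `/_api/import?${query}`,
    headers: { "Content-Type": "application/json" },
    body: records.map((record) => JSON.stringify(record)).join("\n"),
  };
}

const req = buildImportRequest("users", [
  { name: { first: "John", last: "Connor" }, age: 25 },
  { name: { first: "Lisa", last: "Jones" }, dob: "1981-04-09" },
]);
console.log(req.method, req.path); // POST /_api/import?type=documents&collection=users
```

To actually send it, the method, path, headers, and body can be passed to Node’s `http.request` with `host: "localhost"` and `port: 8529`, adding an `Authorization` header if authentication is enabled.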

How can I contribute?

As with all open source projects, ArangoDB thrives on contributions from the user community. You can help make ArangoDB better in a couple of ways:

  • install and use it, and report bugs and difficulties on GitHub
  • send us patches using GitHub’s excellent social coding capabilities
  • join the Google group to discuss implementation details and future versions of ArangoDB
  • contribute API client libraries and fancy add-ons

And: We are happy to support students and graduate students interested in writing their bachelor/master thesis or dissertation in the context of alternative databases.
