D_store
overview | tutorial | download | API | misc | faq | license
D_store is an in memory database. A good mental model for d_store is a programmatic spreadsheet; there are rows and rows contain fields with values.
D_store is mostly row oriented, all stores represent either all the rows in the spreadsheet or a subset thereof (or the empty set). The query powers of d_store come from the set operations you can do with stores, like and, or, sub, etc.
Why D_store ?
D_store is good for storing highly related data and performing queries on that data. It can however only scale to the amount of memory available. But, this fits very well for GUI applications that let users interact with a lot of interrelated data; d_store is ment as an enabler of rich applications like iTunes, or addressbooks, or bookmark managers etc.
D_store supports tagging, full text searching, hierarchies, and almost any relation you can define on your data. The API provides very direct access to the data, unlike, for instance SQL engines, but it still has the query power as SQL.
In d_store, all data is stored as values, but every value is stored only once per store, which is good for memory usage. And this is where the second part of the querying power of d_store comes from: every value knows in which rows it appears.
A last powerful concept that d_store has is that everything operates on stores, and the results sets are always stores. So when you get the results of a query, you can query that result again and again, refining the results further and further.
D_store is written in plain C and released under the GPL.
Skinny dipping the API
#include <d_store/d_store.h>
...
// (1)
d_store * store = d_store_create(NULL);
// (2)
d_insert(store, D_INT(1), "name", "test person one", NULL);
d_insert(store, D_INT(1), "tel", "+555323323", NULL);
d_insert(store, D_INT(1), "tel", "+031299288", NULL);
d_insert(store, D_INT(2), "name", test person two", NULL);
// (3)
d_store * all_tel = d_select(store, 2, D_STRING("tel"));
d_store * tel = d_nodup(all_tel, 1);
d_store_free(all_tel);
// (4)
d_row * row;
d_row * name;
for (row = d_first(tel); row; row = d_next(tel)) {
name = d_select_first(store, 1, d_row_get(row, 1));
printf("person: %s\n",
D_TOSTRING(d_row_get(name, 3)));
}
d_store_free(tel);
// (5)
d_store_free(store);
...
- We create a
d_storewith a new database. Passing an existingd_storetod_store_create()creates ad_storefor the same database, passing inNULLcreates a new database. - Then we insert some data in the database. All variable length functions will automatically convert strings to
d_values, except for the empty string (""). However, it is good practice to use theD_STRINGmarco anyhow, to created_valuesthat contain strings. - We create a sub store which contains all telephone numbers, using
d_select. As one person can have more the one telephone number, we remove all instances where we have the same person twice, using thenodupoperation. As we no longer need theall_telstore, so we free it usingd_store_free(). - We iterate the
telstore usingd_firstandd_nextas iterators. Inside the loop we fetch thed_rowrepresenting the name to which the telephone number belongs, and we print it. Notice we don't free any values or rows. We do free thetelstore when we are done with it. - We free the last last
d_storebelonging to a database, will release all memory associated with the database.
We never freed any value we gave to the database, this is because the macros D_INT and D_STRING create new d_values using d_value_from_*_oneshot() which tells d_store that this value is under his control, and can be used as it sees fit.
Also we never freed any d_row or d_value we get from the database, because these keep being under the d_store's control.
As you can see, d_store tries to be a really friendly API, but also a very direct API. See API for more.
Status and Roadmap
D_store is currently not finished, so you should regard it as beta. The 0.2 release is however a stable release, and finished enough to be useful.
The plan is that by publishing d_store now, I can get some feedback on the API. If needed, any next 0.3 release will change and enhance the API. In the mean time, the 0.2 release will be kept stable and bug-fixed.
Because d_store is far from finished, during 0.2 and 0.3 releases, I'll be working on the unstable 0.4 and further releases, which should bring the following, in the order they will likely be implemented.
- A journalled file based backend, for persistant stores.
- Rollbacks and commits, or at least at a level where apps can implement undo easily.
- full error management, API detects and reports errors
-
Speed and memory optimizations; this was an anti-goal of current releases, but when API and features are stable, there are a lot of optimizations to do. Some ideas:
- Rows can be represented by bitmaps, instead of b-trees
- The data structures should use bit flags, and can be compacted
- Using values that are not indexed (and cannot be selected, are a user optimisation)
- Pre-allocating values, and b-tree elements in bulk to relieve malloc and free and prevent fragmentation
- lazy stores: d_store_create() should not copy all rows from master, and maybe stores can represent their calculation, but only execute those when needed
- Multi process (or even machine) distributed stores, the atomic values and row based-ness could be a nice model for this, also inter thread might be implemented like this?
- Callback API for stores, to 'listen' for row changes or row deletes before and after they have happend
Only after a file based backend and error management can the API be considered to approach a 1.0 release. But as I'm doing this in my free time, it might take some time before any of this is implemented.
The 0.2 release is however stable and has all the features needed to be used.
Some numbers
A sample application (maemo-todo) with a database of 5000 todo elements of which around 1/3 are tagged with multiple tags, takes just above 2Mb of memory for d_store. And it works just fine on a Nokia 770. (Except for a bug in gtk treeview, that always draws all rows always, even if they are invisible, it works fine with gtk 2.10.)
Scaling up to 250.000 (quarter million) rows will consume just below 100Mb of RAM in the todo application example. However I've only tried it with a 130.000 rows, eating just below 50Mb but the app stays fully functional, including the searching or the tagging. (On a desktop PC.)
So depending on what your application will store per row, and how interrelated your data is (the more related, the better) and what kind of environment you target. Storing over 10.000 rows could still work on a Nokia 770 (The device has 64 Mb or ram) and storing until 200.000 rows should work just fine on a normal desktop PC.
In the current design, I would guess, its performance will degrade with over half a million rows, however, I'm confident that a 1.0 release should support a million rows still.
But this is just me doing some wet finger estimations, I can be way off.
Last modified: 2007-11-19 20:15 GMT