<< Previous | Next >>

SQL vs. Tutorial D vs. D_store

Is d_store really a database? Most people associate SQL with a database, and nothing else. Well, luckily there is more under the sun. SQL is very pragmatic, it does its job, but that is all. Unfortunately, if you want a big ACID Relational database server, SQL is the only way.

There is also Tutorial D (no wonder it never gets anywhere, with such a name). Which is way more powerful in expressing itself. The one single largest advantage Tutorial D has over SQL, is that you can continue and continue with your query, without getting lost in subqueries or unions or variable renaming. Tutorial D is a truly relational query language. (SQL is a Structured Query Language, they never dared to attach Relational to its name.)

And then there is d_store, which by no means tries to be an ACID RDBMS, instead it has a somewhat more specific domain. But it can still do all the queries SQL or Tutorial D can. So lets see.

(To lean more about Tutorial D, see Course on SQL and Tutorial D or The Third Manifesto.)

I'm taking three examples from this SQL vs Tutorial D presentation, focused on semantics.

We have an 'emp' table with fields: (employee_id, department_id, salary, bonus).
We have an 'works_on' table with fields: (employee_id, project_id).

Exercise nr 1, get all high paying employees.

SQL:

SELECT employee_id, salary + bonus as total_pay WHERE salary + bonus > 5000;

Tutorial D:

((EXTEND emp ADD salary + bonus AS total_pay)
  WHERE total_pay > 5000) { employee_id, total_pay }

d_store:

int total_pay(d_row * row) {
    return D_TOINT(d_row_get(row, 3)) + D_TOINT(d_row_get(row, 4));
}

void exercise1(d_store * emp) {
    d_store * result = d_store_create_empty(emp);

    d_row * row;
    for (row = d_first(emp); row; row = d_next(emp)) {
        if (total_pay(row) > 5000) {
            d_store_add_row(result, row);
        }
    }
}

Exercise nr 2, get all big departments.

SQL:

SELECT department_id, COUNT(*) as nr FROM emp 
  GROUP BY department_id HAVING nr > 100;

Tutorial D:

( SUMMARIZE emp PER emp { department_id } ADD COUNT ( ) AS nr ) WHERE nr > 100;

d_store:

void exercise2(d_store * emp) {
    d_store * summary = d_store_create(NULL);

    d_row * row;
    for (row = d_first(emp); row; row = d_next(emp)) {
        d_row * dep = d_select_first(summary, 1, d_row_get(row, 2));
        if (dep) {
            d_row_set(dep, 2, D_INT(D_TOINT(d_row_get(dep, 2)) + 1));
        } else {
            d_insert(summary, d_row_get(row, 2), D_INT(1), NULL);
        }
    }

    for (row = d_first(summary); row; row = d_next(summary)) {
        if (! (D_TOINT(d_row_get(row, 2)) > 100)) {
            d_row_delete(row);
        }
    }
}

Can you see the SQL HAVING in d_store code? You aggregate and then filter. In Tutorial D, results sets are again full table entities (just like d_store, at some level), so you can query them again, that is why in Tutorial D HAVING is not in the language, a normal WHERE suffices.

Exercise nr 4, get employees working together

SQL:

SELECT w1.employee_id, w2.employee_id, w1.project_id
  FROM work_on w1, works_on w2
WHERE
  w1.project_id = w2.project_id AND w1.employee_id > w2.employee_id;

(if we used w1.employee_id <> w2.employee_id we'd get the tuples twice, once in reverse, so a smart trick using > .)

Tutorial D:

( (works_on RENAME employee_id AS e1) JOIN
  (works_on RENAME employee_id AS e2) WHERE e1 > e2 )

(JOINs in Tutorial D are only natural join, so the resulting set is { project_id, e1, e2 })

d_store:

void exercise4(d_store * works_on) {
    d_store * result = d_store_create(NULL);

    d_row * row;
    for (row = d_first(works_on); row; row = d_next(works_on)) {
        //everybody working on this project
        d_store * emp_working = d_select(works_on, 2, d_row_get(row, 2));

        d_row * emp;
        for (emp = d_first(emp_working); emp; emp = d_next(emp_working)) {

            if (D_TOINT(d_row_get(emp, 1)) > D_TOINT(d_row_get(row, 1))) {
                d_insert(result, d_row_get(row, 2),
                        d_row_get(row, 1), d_row_get(emp, 1), NULL);
            }
        }
        d_store_free(emp_working);
    }
}

Conclusions

What we see here is that d_store can match all these relational queries. A few points.

Aggregations is manual work, involving a new database. But a new database in d_store is very much the same as a result table in SQL or Tutorial D, or as a normal table for that matter.

And you can see that the focus of d_store is totally different. All, this column AS name and others are not relevant at the level that d_store operates. Actually, in d_store it is better to leave any rewriting alone, like the total_pay is better left simulated then actually stored.

D_store is almost like a very naive SQL or Tutorial D low level engine. The naivety is in the fact that d_store actually needs to produce all intermediate and end result sets. And creating a result set with not all the fields, or fields from different tables, would be very expensive in d_store (and is totally unnecessary, again, because of the level at which d_store works). But it is obvious how the SQL queries translate into d_store.

The focus of d_store is not a relational engine. It is more on doing something useful with that data, like displaying it to a user. Like how exercise 4 is pretty academic, sure d_store can do it, but probably the user wants to see with who employee X is working. There is surely no need to use any extra table.

Something real

How would exercise 4 translate to something real?

An application can show you information about an employee, for instance a list of his projects he works on. That list is the d_select(works_on, 1, employee_id). So that store is already available in your application, because it drives the table view.

Also showing with who the employee works, you iterate that projects store, translating it to a list of employees working on those projects. And from there create a store that that is a subset of the all employees set.

To see with who he works for what project, simply don't iterate, but use the selected project. You can of-course also select an employee and then show all the projects they work on together.

In code:

d_store * projects = d_select(works_on, 1, employee_id);

// get a list of guys he works with for this project
d_store * works_with(d_value * project_id) {
  d_store * result = d_store_create_empty(emp);

  d_store * mates = d_select(works_on, 2, project_id);
  d_row * row;
  for (row = d_first(mates); row; row = d_next(mates)) {
    d_store_add_row(result, d_select_first(emp, d_row_get(row, 1)));
  }
}

// get all guys he works with
d_store * all_mates = d_store_create_empty(emp);
d_row * project;
for (project = d_first(projects); project; project = d_next(projects)) {
  d_store * project_mates = works_with(d_row_get(project, 2));
  d_store_add(all_mates, project_mates);
  d_store_free(project_mates);
}

And likely, you want your displays to be sorted in ways. Do this using

int projects_sorter(const d_row * left, const d_row * right, void * user_data) {
  // sorting by project id
  return D_TOINT(d_row_get(left, 2)) - D_TOINT(d_row_get(right, 2));
}

d_store_set_sorter(projects, projects_sorter, NULL);

...
d_store_set_sorter(all_mates, alpha_emp_sorter, NULL);

In UI: (imagine a two paned UI)

employee X works on works with
project1 emp1
project2 emp 2
emp 3
emp4

one particular project is highlighted, so only employees working that project show up:

employee X works on works with
project1 emp2
project2 emp 4

one particular employee is highlighted, so which projects are they doing together?

employee X works on works with
project2 emp1
emp 2
emp 3
emp4

This is more d_store territory. Sure you can do this with an SQL interface, but every time the user clicks, you need to execute new queries, insert the results into arrays (or something) and show them in your UI. This compared to d_store, where result sets can be queried further (so they can stay around), and can be used directly to drive table views or other fields in the UI.

Last modified: 2007-11-19 20:17 GMT