HTSQL : A Conceptual Framework for Link Navigation

HTSQL is a high-level navigational query language for relational databases. It is written by Kirill Simonov and Clark Evans in the Python language. HTSQL works with SQLite, PostgreSQL, MySQL, Oracle, and Microsoft SQL Server.

Presenter:Clark C. Evans, Prometheus Research, LLC
Date:May 20, 2011
Location:PgCon 2011, Ottawa, Canada

What is HTSQL?

img/what_is_it.png

HTSQL is NoSQL for PostgreSQL ;)

Relational Algebra is a Poor Fit

img/poorfit.jpg

SQL violates strict relational model to overcome real limitations in the theory.

What is Navigational Model?

Let's use entity relationships as navigation, and build query language around this principle.

Theory follows practice.

Example University Schema

img/schema.png

SQL conflates Rows & Columns

"Please list departments; for each department,
show the corresponding school's campus."
SELECT d.code, d.name, s.campus
FROM ad.department AS d
LEFT JOIN ad.school AS s
       ON (d.school_code = s.code);

The business inquiry asks for a specific set of rows, and then correlated columns. The SQL encoding returns a subset of a cross product making it difficult to ensure what each row represents. The FROM clause doesn't just pick rows, it also plays and auxiliary role in choosing columns.

Keep Row Definition Separate

"Please list departments; for each department,
the show the corresponding school's campus."
/department{code, name, school.campus}

The navigational translation separates the row definition from the column selection. The linking is implicit, and correct. The encoded query can be read aloud as a verbal inquiry.

SQL Optimization Makes things Worse

"For each department, return the department's
name and number of courses having more than 3 credit hours."
SELECT d.name, count(c)
FROM ad.department AS d
LEFT JOIN ad.course AS c
       ON (c.department_code = d.code
           AND c.credits > 3)
GROUP BY d.name;

To optimize, the subquery is replaced by a GROUP BY projection. This gives us both row/column and link/filter conflation, obfuscating the business inquiry.

Conflating Projection with Aggregation

"How many departments by campus?"
SELECT s.campus, COUNT(d)
FROM ad.school AS s
LEFT JOIN ad.department AS d
  ON (s.code = d.school_code)
WHERE s.campus IS NOT NULL
GROUP by s.campus;

In the schema there isn't a campus table, you have to take distinct values from the school table. In this SQL query its not clear if the GROUP BY is used only to produce an aggregate, you have to examine primary key columns to know for sure.

Keep Projection Separate

"How many departments by campus?"
/(school^campus) {campus, count(school.department)}

In a navigational approach, you first construct the projection explicitly (using ^ operator). Then, you select from it. In this way the aggregation is indicated separately as part of the column selector rather than being confused with the row definition.

For SQL, Modest Complexity is Painful

"For each school with a degree program, return
the school's name, and the average number of high-credit (>3) courses its departments have."
SELECT s.name, o.avg_over_3 FROM ad.school AS s
JOIN ad.program AS p ON (p.school_code = s.code)
LEFT JOIN (
    SELECT d.school_code, AVG(COALESCE(i.over_3,0)) AS avg_over_3
    FROM ad.department d
    LEFT JOIN (
        SELECT c.department_code, COUNT(c) AS over_3
         FROM ad.course AS c WHERE c.credits > 3
         GROUP BY c.department_code
    ) AS i ON (i.department_code = d.code)
    GROUP BY d.school_code
) AS o ON (o.school_code = s.code)
GROUP BY s.name, o.avg_over_3;

Case Study: RexDB Constraint Report

img/rexdb.jpg img/rexdb_navigation.jpg

Case Study: RexDB Constraint Report

img/rexdb_constraint.jpg

Case Study: RexDB Constraint Report

Since rows & columns are clearly isolated, queries can be composed dynamically. The general template is:

/individual{id, check_1, check_2}?filter

A specific check a researcher may configure looks like:

/individual{id, adi_sanity, srs_13_pair}
   ?sex='m'&exists(participation.study='aie')
:where(srs_13_pair := !exists(srs?q3>24|q1>7),
       adi_sanity := ... )

This returns male individuals participating in the 'aie' study, executing two completely orthogonal checks.

Towards a Navigational Model

Scalar Expressions

Literal Values
/{3.14159, 'Hello World'}
Algebraic Expressions
/(3+4)*6
Predicate Expressions
/true()|(false()&true())

Relative to a table, column attributes are scalar values.

Filtering

Filtering doesn't change what each row means, but it does change what rows are produced.

Sieve
/school?campus='north'
Sort
/school.order(campus)
Truncate
/school.limit(2)

Selection

Selection is the mechanism for changing what column attributes are included in each value of the output.

Output Selection
/school{name, campus}

Calculated Attributes:

/school.define(num_dept:=count(department))
{code, num_dept}?num_dept>3

Up-Stream References:

/define($avg:=avg(course.credits))
.course?credits>$avg

Aggregation

Aggregation converts a plural expression into a singular one; it incorporates sub-flows without changing the row definition.

Counts & Existence
/school{name, count(department)}?exists(program)

Nested Aggregation:

/school{name, avg(department.count(course)),
              avg(department.course.credits)}
Nested Lists (this winter?)
/school{name, /department}

Projection

Projection re-defines the meaning of a flow to include only distinct values from a particular selection.

Distinct Values
/school^campus
Complement Space
/(school^campus){campus, count(school)}
Distinct Selection
/program^{school.campus, degree}

Advanced Navigation

While the basic navigation follows foreign key links, arbitrary links are permitted. The fork() syntax sugar links to the same table.

Linking Operator
/school.({code}->department{school_code})
Fork Syntax
/course?credits>avg(fork().credits)

Decoration

Marking parts of the stream with post-processing hints so that the navigation can be completed with pretty formatting.

Output Labels
/school{name, count(department) :as '# of Dept.'}

This area of the language needs more definition, but is needed for declarative inclusion of formatting directives.

HTSQL Implementation

img/service.png

HTSQL Compiler

img/compiler.png

Development Status

HTSQL is quite usable currently, but it may have gaps for a given application. Particular items we'll be addressing in coming months are:

How do I get it?

Open Community

Dual-License & Support

Q&A

Please visit our community site, http://htsql.org, our commercial site http://htsql.com, we are best found at #htsql on freenode. The source code is freely available at http://bitbucket.org/prometheus/htsql

Generous support for HTSQL was provided by Prometheus Research, LLC and The Simons Foundation. This material is also based upon work supported by the National Science Foundation under Grant #0944460.