Overview of HTSQL

HTSQL was created in 2005 to provide an XPath-like HTTP interface to PostgreSQL for client-side XSLT screens and reports. HTSQL found its audience when analysts and researchers bypassed the user interface and started to use URLs directly. The language has evolved since then.

What is HTSQL?

HTSQL is a comprehensive navigational query language for relational databases and web service gateway.

HTSQL is a Web Service

http://demo.htsql.org/school
output of /school query

On the left is a URL, on the right is what a browser would show.

HTSQL is a query language for the web. Queries are URLs that can be directly typed into a browser; the output could be returned in a variety of formats including HTML, CSV, JSON, etc. HTSQL can be used as the basis for dashboarding tools and other browser-based applications. In this way, database queries can be shared, tweaked, and used in any number of ways.

HTSQL is a Relational Database Gateway

SELECT "school"."code",
       "school"."name",
       "school"."campus"
FROM "ad"."school"
ORDER BY 1 ASC

On the left is an HTSQL query, on the right is SQL it is translated to.

HTSQL wraps your existing existing relational database, transparently handling SQL complexities for you. The current version of HTSQL supports SQLite, PostgreSQL, MySQL, Oracle, and Microsoft SQL Server. We’ve taken care to abstract differences between these SQL dialects so that a given HTSQL query has consistent semantics across database server implementations.

HTSQL is an Advanced Query Language

SELECT "school"."name", COALESCE("program"."count", 0), COALESCE("department"."count", 0)
FROM "ad"."school"
LEFT OUTER JOIN (SELECT COUNT(TRUE) AS "count", "program"."school_code" FROM "ad"."program" GROUP BY 2) AS "program" ON ("school"."code" = "program"."school_code")
LEFT OUTER JOIN (SELECT COUNT(TRUE) AS "count", "department"."school_code" FROM "ad"."department" GROUP BY 2) AS "department" ON ("school"."code" = "department"."school_code")
ORDER BY "school"."code" ASC

On the left is an HTSQL query, on the right is SQL it is translated to.

Besides typical expression algebra and function set, HTSQL provides sophisicated navigational query mechanism, composable query fragments and an extensive macro inclusion system. In particular, nested aggregations and projections are easy to understand and use.

HTSQL is a Communication Tool

Show me schools, and, for each school,
- its name, its location,
- number of programs and departments,
- and the average number of courses
  across each of its departments?

On the left is a business inquiry, on the right is the HTSQL translation.

HTSQL is first and formost designed for the accidental programmer and as such provides a direct mapping of common business inquiries onto a computer parsable and executable syntax. Just because a query must be processable by a machine, doesn’t mean it shouldn’t be human readable.

HTSQL is a Python Library

from htsql import HTSQL
demo = HTSQL("pgsql:///htsql_demo")
rows = demo.produce("/school")
for row in rows:
    print row
school(code=u'art',
       name=u'School of Art & Design',
       campus=u'old')
school(code=u'bus',
       name=u'School of Business',
       campus=u'south')
...

HTSQL can be embedded into any Python application to provide an intuitive object based query engine for complex reporting. It works out of the box with Jinja and other tools. We provide meta-data adapters for Django and SQLAlchemy.

Our Philosophy

We develop HTSQL to liberate the Accidental Programmer—professionals and data experts who are not software engineers by trade, but who must write database queries or data processing code to get things done. HTSQL handles routine data processing needs in an accessible, transparent, rigorous and embeddable manner.

Accessible

We want HTSQL to be broadly usable. Our query language should provide business analysts, information scientists, and data curators self-service access and control over their database. Not only should these accidental programmers be able able to answer complex business inquiries themselves, they should be able to share the queries they create with their colleagues. HTSQL must be a productive tool for problem solving with a small learning curve.

Transparent

We think aesthetics matter. When a data analyst is focusing on a domain specific problem, the HTSQL query language should do the heavy lifting but otherwise stay in the background. Translation of a business inquiry into the HTSQL query language must be natural and obvious. Initial query authoring is just the beginning. A database query is often the only human readable expression of a business rule, so each and every query must be a pleasure to review, share and maintain.

Rigorous

We know correctness is critical. The HTSQL query language is based upon a navigational data linking and flow processing model having consistent semantics that are independent of the underlying database architecture. The language designed to be composable so that query fragments can be independently tested and combined. HTSQL’s syntax is regular enough that syntax highlighting and context sensitive name lookup is possible.

Embeddable

We realize a query language is not a product. Instead, HTSQL is a tool used as part of a workflow solution or embedded into an application. Our public application program interface is be simple and stable. Further, since accidental programmers might require features which other software developers could build, we have an extensive plug-in interface for those who wish to add features to HTSQL itself. Between web service wrappers or plug-ins, it should be possible to customize almost every aspect of HTSQL without requiring a code fork.

HTSQL in a Nutshell

HTSQL was designed from the ground up as a self-serve reporting tool for data analysts. With HTSQL, the easy stuff is truly easy; and, the complex stuff is easy too.

In this section we introduce the fundamentals of HTSQL syntax and semantics. For a more incremental approach, please read the HTSQL Tutorial. For the purposes of this section, we use a fictitious university schema.

This data model has two top-level tables, school and department, where department has an optional link to school. Subordinate tables, course and program, have mandatory links to their parents.

Scalar Expressions

Literal values:

3.14159 ‘Hello World!’
3.14159 Hello World!

Algebraic expressions:

Predicate expressions:

Filtering

Sieve operator produces records satisfying the specified condition:

code name campus
bus School of Business south
mus School of Music & Dance south

Sorting operator reorders records:

code name campus
ph Public Honorariums
sc School of Continuing Studies
eng School of Engineering north
art School of Art & Design old

Truncating operator takes a slice from the record sequence:

code name campus
art School of Art & Design old
bus School of Business south

Selection & Definition

Selection specifies output columns:

name campus
School of Art & Design old
School of Business south
College of Education old
School of Engineering north

Title decorator defines the title of an output column:

name # of Dept
School of Art & Design 1
School of Business 3
College of Education 2
School of Engineering 4

Output records could nest:

name school
name campus
Accounting School of Business south
Art History School of Arts and Humanities old
Astronomy School of Natural Sciences old
Bioengineering School of Engineering north

Calculated attributes factor out repeating expressions:

References carry over values across nested scopes:

title credits
Financial Accounting 5
Audit 5
Accounting Internship 6
History of Art Criticism I 4

Locators

Locator operator picks a single record by ID:

code name school_code
comp Computer Science eng

A composite ID consists of labels separated by a period:

department_code no title credits description
comp 515 Software Design 3 Study of good software development techniques: UML, object-oriented design, design patterns, GUI design principles, testing, debugging and profiling.

Function id() returns the record ID:

comp.102
comp.130
comp.150

Nested Segments

The segment (/) operator embeds a result of a correlated query as a nested list. For instance, a list of schools could include associated departments:

code name campus department
code name school_code
art School of Art & Design old stdart Studio Art art
bus School of Business south acc Accounting bus
econ Economics bus
mm Management & Marketing bus
edu College of Education old edpol Educational Policy edu
tched Teacher Education edu

Nesting can be arbitrarily deep:

name department
name course
title
College of Education Educational Policy Introduction to Education
Contemporary Society
Sociology of Childhood
Technology in the Classroom
Technology, Society and Schools
Economics and Education Policy
Politics and Education Policy
Education Policy Analysis
Children’s Literature
Education Policy and Practice
Social Analysis of Education Policy
Classroom Visit
Organizational Analysis of Education Policy
Seminar in Education Policy I
Seminar in Education Policy II
Qualitative Research in Education Policy
Teacher Education Teaching Methodology
Theory and Practice of Early Childhood Education
Methods of Early Science Education
Play as Education Method
Developmental Psychology
Selection of Learning Resources
Teacher Identity
Problems in Education Management
Challenges of Teaching the Gifted and Talented
Techniques of Mathematics Teaching
Techniques of Science Teaching
Techniques of Language Teaching
Problems in Education
Public School Internship
Preschool Internship
Special Topics in Teacher Education
Practice of Mathematics Teaching
Practice of Science Teaching
Practice of Language Teaching

A query may have adjacent nested segments:

name department program
name title
School of Art & Design Studio Art Post Baccalaureate in Art History
Bachelor of Arts in Art History
Bachelor of Arts in Studio Art
School of Business Accounting Master of Arts in Economics
Economics Graduate Certificate in Accounting
Management & Marketing Certificate in Business Administration
B.S. in Accounting
Bachelor of Business Administration
Bachelor of Arts in Economics

Aggregation

Aggregates convert plural expressions to singular values.

Scalar aggregates:

Nested aggregates:

Various aggregation operations:

name count(course) max(course.credits) sum(course.credits) avg(course.credits)
Accounting 12 6 42 3.5
Art History 20 6 70 3.5
Astronomy 22 5 66 3.0
Bioengineering 17 8 55 3.23529411765

Projection

Projection (^) returns distinct values. This example returns distinct campus values from the school table:

campus
north
old
south

In the scope of the projection, school refers to all records from school table having the same value of campus attribute:

campus count(school) school
code name campus
north 1 eng School of Engineering north
old 4 art School of Art & Design old
edu College of Education old
la School of Arts and Humanities old
ns School of Natural Sciences old

Projections combine with other language features in a natural way. The next example displays distinct program degrees offered by each school:

name count(program^degree) program^degree
degree
School of Art & Design 2 ba
pb
School of Business 4 ba
bs
ct
ma

Linking

Links between tables are automatic and relative, inferred from foreign key constraints. Unlink (@) permits arbitrary, non-relative linking.

name count(department)
School of Business 3
School of Engineering 4
School of Arts and Humanities 6
School of Music & Dance 4

The query above returns schools with the number of departments above average among all schools.

Output

HTSQL can output the result in a variery of formats. JSON:

{
  "school": [
    {
      "code": "art",
      "name": "School of Art & Design"
    },
    {
      "code": "bus",
      "name": "School of Business"
    },
…

XML:

<?xml version="1.0" encoding="UTF-8" ?>
<htsql:result xmlns:htsql="http://htsql.org/2010/xml">
  <school>
    <code>art</code>
    <name>School of Art &amp; Design</name>
  </school>
  <school>
    <code>bus</code>
    <name>School of Business</name>
  </school>
…

CSV:

code,name
art,School of Art & Design
bus,School of Business
…

Why not SQL?

Relational algebra is frequently inadequate for encoding business inquiries — elementary set operations do not correspond to meaningful data transformations. The SQL language itself is tedious, verbose, and provides poor means of abstraction. Yet, the relational database is an excellent tool for data modeling, storage and retrieval.

HTSQL reimagines what it means to query a database. The combination of a navigational model with data flows enables expressions that naturally reflect business inquiries. The HTSQL translator uses SQL as a target assembly language, which allows us to fix the query model and language while keeping current investment in relational systems.

SQL Conflates Rows & Columns

“For each department, please show the department name and the corresponding school’s campus.”

This business inquiry clearly separates the requested rows (each department) and columns (department name and corresponding school’s campus), but this separation is lost when the query is encoded in SQL:

SELECT d.name, s.campus
FROM ad.department AS d
LEFT JOIN ad.school AS s
       ON (d.school_code = s.code);

In this SQL query, the FROM clause not only picks target rows, but also includes extra tables required to produce output columns. This conflation makes it difficult to determine business entities represented by each row of the output.

The HTSQL translation separates the row definition from the column selection. The linking is implicit, and correct. The encoded query can be read aloud as a verbal inquiry.

Conflating Projection with Aggregation

“How many departments by campus?”

This business inquiry asks for rows corresponding to each campus, and for each row, the number of correlated departments. In the schema, there isn’t a campus table, so we have to take distinct values of campus column from the school table. This operation is called projection.

SELECT s.campus, COUNT(d)
FROM ad.school AS s
LEFT JOIN ad.department AS d
  ON (s.code = d.school_code)
WHERE s.campus IS NOT NULL
GROUP BY s.campus;

For this SQL encoding, the GROUP BY clause combines two operations: projection and evaluating the aggregate COUNT(). This conflation causes a reader of the query some effort determining what sort of rows are returned and how the aggregate is related to those rows.

In the HTSQL query, we start with an explicit projection (the ^ operator), then we select correlated columns. This way, the aggregation is indicated separately as part of the column selector rather than being conflated with the row definition.

SQL Lacks Means of Encapsulation

“For each department, return the department name and the number of offered 100’s, 200’s, 300’s and 400’s courses.”

In this business inquiry, we are asked to evaluate the same statistic across multiple ranges.

SELECT d.name,
       COUNT(CASE WHEN c.no BETWEEN 100 AND 199 THEN TRUE END),
       COUNT(CASE WHEN c.no BETWEEN 200 AND 299 THEN TRUE END),
       COUNT(CASE WHEN c.no BETWEEN 300 AND 399 THEN TRUE END),
       COUNT(CASE WHEN c.no BETWEEN 400 AND 499 THEN TRUE END)
FROM ad.department AS d
LEFT JOIN ad.course AS c
       ON (c.department_code = d.code)
GROUP BY d.name;

This query is tedious to write and error prone to maintain since SQL provides no way to factor the repetitive expression COUNT(...).

The HTSQL translation avoids this duplication by defining a calculated attribute count_courses($level) on the department table and then evaluating it for each course level.

In SQL, Modest Complexity is Painful

“For each school with a degree program, return the school’s name, and the average number of high-credit (>3) courses its departments have.”

This business inquiry asks us to do the following:

  • pick records from the school table
  • keep only those with an associated degree program
  • for each school record, compute average of:
    • for each associated department, count:
      • associated courses with credits>3
SELECT s.name, o.avg_over_3
FROM ad.school AS s
JOIN ad.program AS p ON (p.school_code = s.code)
LEFT JOIN (
    SELECT d.school_code, AVG(COALESCE(i.over_3,0)) AS avg_over_3
    FROM ad.department d
    LEFT JOIN (
        SELECT c.department_code, COUNT(c) AS over_3
        FROM ad.course AS c WHERE c.credits > 3
        GROUP BY c.department_code
    ) AS i ON (i.department_code = d.code)
    GROUP BY d.school_code
) AS o ON (o.school_code = s.code)
GROUP BY s.name, o.avg_over_3;

Not only is this SQL encoding is hard to read, it took several passes to get right — without the COALESCE you get results that look correct, but aren’t.

Each syntactic component of the HTSQL query is self-contained; when assembled, they form a cohesive translation of the business inquiry.