Table of Contents
A valid input of an HTSQL processor is called an HTSQL query.
In a regular mode of operation, an HTSQL processor is started as a web service and accepts queries as HTTP GET requests. However, an HTSQL query can also be executed using a command-line utility htsql-ctl or via internal Python API.
An HTSQL query is a string of characters in UTF-8 encoding. Octets composing the string could be written literally or percent-encoded. A percent-encoded octet is serialized as three characters: % followed by two hexdecimal digits encoding the octet value.
‘HTSQL’ | ‘HTSQL’ | ‘HTSQL’ |
---|---|---|
HTSQL | HTSQL | HTSQL |
Percent-encoding is useful for transmitting an HTSQL query via channels that forbid certain characters in literal form. The list of characters that should be encoded depends on the channel type, but the percent (%) character itself must always be percent-encoded.
A NUL character cannot appear in an HTSQL query, neither in literal nor in percent-encoded form.
The HTSQL processor decodes percent-encoded octets before parsing the query. As a consequence, a percent-encoded punctuation or operator character still plays its syntax role.
An HTSQL query is parsed into a sequence of tokens. The following tokens are recognized.
A sequence of alphanumeric characters that does not start with a digit.
A numeric literal: integer, decimal and exponential notations are recognized.
60 | 2.125 | 271828e-5 |
---|---|---|
60 | 2.125 | 2.71828 |
A string literal enclosed in single quotes; any single quote character should be doubled.
‘HTSQL’ | ‘O’‘Reilly’ |
---|---|
HTSQL | O’Reilly |
A valid symbol in the HTSQL grammar; that includes operators and punctuation characters. Some symbols are represented by more than one character (e.g. <=, !~).
Individual tokens may be separated by whitespace characters.
See htsql.tr.scan.Scanner for detailed description of HTSQL tokens.
A sequence of HTSQL tokens must obey the HTSQL grammar.
An HTSQL query starts is a segment expression optionally followed by a sequence of commands.
The following table lists HTSQL operations in the order of precedence, lowest to highest.
Operation | Description | Example Input | Output |
---|---|---|---|
/ T | segment | /school | |
x :fn | infix function call | 'HTSQL':length | 5 |
x :fn y | 1/3 :round 2 | 0.33 | |
x :fn (y,z,...) | 'HTSQL':slice(1,-1) | 'TSQ' | |
x +, x - | sorting direction | program{degree+} | |
T ? p | sieve | program?degree='ms' | |
T ^ x | projection | program^degree | |
T {x,y,...} | selection | school{code,name} | |
p | q | logical OR | true|false | true |
p & q | logical AND | true&false | false |
! p | logical NOT | !true | false |
x = y, x != y, | comparison | 2+2=4 | true |
x == y, x !== y | 'HTSQL'==null | false | |
x ~ y, x !~ y | 'HTSQL'~'SQL' | true | |
x < y, x <= y, | 12<7 | false | |
x > y, x >= y | 12>=7 | true | |
x + y, x - y | addition, subtraction | 'HT'+'SQL' | 'HTSQL' |
x * y, x / y | multiplication, division | 12*7 | 84 |
- x | negation | -42 | |
x -> T | attachment | 'south' -> school{campus} | |
T := x | assignment | num_prog := count(program) | |
S . T | composition | school.program | |
T [id] | locator | course[comp.304] | |
@ T | detachment | @school | |
{x,y,...} | record | {'bs','ms'} | |
(...) | grouping | (7+4)*2 | 22 |
* | wildcard selection | school.* | |
* number | school.*1 | ||
^ | projection complement | count(^) | |
$ name | reference | $code | |
fn (...) | function call | round(1/3,2) | 0.33 |
name | school | ||
number | 60, 2.125, 271828e-5 | ||
string | 'HTSQL' |
A command starts with / followed by a function application in infix notation.
code,name,campus
art,School of Art & Design,old
bus,School of Business,south
…
Next we describe individual syntax elements.
An atomic expression is a basic syntax unit. HTSQL recognizes the following atoms.
An identifier is a sequence of characters which contains Latin letters, underscores (_), decimal digits and those Unicode characters that are classified as alphanumeric. An identifier must not start with a digit.
In HTSQL, identifiers are case-insensitive.
Identifiers are used to refer to database entities such as tables and attributes, to define calculated attributes, and to call functions.
name,count(department)
School of Art & Design,1
School of Business,3
…
In this example, four identifiers school, name, count and department represent respectively a table, a table attribute, a built-in function and a table link.
HTSQL supports two types of literal values: quoted and unquoted.
An unquoted (or numeric) literal is a number written in integer, decimal or exponential notation.
60 | 2.125 | 271828e-5 |
---|---|---|
60 | 2.125 | 2.71828 |
The range of allowed numeric values depends on the database backend. The type of a numeric literal is determined from notation: literals written in integer, decimal and exponential notation are assigned to integer, decimal and float data type respectively.
A quoted literal is a (possibly empty) sequence of arbitrary characters enclosed in single quotes. Any single quote in the value must be doubled.
‘HTSQL’ | ‘O’‘Reilly’ |
---|---|
HTSQL | O’Reilly |
The data type of a quoted literal is inferred from the context in which the literal is used; the default data type is string.
A wildcard selection (*) selects all output columns of the table.
code | name | campus | code | name | school_code |
---|---|---|---|---|---|
bus | School of Business | south | acc | Accounting | bus |
la | School of Arts and Humanities | old | arthis | Art History | la |
ns | School of Natural Sciences | old | astro | Astronomy | ns |
When followed by an integer literal N, a wildcard selects N-th output column of the table. N starts from 1 and should not exceed the number of output columns.
name | count(department) |
---|---|
School of Engineering | 4 |
School of Arts and Humanities | 6 |
School of Music & Dance | 4 |
A projection complement (^) represents a complement link from a projection to the projected flow.
Do not confuse a projection complement with a binary projection operator, which is also represented with the ^ character.
degree | count(^) |
---|---|
ba | 10 |
bs | 10 |
ct | 4 |
In this example, the first and the second occurrences of ^ indicate a projection operator and a projection complement respectively.
Any expression enclosed in parentheses ((...)) is treated syntactically as a single atom. Use grouping to override the default operator precedence.
Do not confuse a grouping operation with a function call, which also uses parentheses.
A comma-separated list of expressions enclosed in curly brackets ({...}) is called a record expression. Some functions and operators accept records as a way to specify multiple values.
code | name | campus |
---|---|---|
eng | School of Engineering | north |
ns | School of Natural Sciences | old |
A reference is an identifier preceded by a dollar sign ($). A reference is used to access a value defined in a different naming scope.
department_code | no | title | credits | description |
---|---|---|---|---|
acc | 315 | Financial Accounting | 5 | Integration of the conceptual and computational aspects of asset, liability and stockholders equity accounting. |
acc | 511 | Audit | 5 | This course considers legal, procedural and ethical aspects of audit in accounting. |
acc | 620 | Accounting Internship | 6 | Supervised internship in a business or nonprofit organization on an accounting position. |
In this example, a reference $avg_credits is defined in the root scope, but accessed in the scope of course.
For a more detailed description of references, see the section on naming scopes.
HTSQL has a large library of built-in functions and can be extended with user-defined functions.
A function call is represented as a function name followed by (, a comma-separated list of arguments, and ).
0.33 |
A function may accept no arguments, but the parentheses are still required.
For functions with at least one argument, HTSQL supports an alternative infix call notation. In this notation, the expression starts with the first argument followed by : and a function name, and then the rest of the arguments. The trailing arguments must be enclosed in parentheses if their number is greater than one.
today() :year | 1/3 :round 2 | ‘HTSQL’ :slice(1,-1) |
---|---|---|
2013 | 0.33 | TSQ |
This example could be equivalently expressed as
year(today()) | round(1/3,2) | slice(‘HTSQL’,1,-1) |
---|---|---|
2013 | 0.33 | TSQ |
Infix function calls are composable and have the lowest precedence among the operators.
‘h’+’t’+’t’+’p’ :replace(‘tp’,’sql’) :upper |
---|
HTSQL |
A function which argument is a segment expression is called a command.
code,name,campus
art,School of Art & Design,old
bus,School of Business,south
…
To use infix call notation with a command, prepend the : indicator with /.
code,name,campus
art,School of Art & Design,old
bus,School of Business,south
…
Consider the difference between regular infix function call and a command application.
code,name,campus
art,School of Art & Design,old
This example could be equivalently expressed as
code,name,campus
art,School of Art & Design,old
For a list and description of built-in functions, see Functions and Operators.
An HTSQL operator is denoted by a special character or a sequence of characters (e.g. +, <=). HTSQL has infix, prefix and postfix operators, and some operators admit all three forms.
The current version of HTSQL does not support user-defined operators; future versions may add this ability.
In HTSQL, the order in which operators are applied is determined by operator precedence. For example, multiplication and division operators have a higher precedence than addition and subtraction.
Some HTSQL operators are composable (e.g. arithmetic operators) and some are not (e.g. equality operators). We call the former associative and the latter non-associative.
Below we describe the syntax of HTSQL operators. For a more comprehensive description, see Functions and Operators.
HTSQL supports the following logical operators:
In this list, the operators are sorted by the order of precedence, from lowest to highest. All logical operators are left-associative.
true|false | true&false | !false |
---|---|---|
true | false | true |
HTSQL supports the following comparison operators:
2+2=4 | ‘HTSQL’~’SQL’ | 12>7&7>=2 |
---|---|---|
true | true | true |
All comparison operators have the same precedence and are not associative.
Future versions of HTSQL may make ordering operators left-associative to express between operation (e.g. a <= x <= b).
HTSQL supports the usual set of arithmetic operators:
Arithmetic operators have standard precedence and associativity.
HTSQL supports specialized operators to work with flow expressions:
The sieve operator (T ? p) produces rows of T satisfying condition p.
code | name | campus |
---|---|---|
bus | School of Business | south |
mus | School of Music & Dance | south |
The projection operator (T ^ x) produces a flow of unique values of x as it ranges over T. Do not confuse the projection operator with a projection complement.
degree |
---|
ba |
bs |
ct |
The selection operator (T {x,y,...}) specifies output columns.
code | name |
---|---|
art | School of Art & Design |
bus | School of Business |
edu | College of Education |
Sieve, projection and selection operators have the same precedence and are left-associative.
campus | avg(school.count(department)) |
---|---|
north | 4.0 |
old | 5.0 |
south | 3.5 |
HTSQL supports a flow composition operator:
T . x
The composition operator evaluates x for each row of the flow T. The values of x form the resulting flow.
code | name | school_code |
---|---|---|
acc | Accounting | bus |
econ | Economics | bus |
mm | Management & Marketing | bus |
The composition operator is left-associative.
The locator operation selects an individual entity by its identity:
T [id]
The identity is a sequence of attributes and links which uniquely identify an entity. Values of separate attributes are separated by a period (.). An individual value could be left unquoted when it consists of digits, alphabetical characters or dash (-). Values could be grouped using pairs of brackets ([]) or parentheses (()).
code | name | school_code |
---|---|---|
comp | Computer Science | eng |
department_code | no | title | credits | description |
---|---|---|---|---|
comp | 515 | Software Design | 3 | Study of good software development techniques: UML, object-oriented design, design patterns, GUI design principles, testing, debugging and profiling. |
department_code | course_no | year | season | section | instructor_code | class_seq |
---|---|---|---|---|---|---|
comp | 515 | 2010 | spring | 001 | rmejia | 1405 |
HTSQL has two operators for generating ad-hoc links:
The detachment operator (@ T) generates an ad-hoc link to T by associating each row from the input flow with all rows from T.
department_code | no | title | credits | description |
---|---|---|---|---|
acc | 315 | Financial Accounting | 5 | Integration of the conceptual and computational aspects of asset, liability and stockholders equity accounting. |
acc | 511 | Audit | 5 | This course considers legal, procedural and ethical aspects of audit in accounting. |
acc | 620 | Accounting Internship | 6 | Supervised internship in a business or nonprofit organization on an accounting position. |
The precedence of the detachment operator is higher than of any other operator. Therefore, to apply the operator to any expression other than an identifier or a function call, use parentheses.
The attachment operator (x -> T) generates ad-hoc link to T by associating each row from the input flow with all rows from T such that values of expression x evaluated against respective rows coincide.
name | dob |
---|---|
Virginia Hester | 1986-03-20 |
Marvin Hall | 1986-03-20 |
Douglas Robles | 1986-03-20 |
A segment expression is a prefix operator:
/ T
A segment expression converts a flow to a list value.
name | department |
---|---|
name | |
School of Art & Design | Studio Art |
School of Business | Accounting |
Economics | |
Management & Marketing | |
The following postfix decorators indicate ascending and descending sorting order respectively:
x +, x -
department_code | no | title | credits | description |
---|---|---|---|---|
acc | 620 | Accounting Internship | 6 | Supervised internship in a business or nonprofit organization on an accounting position. |
acc | 315 | Financial Accounting | 5 | Integration of the conceptual and computational aspects of asset, liability and stockholders equity accounting. |
acc | 511 | Audit | 5 | This course considers legal, procedural and ethical aspects of audit in accounting. |
Sorting decorators have the same precedence as infix function call.
Sorting decorators are only meaningful when used as arguments of the sort() function and in a selector expression.
An assignment expression has the form:
T := x
The left side of an assignment expression indicates the name and formal parameters (if any) of a calculated attribute. It must be an identifier, a reference or a function call and can be preceded by an optional dot-separated sequence of identifiers.
The right side of an assignment is an arbitrary expression indicating the value of a calculated attribute.
name | num_dept |
---|---|
School of Engineering | 4 |
School of Arts and Humanities | 6 |
School of Music & Dance | 4 |
An assignment expression could be used only as an argument of functions define() and where(), or in a selector expression.