Tree form requests
Overview
Currently HTSQL allows multi-segment requests that form a chain. For example, a query
/organization{id(),name}/+person{id(),full_name}
produces a list of all organizations, and, for each organization, it also produces the corresponding list of all employees. Serialized into XML, the command output will be:
<htsql:result>
<organization id="acorn" name="Acorn Architecture">
<person id="acorn.hideo" full_name="WATANABE Hideo"/>
</organization>
<organization id="attic" name="Attic Bowling">
</organization>
<organization id="lakeside" name="Lake Side Partners, LLC">
<person id="lakeside.amy" full_name="Amy S. Buckworth"/>
<person id="lakeside.dave" full_name="David Jones"/>
</organization>
<organization id="smith" name="Rudgen, Taupe, & Smith">
<person id="smith.jack" full_name="Jack Taupe"/>
<person id="smith.jose" full_name="José N. Marteñes"/>
<person id="smith.maggy" full_name="Margret N. Smith"/>
</organization>
The corresponding JSON output:
[
[ "acorn", "Acorn Architecture", [
[ "acorn.hideo", "WATANABE Hideo" ] ] ],
[ "attic", "Attic Bowling", [] ],
[ "lakeside", "Lake Side Partners, LLC", [
[ "lakeside.amy", "Amy S. Buckworth" ],
[ "lakeside.dave", "David Jones" ] ] ],
[ "smith", "Rudgen, Taupe, & Smith", [
[ "smith.jack", "Jack Taupe" ],
[ "smith.jose", "José N. Marteñes" ],
[ "smith.maggy", "Margret N. Smith" ] ] ]
]
The internal API produces an iterator, which generates the following sequence:
(0, ["acorn", "Acorn Architecture"]) (1, ["acorn.hideo", "WATANABE Hideo"]) (0, ["attic", "Attic Bowling"]) (0, ["lakeside", "Lake Side Partners, LLC"]) (1, ["lakeside.amy", "Amy S. Buckworth"]) (1, ["lakeside.dave", "David Jones"]) (0, ["smith", "Rudgen, Taupe, & Smith"]) (1, ["smith.jack", "Jack Taupe"]) (1, ["smith.jose", "José N. Marteñes"]) (1, ["smith.maggy", "Margret N. Smith"])
Each item in the sequence is a pair (index, row). The index value specifies the segment number in the original request (left-to-right, starting from 0). The row value is a list of column values for the index-th segment. Note that the items in the sequence are in the same order as the opening tags in the XML output.
The same way, the query /organization/+project lists all organization together with associated projects.
It is possible to use chained requests of length more than two. For example, /organization/+project/+workitem adds work items to each project.
There are three variants of the join condition between two subsequent segments: / (INNER JOIN), /(+) (LEFT OUTER JOIN) and /(*) (CROSS JOIN).
We are going to extend chained queries in order to permit queries that form a tree or a forest of an arbitrary size.
The problem: How could one get the list of all organizations together with the corresponding person and project lists? In other words, how could a user merge two requests: /organization{id(),name}/+person{id(),full_name} and /organization{id(),name}/+project{id(),name} producing XML output of the form:
<htsql:result>
<organization id="acorn" name="Acorn Architecture">
<person id="acorn.hideo" full_name="WATANABE Hideo"/>
</organization>
<organization id="attic" name="Attic Bowling">
<project id="Bowl-Shoes" name="Bowling Shoes"/>
</organization>
<organization id="lakeside" name="Lake Side Partners, LLC">
<person id="lakeside.amy" full_name="Amy S. Buckworth"/>
<person id="lakeside.dave" full_name="David Jones"/>
</organization>
<organization id="smith" name="Rudgen, Taupe, & Smith">
<person id="smith.jack" full_name="Jack Taupe"/>
<person id="smith.jose" full_name="José N. Marteñes"/>
<person id="smith.maggy" full_name="Margret N. Smith"/>
<project id="smak" name="Smith Associate Window"/>
<project id="smbl" name="Smith Balcony Expansion"/>
<project id="smen" name="Smith Entry and Waiting Room"/>
</organization>
</htsql:result>
Syntax
How do you represent a tree (or a forest) in the form of a URL?
Consider the tree:
organization
| |
person project---------
| | |
private_info participation workitem
|
timeslip
In the linear URL representation, we list the nodes in the DFS order and separate them with special symbols which denote the structure of the tree:
organization person private_info project participation timeslip workitem
The request starts with the / symbol. We also notice that the tables person and private_info, as well as participation and timeslip, form a chain, therefore they should be separated with /. The symbol ; is used for separating sibling nodes:
/organization person/private_info ; project participation/timeslip ; workitem
Now we have two options: either we enclose each subtree with () or we enclose all sibling nodes in a single group with ():
/organization (person/private_info) ; (project (participation/timeslip) ; workitem)
or
/organization (person/private_info ; project (participation/timeslip ; workitem))
Adding the remaining / symbols, we finally obtain two variants of the syntax:
/organization/(person/private_info);(project/(participation/timeslip);workitem)
or
/organization/(person/private_info;project/(participation/timeslip;workitem))
In general, these two syntax structures could be represented as
/root/(subtree_1);(subtree_2);...;(subtree_N)
and
/root/(subtree_1;subtree_2;...;subtree_N)
The question: which syntax is better?
After a lot of considerations, we have decided in favor of variant 2. However a support for a straightforward "forest" syntax was requested. A query of the form
/tree_1;/tree_2;...;/tree_N
should produce a forest of N trees.
The new syntax can be expressed using the productions:
forest ::= '/' tree (';' forest)?
tree ::= segment ('/' branches)?
branches ::= tree | '(' tree (';' tree)* ')'
Here is the complete top-level grammar. Low-level productions identifier, locator, selector, test, cmd_args are not changed.
input ::= '/' command? format? filter?
| '/' ( path | role ( '/' path )? )
( ( '/' | '/' command format? | '/'? format ) filter? )?
role ::= '~' identifier
path ::= tree ( ';' '/' tree )*
tree ::= segment ( '/' branches )?
branches ::= tree | '(' tree ( ';' tree )* ')'
segment ::= ( identifier locator? selector? | selector ) filter?
filter ::= '?' test?
command ::= identifier '(' cmd_args? ')'
format ::= '.' identifier ( '(' cmd_args? ')' )?
Semantics
SQL Queries
The way multi-segment requests are processed will undergo drastic changes. Currently a multi-segment request is transformed to a single SQL query of the form:
SELECT segment_1.*, segment_2.*, ..., segment_N.*
FROM (SELECT ...) AS segment_1
JOIN (SELECT ...) AS segment_2 ON (segment_1.tie = segment_2.segment_1_tie)
...
JOIN (SELECT ...) AS segment_N ON (segment_{N-1}.tie = segment_N.segment_{N-1}_tie)
HTSQL processor receives from the database the result set:
[segment_1_row_1] [segment_2_row_1] ... [segment_N_row_1] [segment_1_row_1] [segment_2_row_1] ... [segment_N_row_2] [segment_1_row_1] [segment_2_row_1] ... [segment_N_row_3] ... [segment_1_row_1] [segment_2_row_2] ... [segment_N_row_1] ... [segment_1_row_2] [segment_2_row_1] ... [segment_N_row_1] ...
The processor identifies identical segment rows and squeezes the result set into the generator:
(1, row_1)
(2, row_1)
...
(N, row_1)
(N, row_2)
(N, row_3)
...
(2, row_2)
...
(N, row_1)
...
(1, row_2)
...
The advantage of this approach is that the database engine performs merge sort automatically. The disadvantage, however, is that the HTSQL processor has to skip duplicate rows.
This approach won't work for tree request. The HTSQL processor has to generate a separate SELECT request for each segment in the query. For example, consider the request:
/organization{name+}?is_active/(person{full_name+};project{name,status+}?status!='abandoned')
HTSQL processor will generate three SQL statements:
SELECT o.name, o.org_id AS "tie"
FROM op.organization o
WHERE o.is_active
ORDER BY o.name, o.org_id;
SELECT p.full_name, o.org_id AS "tie"
FROM op.person p
INNER JOIN op.organization o ON (p.org_id = o.org_id)
WHERE o.is_active
ORDER BY o.name, o.org_id, p.full_name, p.org_id, p.nickname;
SELECT p.name, p.status, o.org_id AS "tie"
FROM op.project p
INNER JOIN op.organization o ON (p.client = o.org_id)
WHERE o.is_active AND p.status != 'abandoned'
ORDER BY o.name, o.org_id, p.status, p.prj_id;
The requests will produce the following three result sets:
name | tie
-----------------------+-------------
Acorn Architecture | acorn
Lake Carmen Towers | lake-carmen
Lake Shore Apartments | lake-apts
Meyers Construction | meyers
Rwyler's Shoes | shoe
(5 rows)
full_name | tie
---------------------+-----------
WATANABE Hideo | acorn
Tommy O'Mally | lake-apts
Jack C. Meyers Esq. | meyers
Jake Meyers | meyers
Jay Adams | meyers
Jim Meyers | meyers
Mark Marteñs | meyers
Mark Thomas Hill | meyers
Gregory Shoemaker | shoe
Meg Shoemaker | shoe
Melanie Shoemaker | shoe
(11 rows)
name | status | tie
--------------------------------------+-------------+-------------
Toaster Re-Do | in-progress | lake-carmen
Updating Fire Escape | planned | lake-carmen
Kitchen Remodel at 102 N. Ocean View | completed | lake-apts
Siding and Roof at 334 N. Ocen View | completed | lake-apts
(4 rows)
These 20 rows will be merged into a single sequence using the auxiliary "tie" column:
(0, ['Acorn Architecture']) (1, ['WATANABE Hideo']) (0, ['Lake Carmen Towers']) (2, ['Toaster Re-Do', 'in-progress']) (2, ['Updating Fire Escape', 'planned']) (0, ['Lake Shore Apartments']) (1, ['Tommy O''Mally']) (2, ['Kitchen Remodel at 102 N. Ocean View', 'completed']) (2, ['Siding and Roof at 334 N. Ocean View', 'completed']) (0, ['Meyers Construction']) (1, ['Jack C. Meyers Esq.']) (1, ['Jake Meyers']) (1, ['Jay Adams']) (1, ['Jim Meyers']) (1, ['Mark Marteñs']) (1, ['Mark Thomas Hill']) (0, ['Rwyler''s Shoes']) (1, ['Gregory Shoemaker']) (1, ['Meg Shoemaker']) (1, ['Melanie Shoemaker'])
Projections
Currently multi-segment requests allow projection applied only to leaf segments. Technically, there's no reason not to allow projections in any segment in the request tree. Thus, both requests:
/organization{id()}/project{status|array(id())}
and
/project{status|array(id())}/project
could be allowed. However note that in the former case, an implicit column organization.id() is included to the projection base. That is, the request is equivalent to
/organization{id()}/project{organization.id(),status|array(id())}
This is required because a parent segment must be singular with regard to a child segment (organization and project{status|} in this case).
Empty Segments
An empty or a zero-element segment is a segment with an empty selector. Examples:
/{}
/organization[meyers]{}
Currently if all segments in the request are empty, HTSQL produces 204 No Content output.
Perhaps it makes sense to skip empty segments from output. Suppose that
/organization[lake-apts]{id()}/(person{id()};project{id()})
produces the generator
(0, ['lake-apts']) (1, ['lake-apts.tom']) (2, ['la-102']) (2, ['la-334'])
and XML output:
<htsql:result>
<organization id="lake-apts">
<person id="lake-apts.tom"/>
<project id="la-102"/>
<project id="la-334"/>
</organization>
</htsql:result>
Then the request
/organization[lake-apts]{}/(person{id()};project{id()})
could produce a forest generator
(0, ['lake-apts.tom']) (1, ['la-102']) (1, ['la-334'])
and XML output:
<htsql:result> <person id="lake-apts.tom"/> <project id="la-102"/> <project id="la-334"/> </htsql:result>
It's not clear whether such queries could be useful.
However empty segments make sense for the commands insert(), update() and delete() where one might want to perform a command, but is not interested in the data. A query of the form /table{}/select(expect:N) could be used to ensure that the number of rows in the table is equal to N.
Parameters expect, offset, and limit
The parameter expect:N ensures that the command affects exactly N rows. The parameter offset:N skips the first N rows of the result set. The parameter limit:N specifies the maximum number of rows to output.
These parameters make perfect sense for single segment queries. However the interpretation of the parameters for a query consisting of more than one segment is not quite clear. Currently these parameters apply to the main SQL query, which doesn't make much sense. For instance, in the query /organization/person/select(expect:N), the number N must coincide with the number of persons, while in the query /organization/+person/select(expect:N) N coincides with the number of persons incremented by the number of organizations without any corresponding persons. The parameters limit and offset produce even stranger results. Suppose, for example, that the query /organization{id(),name}/+person{id(),full_name} generates the result set
(0, ["acorn", "Acorn Architecture"]) (1, ["acorn.hideo", "WATANABE Hideo"]) (0, ["attic", "Attic Bowling"]) (0, ["lakeside", "Lake Side Partners, LLC"]) (1, ["lakeside.amy", "Amy S. Buckworth"]) (1, ["lakeside.dave", "David Jones"]) (0, ["smith", "Rudgen, Taupe, & Smith"]) (1, ["smith.jack", "Jack Taupe"]) (1, ["smith.jose", "José N. Marteñes"]) (1, ["smith.maggy", "Margret N. Smith"])
Then the query /organization{id(),name}/+person{id(),full_name}/select(offset:3,limit:2) generates
(0, ["lakeside", "Lake Side Partners, LLC"]) (1, ["lakeside.dave", "David Jones"]) (0, ["smith", "Rudgen, Taupe, & Smith"]) (1, ["smith.jack", "Jack Taupe"])
which is hardly useful.
For a tree request with the single root segment, we propose to apply the parameters expect, offset, and limit to the root segment of the request. For instance the query /organization/person/select(expect:4) will ensure that the result set contains exactly 4 organizations. The query /organization{id(),name}/person{id(),full_name}/select(offset:2,limit:2) will generate
(0, ["lakeside", "Lake Side Partners, LLC"]) (1, ["lakeside.amy", "Amy S. Buckworth"]) (1, ["lakeside.dave", "David Jones"]) (0, ["smith", "Rudgen, Taupe, & Smith"]) (1, ["smith.jack", "Jack Taupe"]) (1, ["smith.jose", "José N. Marteñes"]) (1, ["smith.maggy", "Margret N. Smith"])
How to treat the expect, offset, and limit parameters in a "forest" request? One possibility is to assume that roots of the trees in the forest are attached to a single implicit scalar root segment. Thus expect, offset, and limit would assume that the result set contains a single row corresponding to the root segment. For example, /person;/project/select(expect:N) is valid only if N=1, /person;/project/select(offset:1) produces an empty result set and so on.
API
Generic API
The commands which produce a tree-form result set include select(), insert(), update(), delete(), merge() and even meta(). These commands share the same API.
Suppose command is an instance of the class commands.Select or one of its subclasses. In order to execute the command and obtain the result set, write
result = command.execute()
The object result is a generator object producing the rows of the result set. Additionally result contains attributes which describe the shape of the request tree and the names and types of the segment elements.
As it was described earlier, result is an iterator which produces pairs: (index, row), where index is the segment number starting from 0 and row is a list containing element values. For example, if the HTSQL query is
/organization[lakeside,lake-apts,lake-carmen]{id(),name}/(person{id(),full_name,email};project{id(),name,status})
then the code
for index, row in result: print index, row
will produce the output
0 ['lake-apts', 'Lake Shore Apartments'] 1 ['lake-apts.tom', 'Tommy O\'Mally', None] 2 ['la-102', 'Kitchen Remodel at 102 N. Ocean View', 'completed'] 2 ['la-334', 'Siding and Roof at 334 N. Ocen View', 'completed'] 0 ['lake-carmen', 'Lake Carmen Towers'] 2 ['lt-711', 'Updating Fire Escape', 'planned'] 2 ['lt-802', 'Toaster Re-Do', 'in-progress'] 0 ['lakeside', 'Lake Side Partners, LLC'] 1 ['lakeside.amy', 'Amy S. Buckworth', None] 1 ['lakeside.dave', 'David Jones', 'david.joines@example.com']
The result variable also contains additional attributes:
result.parent is a dictionary from int to int or None. If index is a segment number, result.parent[index] is a number of the parent segment or None if the segment is top-level.
For the previous query, result.parent is a dictionary {0: None, 1: 0, 2: 0}.
result.parent allows one to form the structure of the output. For example, the following code could be used to generate XML output:
stack = [] current = None for index, row in result: while result.parent[index] != current: write_close_tag(current) current = stack.pop() parents.append(current) current = index write_open_tag(current, row) while current is not None: write_close_tag(current) current = stack.pop()
The other attributes of result describe the segments and elements of the query. result.title is the title of the request. result.segments is a list of items describing each segment in the request.
If segment is an element of result.segments, then segment.title is the title of the segment and segment.elements is a list of the elements in the segment.
If element is an element of segment.elements, then element.title is the title of the element and element.domain is the domain of the element.
Convenient API
The generic API described above is suitable for performing request in which the structure of the output is not known to the programmer. However if the query is not arbitrary, it could make sense to use a more convenient API.
For instance, if the program performs the request
/organization{id(),name}/(person{id(),full_name,email};project{id(),name,status})
it could process the result set with
result = command.execute(SmartCursor) for organization in result: print "organization %s: name=%s" % (organization.id, organization.name) for person in organization.person: print "\tperson %s: full_name=%s, email=%s" % (person.id, person.full_name, person.email) for project in organization.project: print "\tproject %s: name=%s, status=%s" % (project.id, project.name, project.status)
![(please configure the [header_logo] section in trac.ini)](/chrome/site/your_project_logo.png)