SQL is a strongly typed language. That is, every data item has an associated data type which determines its behavior and allowed usage. LightDB has an extensible type system that is more general and flexible than other SQL implementations. Hence, most type conversion behavior in LightDB is governed by general rules rather than by ad hoc heuristics. This allows the use of mixed-type expressions even with user-defined types.
The LightDB scanner/parser divides lexical elements into five fundamental categories: integers, non-integer numbers, strings, identifiers, and key words. Constants of most non-numeric types are first classified as strings. The SQL language definition allows specifying type names with strings, and this mechanism can be used in LightDB to start the parser down the correct path. For example, the query:
SELECT text 'Origin' AS "label", point '(0,0)' AS "value";

 label  | value
--------+-------
 Origin | (0,0)
(1 row)
has two literal constants, of type text and point.
If a type is not specified for a string literal, then the placeholder type unknown is assigned initially, to be resolved in later stages as described below.
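The difference between a typed and an untyped string literal can be inspected directly. The following is a minimal sketch, assuming that LightDB provides the pg_typeof function with the same behavior as stock PostgreSQL; the column aliases are illustrative only.

```sql
-- Assumption: pg_typeof() is available and behaves as in stock PostgreSQL.
SELECT pg_typeof(text 'Origin')  AS typed_literal,    -- type name written before the string
       pg_typeof('Origin')       AS untyped_literal,  -- no type given: placeholder type applies
       pg_typeof('(0,0)'::point) AS cast_literal;     -- cast syntax also fixes the type
```

On stock PostgreSQL the second column is reported as unknown, because nothing in the call forces the literal to a concrete type.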
There are four fundamental SQL constructs requiring distinct type conversion rules in the LightDB parser:
Function Calls
Much of the LightDB type system is built around a rich set of functions. Functions can have one or more arguments. Since LightDB permits function overloading, the function name alone does not uniquely identify the function to be called; the parser must select the right function based on the data types of the supplied arguments.
Operators
LightDB allows expressions with prefix and postfix unary (one-argument) operators, as well as binary (two-argument) operators. Like functions, operators can be overloaded, so the same problem of selecting the right operator exists.
Value Storage
SQL INSERT and UPDATE statements place the results of expressions into a table. The expressions in the statement must be matched up with, and perhaps converted to, the types of the target columns.
UNION, CASE, and related constructs
Since all query results from a unionized SELECT statement must appear in a single set of columns, the types of the results of each SELECT clause must be matched up and converted to a uniform set. Similarly, the result expressions of a CASE construct must be converted to a common type so that the CASE expression as a whole has a known output type. Some other constructs, such as ARRAY[] and the GREATEST and LEAST functions, likewise require determination of a common type for several subexpressions.
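The four constructs described above can be illustrated with a short script. This is a sketch under the assumption that LightDB follows stock PostgreSQL behavior for these cases; the table name type_demo and all column aliases are hypothetical.

```sql
-- Function calls: abs() is overloaded, so the argument type selects the variant.
SELECT pg_typeof(abs(-5))             AS integer_argument,
       pg_typeof(abs(-5.5))           AS numeric_argument,
       pg_typeof(abs(float8 '-5.5'))  AS float8_argument;

-- Operators: + is overloaded as well; an integer and a numeric operand
-- must be brought to one operator signature.
SELECT pg_typeof(1 + 2.5) AS mixed_sum;

-- Value storage: the expression result is converted to the column type.
-- type_demo is a hypothetical table used only for this illustration.
CREATE TABLE type_demo (v character(20));
INSERT INTO type_demo SELECT 'abc' || 'def';
SELECT v, octet_length(v) FROM type_demo;
DROP TABLE type_demo;

-- UNION, CASE, and related constructs: the branches must agree on one type.
SELECT pg_typeof(x) AS union_type
FROM (SELECT 1 AS x UNION SELECT 2.5) AS u;

SELECT pg_typeof(CASE WHEN true THEN 1 ELSE 2.5 END) AS case_type,
       pg_typeof(GREATEST(1, 2.5))                   AS greatest_type;
```

On stock PostgreSQL, the mixed integer and numeric cases resolve to numeric, and the stored value is padded to the declared length of the character(20) column. Results may differ if a LightDB compatibility mode changes these rules.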
The system catalogs store information about which conversions, or casts, exist between which data types, and how to perform those conversions. Additional casts can be added by the user with the CREATE CAST command. (This is usually done in conjunction with defining new data types. The set of casts between built-in types has been carefully crafted and is best not altered.)
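The cast information can be read back from the catalogs. The query below is a sketch that assumes LightDB keeps the pg_cast catalog with the column layout of stock PostgreSQL; it lists the casts defined from integer to other types.

```sql
-- Assumption: pg_cast has the same columns as in stock PostgreSQL.
SELECT castsource::regtype AS source_type,
       casttarget::regtype AS target_type,
       castcontext,  -- 'i' = implicit, 'a' = assignment, 'e' = explicit only
       castmethod    -- 'f' = function, 'i' = I/O conversion, 'b' = binary coercible
FROM pg_cast
WHERE castsource = 'integer'::regtype
ORDER BY casttarget::regtype::text;
```

The castcontext column is the one that matters for the rules in this chapter, since only casts marked implicit are applied by the parser without an explicit request.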
An additional heuristic provided by the parser allows improved determination of the proper casting behavior among groups of types that have implicit casts. Data types are divided into several basic type categories, including boolean, numeric, string, bitstring, datetime, timespan, geometric, network, and user-defined. (For a list see Table 48.74; but note it is also possible to create custom type categories.) Within each category there can be one or more preferred types, which are preferred when there is a choice of possible types. With careful selection of preferred types and available implicit casts, it is possible to ensure that ambiguous expressions (those with multiple candidate parsing solutions) can be resolved in a useful way.
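Category and preferred-type information is stored per type and can be queried. The following sketch assumes that LightDB retains the typcategory and typispreferred columns of the pg_type catalog as found in stock PostgreSQL.

```sql
-- Assumption: pg_type exposes typcategory and typispreferred as in stock PostgreSQL.
-- Lists the preferred types, grouped by their one-letter category code.
SELECT typcategory, typname
FROM pg_type
WHERE typispreferred
ORDER BY typcategory, typname;
```

Each category code returned here corresponds to one of the categories named above; the mapping from codes to category names is given in the table referenced in the text.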
All type conversion rules are designed with several principles in mind:
Implicit conversions should never have surprising or unpredictable outcomes.
There should be no extra overhead in the parser or executor if a query does not need implicit type conversion. That is, if a query is well-formed and the types already match, then the query should execute without spending extra time in the parser and without introducing unnecessary implicit conversion calls in the query.
Additionally, if a query usually requires an implicit conversion for a function, and the user then defines a new function with the correct argument types, the parser should use this new function and no longer do implicit conversion to use the old function.
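The last principle can be observed with a pair of overloaded functions. This is a minimal sketch, assuming stock PostgreSQL resolution behavior; the function name which_variant is hypothetical and exists only for this illustration.

```sql
-- Hypothetical function; initially only a numeric variant exists.
CREATE FUNCTION which_variant(numeric) RETURNS text
    LANGUAGE sql AS $$ SELECT 'numeric variant'::text $$;

-- The integer literal has no exact match, so it is implicitly
-- converted to numeric to call the only candidate.
SELECT which_variant(10);

-- Now add a variant whose argument type matches the call exactly.
CREATE FUNCTION which_variant(integer) RETURNS text
    LANGUAGE sql AS $$ SELECT 'integer variant'::text $$;

-- The same call should now select the exact match, with no conversion.
SELECT which_variant(10);

-- Clean up the illustration.
DROP FUNCTION which_variant(numeric);
DROP FUNCTION which_variant(integer);
```

The query text is identical in both calls; only the set of available candidates changes, which is what moves the parser from the implicit conversion to the exact match.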