docs: update prose for 1.1.0 and clarify cancellation (#932)
Fixes #928.
Fixes #929.
lidavidm authored Jul 24, 2023
1 parent a06e53a commit 1d3fdfb
Showing 4 changed files with 143 additions and 19 deletions.
12 changes: 8 additions & 4 deletions adbc.h
@@ -1292,9 +1292,11 @@ AdbcStatusCode AdbcConnectionRelease(struct AdbcConnection* connection,
/// or while consuming an ArrowArrayStream returned from such.
/// Calling this function should make the other functions return
/// ADBC_STATUS_CANCELLED (from ADBC functions) or ECANCELED (from
/// methods of ArrowArrayStream).
/// methods of ArrowArrayStream). (Cancellation is not guaranteed: for
/// instance, the result set may already be buffered in memory.)
///
/// This must always be thread-safe (other operations are not).
/// This must always be thread-safe (other operations are not). It is
/// not necessarily signal-safe.
///
/// \since ADBC API revision 1.1.0
/// \addtogroup adbc-1.1.0
@@ -1947,9 +1949,11 @@ AdbcStatusCode AdbcStatementBindStream(struct AdbcStatement* statement,
/// or while consuming an ArrowArrayStream returned from such.
/// Calling this function should make the other functions return
/// ADBC_STATUS_CANCELLED (from ADBC functions) or ECANCELED (from
/// methods of ArrowArrayStream).
/// methods of ArrowArrayStream). (Cancellation is not guaranteed: for
/// instance, the result set may already be buffered in memory.)
///
/// This must always be thread-safe (other operations are not).
/// This must always be thread-safe (other operations are not). It is
/// not necessarily signal-safe.
///
/// \since ADBC API revision 1.1.0
/// \addtogroup adbc-1.1.0
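The cancellation contract described in these comments (thread-safe, best-effort, not necessarily signal-safe) can be illustrated with a minimal C sketch. Every name below (``SketchStatement``, ``SketchStatementCancel``, ``SketchStreamGetNext``) is hypothetical and not part of ``adbc.h``. The sketch assumes a driver that records the request in a C11 atomic flag and checks it only before fetching more data from the server, so batches already buffered in memory are still delivered after cancellation. ``SketchStreamGetNext`` is a simplified stand-in for ``ArrowArrayStream``'s ``get_next`` that reports end-of-stream through a ``done`` flag.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical driver-side statement state; not the real ADBC structs. */
struct SketchStatement {
  atomic_bool cancel_requested; /* set by any thread, read by the reader */
  int batches_buffered;         /* already in memory: served even after cancel */
  int batches_remote;           /* still to be fetched from the server */
};

void SketchStatementInit(struct SketchStatement* stmt, int buffered, int remote) {
  atomic_init(&stmt->cancel_requested, false);
  stmt->batches_buffered = buffered;
  stmt->batches_remote = remote;
}

/* Thread-safe: performs only an atomic store, so it may be called while
 * another thread is inside SketchStreamGetNext. */
void SketchStatementCancel(struct SketchStatement* stmt) {
  atomic_store(&stmt->cancel_requested, true);
}

/* Returns 0 on success or an errno value, like get_next.
 * *done is set to true once the stream is exhausted. */
int SketchStreamGetNext(struct SketchStatement* stmt, bool* done) {
  *done = false;
  if (stmt->batches_buffered > 0) {
    stmt->batches_buffered--; /* served from memory without checking the flag */
    return 0;
  }
  if (stmt->batches_remote == 0) {
    *done = true;
    return 0;
  }
  if (atomic_load(&stmt->cancel_requested)) {
    return ECANCELED; /* cancellation observed before the next remote fetch */
  }
  stmt->batches_remote--; /* stand-in for fetching one batch over the network */
  return 0;
}
```

A real driver's cancel may need to do more than set a flag (for example, send a cancellation request to the remote server), which may be why the header promises thread safety but not signal safety.
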
110 changes: 110 additions & 0 deletions docs/source/format/specification.rst
@@ -57,6 +57,26 @@ implementations will support this.
- Go: ``OptionKeyAutoCommit``
- Java: ``org.apache.arrow.adbc.core.AdbcConnection#setAutoCommit(boolean)``

Metadata
--------

ADBC exposes a variety of metadata about the database, such as what catalogs,
schemas, and tables exist, the Arrow schema of tables, and so on.

.. _specification-statistics:

Statistics
----------

.. note:: Since API revision 1.1.0

ADBC exposes table/column statistics, such as the (unique) row count, min/max
values, and so on. The goal here is to make ADBC work better in federation
scenarios, where one query engine wants to read Arrow data from another
database. Having statistics available lets the "outer" query planner make
better choices about things like join order, or even decide to skip reading
data entirely.

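To make the planner use case concrete, here is a minimal C sketch under stated assumptions: ``SketchColumnStats`` is a hypothetical, simplified record (not the schema ADBC returns for statistics), and the two helpers show how an outer planner might skip a read when an equality predicate falls outside a column's min/max range, and how it might pick the smaller input as the build side of a hash join.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified per-column statistics for one remote table. */
struct SketchColumnStats {
  int64_t row_count;
  int64_t min_value;
  int64_t max_value;
};

/* For a predicate "col = value": if value lies outside [min, max], no row
 * can match, so the remote read can be skipped entirely. */
bool SketchCanSkipEquality(const struct SketchColumnStats* stats, int64_t value) {
  return value < stats->min_value || value > stats->max_value;
}

/* For a hash join, build the hash table on the smaller input.
 * Returns 0 to build on the left input, 1 to build on the right. */
int SketchChooseBuildSide(const struct SketchColumnStats* left,
                          const struct SketchColumnStats* right) {
  return left->row_count <= right->row_count ? 0 : 1;
}
```

Pruning on min/max is only safe if the statistics are exact; approximate row counts are usually good enough for choosing a join order.
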
Statements
==========

@@ -84,6 +104,16 @@ frees the user from knowing the right SQL syntax for their database.
- Go: ``OptionKeyIngestTargetTable``
- Java: ``org.apache.arrow.adbc.core.AdbcConnection#bulkIngest(String, org.apache.arrow.adbc.core.BulkIngestMode)``

.. _specification-cancellation:

Cancellation
------------

.. note:: Since API revision 1.1.0

Queries (and operations that implicitly represent queries, like fetching
:ref:`specification-statistics`) can be cancelled.

Partitioned Result Sets
-----------------------

@@ -97,6 +127,16 @@ machines.
- Go: ``Statement.ExecutePartitions``
- Java: ``org.apache.arrow.adbc.core.AdbcStatement#executePartitioned()``

.. _specification-incremental-execution:

In principle, a vendor could return the results of partitioned execution as
they are available, instead of all at once. Incremental execution allows
drivers to expose this. When enabled, each call to ``ExecutePartitions`` will
return available endpoints to read instead of blocking to retrieve all
endpoints.

.. note:: Since API revision 1.1.0

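A minimal sketch of what incremental execution means for a client, assuming a hypothetical ``SketchExecutePartitions`` that stands in for ``ExecutePartitions`` with incremental execution enabled: each call returns only the endpoints that are ready, and the client starts reading those before asking again. The "waves" of endpoints simulate a remote query that finishes partitions over time; none of these names are ADBC APIs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical query: each call "discovers" the next wave of endpoints,
 * standing in for a remote engine that finishes partitions over time. */
struct SketchQuery {
  const size_t* wave_sizes; /* endpoints that become ready per call */
  size_t num_waves;
  size_t next_wave;
  size_t ready;             /* ready endpoints not yet handed out */
  int next_endpoint_id;
};

/* Returns up to `capacity` endpoints that are ready now instead of blocking
 * for all of them; sets *done once the last endpoint has been returned. */
size_t SketchExecutePartitions(struct SketchQuery* q, int* out, size_t capacity,
                               bool* done) {
  if (q->ready == 0 && q->next_wave < q->num_waves) {
    q->ready = q->wave_sizes[q->next_wave++]; /* simulate query progress */
  }
  size_t n = 0;
  while (n < capacity && q->ready > 0) {
    out[n++] = q->next_endpoint_id++;
    q->ready--;
  }
  *done = (q->ready == 0 && q->next_wave == q->num_waves);
  return n;
}

/* Client loop: hand each endpoint to read_endpoint (may be NULL) as soon as
 * it is returned, then ask again. Returns the number of endpoints seen. */
size_t SketchReadAllIncrementally(struct SketchQuery* q, void (*read_endpoint)(int)) {
  int batch[16];
  size_t total = 0;
  bool done = false;
  while (!done) {
    size_t n = SketchExecutePartitions(q, batch, sizeof batch / sizeof batch[0], &done);
    for (size_t i = 0; i < n; i++) {
      if (read_endpoint != NULL) read_endpoint(batch[i]);
    }
    total += n;
  }
  return total;
}
```

A real driver would presumably also need a way to wait for new endpoints without busy-polling the server; that detail is left out of this sketch.
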
Lifecycle & Usage
-----------------

@@ -135,3 +175,73 @@ Partitioned Execution
.. mermaid:: AdbcStatementPartitioned.mmd
:caption: This is similar to fetching data in Arrow Flight RPC (by
design). See :doc:`"Downloading Data" <arrow:format/Flight>`.

Error Handling
==============

The error handling strategy varies by language.

In C, most methods take a :cpp:class:`AdbcError`. In Go, most methods return
an error that can be cast to an ``AdbcError``. In Java, most methods raise an
``AdbcException``.

In all cases, an error contains:

- A status code,
- An error message,
- An optional vendor code (a vendor-specific status code),
- An optional 5-character "SQLSTATE" code (a SQL-like vendor-specific code).

.. _specification-rich-error-metadata:

Rich Error Metadata
-------------------

.. note:: Since API revision 1.1.0

Drivers can expose additional rich error metadata. This can be used to return
structured error information. For example, a driver could use something like
the `Googleapis ErrorDetails`_.

In C, special option values can be read after receiving an error to get error
metadata. In Go and Java, ``AdbcError`` and ``AdbcException`` respectively
expose a list of additional metadata.

.. _Googleapis ErrorDetails: https://github.com/googleapis/googleapis/blob/master/google/rpc/error_details.proto

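The fields every error carries, plus optional rich metadata, can be modeled as below. This is a hypothetical sketch loosely based on the list above, not the real ``AdbcError`` layout or the C mechanism for reading error metadata; the detail key ``google.rpc.ErrorInfo`` is used only as an example of a payload type from the Googleapis error details.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical rich-metadata entry: a key plus opaque bytes, e.g. a
 * serialized protobuf message. */
struct SketchErrorDetail {
  const char* key;
  const uint8_t* value;
  size_t value_length;
};

/* Hypothetical error holding the fields listed above. */
struct SketchError {
  int status;          /* ADBC status code */
  const char* message; /* error message */
  int32_t vendor_code; /* optional; 0 if unset (assumption) */
  char sqlstate[5];    /* optional; not NUL-terminated, zeroed if unset */
  const struct SketchErrorDetail* details; /* optional rich metadata */
  size_t num_details;
};

/* Renders the basic fields into buf; returns snprintf's result. */
int SketchErrorFormat(const struct SketchError* err, char* buf, size_t size) {
  if (err->sqlstate[0] != '\0') {
    return snprintf(buf, size, "status %d [%.5s] vendor %d: %s", err->status,
                    err->sqlstate, (int)err->vendor_code, err->message);
  }
  return snprintf(buf, size, "status %d vendor %d: %s", err->status,
                  (int)err->vendor_code, err->message);
}

/* Looks up a rich-metadata entry by key; returns NULL if absent. */
const struct SketchErrorDetail* SketchErrorFindDetail(const struct SketchError* err,
                                                      const char* key) {
  for (size_t i = 0; i < err->num_details; i++) {
    if (strcmp(err->details[i].key, key) == 0) {
      return &err->details[i];
    }
  }
  return NULL;
}
```

Storing the SQLSTATE as a fixed 5-byte array without a terminator is an assumption of this sketch, which is why it is printed with a ``%.5s`` precision.
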
Changelog
=========

Version 1.1.0
-------------

The info key ``ADBC_INFO_DRIVER_ADBC_VERSION`` can be used to retrieve the
driver's supported ADBC version.

The canonical options "uri", "username", and "password" were added to make
configuration consistent between drivers.

:ref:`specification-cancellation` and the ability to both get and set options
of different types were added. (Previously, you could set string options but
could not get option values or get/set values of other types.) This can be
used to get and set the current active catalog and/or schema through a pair of
new canonical options.

:ref:`specification-bulk-ingestion` supports two additional modes:

- "adbc.ingest.mode.replace" will drop existing data, then behave like
"create".
- "adbc.ingest.mode.create_append" will behave like "create", except if the
table already exists, it will not error.

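The two new modes can be summarized as a decision on whether the target table already exists. The sketch below is hypothetical (``SketchPlanIngest`` and the action enum are not ADBC APIs); it assumes that ``"adbc.ingest.mode.create"`` errors when the table already exists, that ``"adbc.ingest.mode.append"`` errors when it does not, and that "replace" on a missing table simply creates it.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical actions a driver might take for a bulk ingestion request. */
enum SketchIngestAction {
  SKETCH_ERROR_ALREADY_EXISTS,
  SKETCH_ERROR_NOT_FOUND,
  SKETCH_CREATE_THEN_INSERT,
  SKETCH_INSERT,
  SKETCH_DROP_CREATE_THEN_INSERT,
  SKETCH_ERROR_UNKNOWN_MODE,
};

enum SketchIngestAction SketchPlanIngest(const char* mode, bool table_exists) {
  if (strcmp(mode, "adbc.ingest.mode.create") == 0) {
    return table_exists ? SKETCH_ERROR_ALREADY_EXISTS : SKETCH_CREATE_THEN_INSERT;
  }
  if (strcmp(mode, "adbc.ingest.mode.append") == 0) {
    return table_exists ? SKETCH_INSERT : SKETCH_ERROR_NOT_FOUND;
  }
  if (strcmp(mode, "adbc.ingest.mode.replace") == 0) {
    /* Drop existing data, then behave like "create". */
    return table_exists ? SKETCH_DROP_CREATE_THEN_INSERT : SKETCH_CREATE_THEN_INSERT;
  }
  if (strcmp(mode, "adbc.ingest.mode.create_append") == 0) {
    /* Like "create", but an existing table is not an error. */
    return table_exists ? SKETCH_INSERT : SKETCH_CREATE_THEN_INSERT;
  }
  return SKETCH_ERROR_UNKNOWN_MODE;
}
```
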
:ref:`specification-rich-error-metadata` has been added, allowing clients to
get additional error metadata.

The ability to retrieve table/column :ref:`statistics
<specification-statistics>` was added. The goal here is to make ADBC work
better in federation scenarios, where one query engine wants to read Arrow
data from another database.

:ref:`Incremental execution <specification-incremental-execution>` allows
streaming partitions of a result set as they are available instead of blocking
and waiting for query execution to finish before reading results.
28 changes: 17 additions & 11 deletions docs/source/format/versioning.rst
@@ -29,14 +29,19 @@ choices were made:
Of course, we can never add/remove/change struct members, and we can
never change the signatures of existing functions.

The main point of concern is compatibility of :cpp:class:`AdbcDriver`.
In ADBC 1.1.0, it was decided that these rules would only apply to the "public"
API, and not the driver-internal API (:cpp:class:`AdbcDriver`). New
members were added to this struct in the 1.1.0 revision.
Compatibility is handled as follows:

The driver entrypoint, :cpp:type:`AdbcDriverInitFunc`, is given a
version and a pointer to a table of function pointers to initialize.
The type of the table will depend on the version; when a new version
of ADBC is accepted, then a new table of function pointers will be
added. That way, the driver knows the type of the table. If/when we
add a new ADBC version, the following scenarios are possible:
version and a pointer to a table of function pointers to initialize
(the :cpp:class:`AdbcDriver`). The size of the table will depend on
the version; when a new version of ADBC is accepted, the table of
function pointers may be expanded. For each version, the driver
knows the expected size of the table, and must not read/write fields
beyond that size. If/when we add a new ADBC version, the following
scenarios are possible:

- An updated client application uses an old driver library. The
  client will pass a ``version`` field greater than what the driver
@@ -46,7 +51,8 @@ add a new ADBC version, the following scenarios are possible:
- An old client application uses an updated driver library. The
client will pass a ``version`` lower than what the driver
recognizes, so the driver can either error, or if it can still
implement the old API contract, initialize the older table.
implement the old API contract, initialize the subset of the table
corresponding to the older version.

This approach does not let us change the signatures of existing
functions, but we can add new functions and remove existing ones.
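The size-based compatibility scheme above can be sketched in C as follows. Everything here is hypothetical: ``SketchDriver`` stands in for ``AdbcDriver``, the version constants use an assumed integer encoding, and ``SketchDriverInit`` plays the role of an ``AdbcDriverInitFunc``. The point it illustrates is that, for an older version, the driver writes only the prefix of the table that version defines.

```c
#include <stddef.h>
#include <string.h>

/* Assumed integer encoding of versions (hypothetical constants). */
#define SKETCH_VERSION_1_0_0 1000000
#define SKETCH_VERSION_1_1_0 1001000

typedef int (*SketchFn)(void);

/* Hypothetical driver table: 1.0.0 members first, 1.1.0 members appended. */
struct SketchDriver {
  SketchFn connection_new;    /* since 1.0.0 */
  SketchFn statement_execute; /* since 1.0.0 */
  SketchFn connection_cancel; /* since 1.1.0 */
  SketchFn statement_cancel;  /* since 1.1.0 */
};

/* Bytes of the table that each version defines. */
#define SKETCH_DRIVER_SIZE_1_0_0 offsetof(struct SketchDriver, connection_cancel)
#define SKETCH_DRIVER_SIZE_1_1_0 sizeof(struct SketchDriver)

static int SketchOk(void) { return 0; }

/* Initializes only the part of the table belonging to the requested
 * version and never touches bytes beyond it. Returns 0 or -1. */
int SketchDriverInit(int version, void* raw_driver) {
  size_t size;
  if (version == SKETCH_VERSION_1_0_0) {
    size = SKETCH_DRIVER_SIZE_1_0_0;
  } else if (version == SKETCH_VERSION_1_1_0) {
    size = SKETCH_DRIVER_SIZE_1_1_0;
  } else {
    return -1; /* a version this driver does not recognize */
  }
  memset(raw_driver, 0, size);
  struct SketchDriver* driver = raw_driver;
  driver->connection_new = SketchOk;
  driver->statement_execute = SketchOk;
  if (version >= SKETCH_VERSION_1_1_0) {
    driver->connection_cancel = SketchOk;
    driver->statement_cancel = SketchOk;
  }
  return 0;
}
```

Because new members are only ever appended, an old client's smaller table is a prefix of the new one. Strictly, writing through the larger struct type into a smaller caller-owned object relies on this layout convention rather than on anything the C standard guarantees.
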
@@ -64,7 +70,7 @@ backwards-incompatible versions such as 2.0.0, but which still
implement the API standard version 1.0.0.

Similarly, this documentation describes the ADBC API standard version
1.0.0. If/when a compatible revision is made (e.g. new standard
options are defined), the next version would be 1.1.0. If
incompatible changes are made (e.g. new API functions), the next
version would be 2.0.0.
1.1.0. If/when a compatible revision is made (e.g. new standard
options or API functions are defined), the next version would be
1.2.0. If incompatible changes are made (e.g. changing the signature
or semantics of a function), the next version would be 2.0.0.
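The numbering policy above amounts to a small rule, shown here as a hypothetical C sketch; the struct, the function names, and the integer encoding (major * 1000000 + minor * 1000 + patch) are assumptions for illustration only.

```c
#include <stdbool.h>

/* Hypothetical representation of an API standard revision. */
struct SketchVersion {
  int major, minor, patch;
};

/* A compatible revision (new options or API functions) bumps the minor
 * version; an incompatible one (changed signature or semantics) bumps
 * the major version. */
struct SketchVersion SketchNextStandardVersion(struct SketchVersion current,
                                               bool compatible) {
  struct SketchVersion next = current;
  if (compatible) {
    next.minor += 1;
  } else {
    next.major += 1;
    next.minor = 0;
  }
  next.patch = 0;
  return next;
}

/* Assumed integer encoding: major * 1000000 + minor * 1000 + patch. */
long SketchEncodeVersion(struct SketchVersion v) {
  return (long)v.major * 1000000L + (long)v.minor * 1000L + v.patch;
}
```
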
12 changes: 8 additions & 4 deletions go/adbc/drivermgr/adbc.h
@@ -1292,9 +1292,11 @@ AdbcStatusCode AdbcConnectionRelease(struct AdbcConnection* connection,
/// or while consuming an ArrowArrayStream returned from such.
/// Calling this function should make the other functions return
/// ADBC_STATUS_CANCELLED (from ADBC functions) or ECANCELED (from
/// methods of ArrowArrayStream).
/// methods of ArrowArrayStream). (Cancellation is not guaranteed: for
/// instance, the result set may already be buffered in memory.)
///
/// This must always be thread-safe (other operations are not).
/// This must always be thread-safe (other operations are not). It is
/// not necessarily signal-safe.
///
/// \since ADBC API revision 1.1.0
/// \addtogroup adbc-1.1.0
@@ -1947,9 +1949,11 @@ AdbcStatusCode AdbcStatementBindStream(struct AdbcStatement* statement,
/// or while consuming an ArrowArrayStream returned from such.
/// Calling this function should make the other functions return
/// ADBC_STATUS_CANCELLED (from ADBC functions) or ECANCELED (from
/// methods of ArrowArrayStream).
/// methods of ArrowArrayStream). (Cancellation is not guaranteed: for
/// instance, the result set may already be buffered in memory.)
///
/// This must always be thread-safe (other operations are not).
/// This must always be thread-safe (other operations are not). It is
/// not necessarily signal-safe.
///
/// \since ADBC API revision 1.1.0
/// \addtogroup adbc-1.1.0