diff --git a/adbc.h b/adbc.h
index badfc7d65a..f7ea69af2d 100644
--- a/adbc.h
+++ b/adbc.h
@@ -1292,9 +1292,11 @@ AdbcStatusCode AdbcConnectionRelease(struct AdbcConnection* connection,
 /// or while consuming an ArrowArrayStream returned from such.
 /// Calling this function should make the other functions return
 /// ADBC_STATUS_CANCELLED (from ADBC functions) or ECANCELED (from
-/// methods of ArrowArrayStream).
+/// methods of ArrowArrayStream). (It is not guaranteed to, for
+/// instance, the result set may be buffered in memory already.)
 ///
-/// This must always be thread-safe (other operations are not).
+/// This must always be thread-safe (other operations are not). It is
+/// not necessarily signal-safe.
 ///
 /// \since ADBC API revision 1.1.0
 /// \addtogroup adbc-1.1.0
@@ -1947,9 +1949,11 @@ AdbcStatusCode AdbcStatementBindStream(struct AdbcStatement* statement,
 /// or while consuming an ArrowArrayStream returned from such.
 /// Calling this function should make the other functions return
 /// ADBC_STATUS_CANCELLED (from ADBC functions) or ECANCELED (from
-/// methods of ArrowArrayStream).
+/// methods of ArrowArrayStream). (It is not guaranteed to, for
+/// instance, the result set may be buffered in memory already.)
 ///
-/// This must always be thread-safe (other operations are not).
+/// This must always be thread-safe (other operations are not). It is
+/// not necessarily signal-safe.
 ///
 /// \since ADBC API revision 1.1.0
 /// \addtogroup adbc-1.1.0
diff --git a/docs/source/format/specification.rst b/docs/source/format/specification.rst
index e7a44a4198..88515e41c6 100644
--- a/docs/source/format/specification.rst
+++ b/docs/source/format/specification.rst
@@ -57,6 +57,26 @@ implementations will support this.
 - Go: ``OptionKeyAutoCommit``
 - Java: ``org.apache.arrow.adbc.core.AdbcConnection#setAutoCommit(boolean)``
 
+Metadata
+--------
+
+ADBC exposes a variety of metadata about the database, such as what catalogs,
+schemas, and tables exist, the Arrow schema of tables, and so on.
+
+.. _specification-statistics:
+
+Statistics
+----------
+
+.. note:: Since API revision 1.1.0
+
+ADBC exposes table/column statistics, such as the (unique) row count, min/max
+values, and so on. The goal here is to make ADBC work better in federation
+scenarios, where one query engine wants to read Arrow data from another
+database. Having statistics available lets the "outer" query planner make
+better choices about things like join order, or even decide to skip reading
+data entirely.
+
 Statements
 ==========
 
@@ -84,6 +104,16 @@ frees the user from knowing the right SQL syntax for their database.
 - Go: ``OptionKeyIngestTargetTable``
 - Java: ``org.apache.arrow.adbc.core.AdbcConnection#bulkIngest(String, org.apache.arrow.adbc.core.BulkIngestMode)``
 
+.. _specification-cancellation:
+
+Cancellation
+------------
+
+.. note:: Since API revision 1.1.0
+
+Queries (and operations that implicitly represent queries, like fetching
+:ref:`specification-statistics`) can be cancelled.
+
 Partitioned Result Sets
 -----------------------
 
@@ -97,6 +127,16 @@ machines.
 - Go: ``Statement.ExecutePartitions``
 - Java: ``org.apache.arrow.adbc.core.AdbcStatement#executePartitioned()``
 
+.. _specification-incremental-execution:
+
+In principle, a vendor could return the results of partitioned execution as
+they are available, instead of all at once. Incremental execution allows
+drivers to expose this. When enabled, each call to ``ExecutePartitions`` will
+return available endpoints to read instead of blocking to retrieve all
+endpoints.
+
+.. note:: Since API revision 1.1.0
+
 Lifecycle & Usage
 -----------------
 
@@ -135,3 +175,73 @@ Partitioned Execution
 ..
mermaid:: AdbcStatementPartitioned.mmd
    :caption: This is similar to fetching data in Arrow Flight RPC (by design). See :doc:`"Downloading Data" `.
+
+Error Handling
+==============
+
+The error handling strategy varies by language.
+
+In C, most methods take a :cpp:class:`AdbcError`. In Go, most methods return
+an error that can be cast to an ``AdbcError``. In Java, most methods raise an
+``AdbcException``.
+
+In all cases, an error contains:
+
+- A status code,
+- An error message,
+- An optional vendor code (a vendor-specific status code),
+- An optional 5-character "SQLSTATE" code (a SQL-like vendor-specific code).
+
+.. _specification-rich-error-metadata:
+
+Rich Error Metadata
+-------------------
+
+.. note:: Since API revision 1.1.0
+
+Drivers can expose additional rich error metadata. This can be used to return
+structured error information. For example, a driver could use something like
+the `Googleapis ErrorDetails`_.
+
+In C, special option values can be read after receiving an error to get error
+metadata. In Go and Java, ``AdbcError`` and ``AdbcException`` respectively
+expose a list of additional metadata.
+
+.. _Googleapis ErrorDetails: https://github.com/googleapis/googleapis/blob/master/google/rpc/error_details.proto
+
+Changelog
+=========
+
+Version 1.1.0
+-------------
+
+The info key ADBC_INFO_DRIVER_ADBC_VERSION can be used to retrieve the
+driver's supported ADBC version.
+
+The canonical options "uri", "username", and "password" were added to make
+configuration consistent between drivers.
+
+:ref:`specification-cancellation` and the ability to both get and set options
+of different types were added. (Previously, you could set string options but
+could not get option values or get/set values of other types.) This can be
+used to get and set the current active catalog and/or schema through a pair of
+new canonical options.
+
+:ref:`specification-bulk-ingestion` supports two additional modes:
+
+- "adbc.ingest.mode.replace" will drop existing data, then behave like
+  "create".
+- "adbc.ingest.mode.create_append" will behave like "create", except if the
+  table already exists, it will not error.
+
+:ref:`specification-rich-error-metadata` has been added, allowing clients to
+get additional error metadata.
+
+The ability to retrieve table/column :ref:`statistics
+` was added. The goal here is to make ADBC work
+better in federation scenarios, where one query engine wants to read Arrow
+data from another database.
+
+:ref:`Incremental execution ` allows
+streaming partitions of a result set as they are available instead of blocking
+and waiting for query execution to finish before reading results.
diff --git a/docs/source/format/versioning.rst b/docs/source/format/versioning.rst
index 3205b792e1..b255aeebbe 100644
--- a/docs/source/format/versioning.rst
+++ b/docs/source/format/versioning.rst
@@ -29,14 +29,19 @@ choices were made:
 Of course, we can never add/remove/change struct members, and we can
 never change the signatures of existing functions.
 
-The main point of concern is compatibility of :cpp:class:`AdbcDriver`.
+In ADBC 1.1.0, it was decided this would only apply to the "public"
+API, and not the driver-internal API (:cpp:class:`AdbcDriver`). New
+members were added to this struct in the 1.1.0 revision.
+Compatibility is handled as follows:
 
 The driver entrypoint, :cpp:type:`AdbcDriverInitFunc`, is given a
-version and a pointer to a table of function pointers to initialize.
-The type of the table will depend on the version; when a new version
-of ADBC is accepted, then a new table of function pointers will be
-added. That way, the driver knows the type of the table. If/when we
-add a new ADBC version, the following scenarios are possible:
+version and a pointer to a table of function pointers to initialize
+(the :cpp:class:`AdbcDriver`).
The size of the table will depend on
+the version; when a new version of ADBC is accepted, the table of
+function pointers may be expanded. For each version, the driver
+knows the expected size of the table, and must not read/write fields
+beyond that size. If/when we add a new ADBC version, the following
+scenarios are possible:
 
 - An updated client application uses an old driver library. The
   client will pass a `version` field greater than what the driver
@@ -46,7 +51,8 @@ add a new ADBC version, the following scenarios are possible:
 - An old client application uses an updated driver library. The
   client will pass a ``version`` lower than what the driver
   recognizes, so the driver can either error, or if it can still
-  implement the old API contract, initialize the older table.
+  implement the old API contract, initialize the subset of the table
+  corresponding to the older version.
 
 This approach does not let us change the signatures of existing
 functions, but we can add new functions and remove existing ones.
@@ -64,7 +70,7 @@ backwards-incompatible versions such as 2.0.0, but which still
 implement the API standard version 1.0.0.
 
 Similarly, this documentation describes the ADBC API standard version
-1.0.0. If/when a compatible revision is made (e.g. new standard
-options are defined), the next version would be 1.1.0. If
-incompatible changes are made (e.g. new API functions), the next
-version would be 2.0.0.
+1.1.0. If/when a compatible revision is made (e.g. new standard
+options or API functions are defined), the next version would be
+1.2.0. If incompatible changes are made (e.g. changing the signature
+or semantics of a function), the next version would be 2.0.0.
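
Note on the versioning.rst change: the size-by-version rule can be sketched in C roughly as follows. This is an illustration only. `ToyDriver`, `ToyDriverInit`, and the `TOY_*` constants are hypothetical stand-ins, not the definitions in adbc.h, and the version numbers are made up for the example. The sketch assumes new members are only ever appended to the end of the table, so each version's size is a prefix of the next.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical version constants; not the values defined in adbc.h. */
#define TOY_VERSION_1_0_0 1000000
#define TOY_VERSION_1_1_0 1001000

/* Hypothetical table of function pointers (plays the role of AdbcDriver). */
struct ToyDriver {
  /* Members present since 1.0.0. */
  int (*Execute)(void);
  int (*Release)(void);
  /* Member appended in 1.1.0; a 1.0.0 caller provides no storage for it. */
  int (*Cancel)(void);
};

/* Size of the table as seen by each revision. */
#define TOY_DRIVER_1_0_0_SIZE offsetof(struct ToyDriver, Cancel)
#define TOY_DRIVER_1_1_0_SIZE sizeof(struct ToyDriver)

static int ToyExecute(void) { return 0; }
static int ToyRelease(void) { return 0; }
static int ToyCancel(void) { return 0; }

/* Returns 0 on success, -1 if the requested version is not recognized. */
int ToyDriverInit(int version, void* raw_driver) {
  assert(raw_driver != NULL);
  struct ToyDriver* driver = (struct ToyDriver*)raw_driver;
  if (version != TOY_VERSION_1_0_0 && version != TOY_VERSION_1_1_0) {
    /* For example, a newer client asking for a table this driver predates. */
    return -1;
  }
  /* Fields shared by every supported version. */
  driver->Execute = ToyExecute;
  driver->Release = ToyRelease;
  if (version >= TOY_VERSION_1_1_0) {
    /* Only touch 1.1.0 fields when the caller passed a 1.1.0-sized table. */
    driver->Cancel = ToyCancel;
  }
  return 0;
}
```

With a zero-initialized `struct ToyDriver`, calling `ToyDriverInit(TOY_VERSION_1_0_0, &d)` in this sketch leaves `d.Cancel` untouched, which is the "must not read/write fields beyond that size" behavior the new text describes.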
diff --git a/go/adbc/drivermgr/adbc.h b/go/adbc/drivermgr/adbc.h
index badfc7d65a..f7ea69af2d 100644
--- a/go/adbc/drivermgr/adbc.h
+++ b/go/adbc/drivermgr/adbc.h
@@ -1292,9 +1292,11 @@ AdbcStatusCode AdbcConnectionRelease(struct AdbcConnection* connection,
 /// or while consuming an ArrowArrayStream returned from such.
 /// Calling this function should make the other functions return
 /// ADBC_STATUS_CANCELLED (from ADBC functions) or ECANCELED (from
-/// methods of ArrowArrayStream).
+/// methods of ArrowArrayStream). (It is not guaranteed to, for
+/// instance, the result set may be buffered in memory already.)
 ///
-/// This must always be thread-safe (other operations are not).
+/// This must always be thread-safe (other operations are not). It is
+/// not necessarily signal-safe.
 ///
 /// \since ADBC API revision 1.1.0
 /// \addtogroup adbc-1.1.0
@@ -1947,9 +1949,11 @@ AdbcStatusCode AdbcStatementBindStream(struct AdbcStatement* statement,
 /// or while consuming an ArrowArrayStream returned from such.
 /// Calling this function should make the other functions return
 /// ADBC_STATUS_CANCELLED (from ADBC functions) or ECANCELED (from
-/// methods of ArrowArrayStream).
+/// methods of ArrowArrayStream). (It is not guaranteed to, for
+/// instance, the result set may be buffered in memory already.)
 ///
-/// This must always be thread-safe (other operations are not).
+/// This must always be thread-safe (other operations are not). It is
+/// not necessarily signal-safe.
 ///
 /// \since ADBC API revision 1.1.0
 /// \addtogroup adbc-1.1.0
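
Note on the cancellation comments: the "not guaranteed" caveat can be illustrated with a toy result stream, sketched below under stated assumptions. `ToyStream` and its functions are hypothetical and are not part of ADBC or the Arrow C stream interface. The sketch assumes the driver checks a cancel flag only when it would need another round trip, so a batch that is already buffered is still delivered after cancellation. Only the flag is atomic, mirroring the rule that cancellation must be thread-safe while other operations are not.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a driver's result stream; not an ADBC type. */
struct ToyStream {
  atomic_bool cancelled; /* may be set from another thread */
  int buffered;          /* batches already held in memory */
  int remaining_remote;  /* batches that would need a round trip */
};

void ToyStreamInit(struct ToyStream* s, int buffered, int remaining_remote) {
  atomic_init(&s->cancelled, false);
  s->buffered = buffered;
  s->remaining_remote = remaining_remote;
}

/* Plays the role of a Cancel call: a single atomic store on the flag. */
void ToyStreamCancel(struct ToyStream* s) {
  atomic_store(&s->cancelled, true);
}

/* Returns 0 (with *got_batch false at end of stream) or ECANCELED. */
int ToyStreamNext(struct ToyStream* s, bool* got_batch) {
  *got_batch = false;
  if (s->buffered > 0) {
    /* Already in memory: served without consulting the cancel flag. */
    s->buffered--;
    *got_batch = true;
    return 0;
  }
  if (atomic_load(&s->cancelled)) {
    return ECANCELED;
  }
  if (s->remaining_remote > 0) {
    s->remaining_remote--;
    *got_batch = true;
  }
  return 0;
}
```

In this sketch, cancelling a stream that holds one buffered batch still yields that batch on the next call, and only the call after it returns `ECANCELED`.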