diff --git a/.travis.yml b/.travis.yml index b573578..97931ce 100644 --- a/.travis.yml +++ b/.travis.yml @@ -9,6 +9,7 @@ php: - 7.1 - 7.2 - 7.3 + - 7.4snapshot env: matrix: diff --git a/README.md b/README.md index 79ea48b..defd472 100644 --- a/README.md +++ b/README.md @@ -6,13 +6,12 @@ Porter -![Data flow diagram][Data flow diagram] +![Data flow diagram][] @@ -135,22 +136,24 @@ Our application calls `Porter::import()` with an `ImportSpecification` and recei Import specifications --------------------- -Import specifications specify *what* to import, *how* it should be [transformed](#transformers) and whether to use [caching](#caching). The only required parameter, passed to the constructor, is a `ProviderResource` that specifies the resource we want to import. +Import specifications specify *what* to import, *how* it should be [transformed](#transformers) and whether to use [caching](#caching). In synchronous code, create a new instance of `ImportSpecification` and pass a `ProviderResource` that specifies the resource we want to import. In asynchronous code, create `AsyncImportSpecification` instead. -Options may be configured by some of the methods listed below. +Options may be configured using the methods below. - `setProviderName(string)` – Sets the provider service name. - - `addTransformer(Transformer)` – Adds a transformer to the end of the transformation queue. + - `addTransformer(Transformer)` – Adds a transformer to the end of the transformation queue. In async code, pass `AsyncTransformer` instead. - `addTransformers(Transformer[])` – Adds one or more transformers to the end of the transformation queue. - `setContext(mixed)` – Specifies user-defined data to be passed to transformers. - `enableCache()` – Enables caching. Requires a `CachingConnector`. - `setMaxFetchAttempts(int)` – Sets the maximum number of fetch attempts per connection before failure is considered permanent. 
- `setFetchExceptionHandler(FetchExceptionHandler)` – Sets the exception handler invoked each time a fetch attempt fails. +In synchronous code, import specifications are instances of `ImportSpecification`. + Record collections ------------------ -Record collections are `Iterator`s, guaranteeing imported data is enumerable using `foreach`. Each *record* of the collection is the familiar and flexible `array` type, allowing us to represent any flat or structured data hierarchy, like CSV or JSON, as an array. +Record collections are `Iterator`s, guaranteeing imported data is enumerable using `foreach`. Each *record* of the collection is the familiar and flexible `array` type, allowing us to present structured or flat data, such as JSON, XML or CSV, as an array. ### Details @@ -164,12 +167,12 @@ The stack of record collection types informs us of the transformations a collect Since record collections are just objects, it is possible to define derived types that implement custom fields to expose additional *metadata* in addition to the iterated data. Collections are very good at representing a repeating series of data but some APIs send additional non-repeating data which we can expose as metadata. However, if the data is not repeating at all, it should be treated as a single record rather than metadata. -The result of a successful `Porter::import` call is always an instance of `PorterRecords` or `CountablePorterRecords`, depending on whether the number of records is known. If we need to access methods of the original collection, returned by the provider, we can call `findFirstCollection()` on the collection. For an example, see [CurrencyRecords][CurrencyRecords] of the [European Central Bank Provider][ECB] and its associated [test case][ECB test]. +The result of a successful `Porter::import` call is always an instance of `PorterRecords` or `CountablePorterRecords`, depending on whether the number of records is known. 
If we need to access methods of the original collection, returned by the provider, we can call `findFirstCollection()` on the collection. For an example, see [CurrencyRecords][] of the [European Central Bank Provider][ECB] and its associated [test case][ECB test]. Asynchronous ------------ -The new asynchronous API, introduced in version 5, is built on top of the fully programmable asynchronous framework, [Amp]. The synchronous API is not compatible with the asynchronous API so one must decide which to use. In general, the asynchronous API should be preferred for new projects because async can do everything sync can do, including emulating synchronous behaviour, but sync code cannot behave asynchronously without significant refactoring. +The asynchronous API, introduced in version 5, is built on top of the fully programmable asynchronous framework, [Amp][]. The synchronous API is not compatible with the asynchronous API so one must decide which to use. In general, the asynchronous API should be preferred for new projects because async can do everything sync can do, including emulating synchronous behaviour, but sync code cannot behave asynchronously without significant refactoring. We must be inside the async event loop to begin programming asynchronously. Let's illustrate how to rewrite the [earlier example](#importing-data) asynchronously. @@ -184,24 +187,26 @@ We must be inside the async event loop to begin programming asynchronously. Let' }); ``` -Programming asynchronously requires an understanding of Amp, the async framework. Further details can be found in the official Amp documentation. +Programming asynchronously requires an understanding of Amp, the async framework. Further details can be found in the official [Amp documentation][]. Transformers ------------ Transformers manipulate imported data. Transforming data is useful because third-party data seldom arrives in a format that looks exactly as we want. 
Transformers are added to the transformation queue of an `ImportSpecification` by calling its `addTransformer` method and are executed in the order they are added. -Porter includes one transformer, `FilterTransformer`, that removes records from the collection based on a predicate. For more information, see [filtering](#filtering). More powerful data transformations can be designed with [MappingTransformer][MappingTransformer]. More transformers may be available from [Porter transformers][Porter transformers]. +Porter includes one transformer, `FilterTransformer`, that removes records from the collection based on a predicate. For more information, see [filtering](#filtering). More powerful data transformations can be designed with [MappingTransformer][]. More transformers may be available from [Porter transformers][]. ### Writing a transformer -Transformers implement the `Transformer` interface that defines one method with the following signature. +Transformers implement the `Transformer` and/or `AsyncTransformer` interfaces that define one or more of the following methods. ```php -public function transform(RecordCollection $records, $context): RecordCollection; +public function transform(RecordCollection $records, mixed $context): RecordCollection; + +public function transformAsync(AsyncRecordCollection $records, mixed $context): AsyncRecordCollection; ``` -When `transform()` is called the transformer may iterate each record and change it in any way, including removing or inserting additional records. The record collection must be returned by the method, whether or not changes were made. +When `transform()` or `transformAsync()` is called the transformer may iterate each record and change it in any way, including removing or inserting additional records. The record collection must be returned by the method, whether or not changes were made. 
Transformers should also implement the `__clone` magic method if the they store any object state, in order to facilitate deep copy when Porter clones the owning `ImportSpecification` during import. @@ -245,7 +250,7 @@ Durability only applies when connectors throw a recoverable exception type deriv Caching ------- -Any connector can be wrapped in a `CachingConnector` to provide [PSR-6][PSR-6] caching facilities to the base connector. Porter ships with one cache implementation, `MemoryCache`, which caches fetched data in memory, but this can be substituted for any other PSR-6 cache implementation. The `CachingConnector` caches raw responses for each unique [cache key](#cache-keys). +Any connector can be wrapped in a `CachingConnector` to provide [PSR-6][] caching facilities to the base connector. Porter ships with one cache implementation, `MemoryCache`, which caches fetched data in memory, but this can be substituted for any other PSR-6 cache implementation. The `CachingConnector` caches raw responses for each unique [cache key](#cache-keys). Remember that whilst using a `CachingConnector` enables caching, caching must also be enabled on a per-import basis by calling `ImportSpecification::enableCache()`. @@ -264,12 +269,12 @@ $records = $porter->import(
-INTERMISSION ------------- +INTERMISSION ☕️ +-------------- Congratulations! We have covered everything needed to use Porter. -The rest of this readme is for those wishing to go deeper. Continue when you're ready to learn how to write [providers](#providers), [resources](#resources) and [connectors](#connectors). ☕️ +The rest of this readme is for those wishing to go deeper. Continue when you're ready to learn how to write [providers](#providers), [resources](#resources) and [connectors](#connectors).
@@ -280,7 +285,7 @@ Architecture The following UML class diagram shows a partial architectural overview illustrating Porter's main components and how they are related. [[enlarge][Class diagram]] -[![Class diagram][Class diagram]][Class diagram] +[![Class diagram][]][Class diagram] Providers --------- @@ -306,14 +311,14 @@ final class MyProvider implements Provider { private $connector; - public function __construct(HttpConnector $connector = null) + public function __construct(Connector $connector = null) { $this->connector = $connector ?: new HttpConnector; } - public function getConnector() - { - return $this->connector; + public function getConnector(): Connector + { + return $this->connector; } } ``` @@ -360,7 +365,7 @@ public function fetch(ImportConnector $connector) } ``` -Since the total number of records is known, the iterator can be wrapped in `CountableProviderRecords` to enrch the caller with this information. +Since the total number of records is known, the iterator can be wrapped in `CountableProviderRecords` to enrich the caller with this information. ```php public function fetch(ImportConnector $connector) @@ -369,7 +374,7 @@ public function fetch(ImportConnector $connector) foreach (range(1, $limit) as $number) { yield [$number]; } - } + }; return new CountableProviderRecords($series($count = 3), $count, $this); } @@ -380,16 +385,16 @@ public function fetch(ImportConnector $connector) In the following example we create a resource that receives a connector from `MyProvider` and uses it to retrieve data from a hard-coded URL. We expect the data to be JSON encoded so we decode it into an array and use `yield` to return it as a single-item iterator. 
```php -class MyResource extends AbstractResource +class MyResource implements ProviderResource, SingleRecordResource { private const URL = 'https://example.com'; - public function getProviderClassName() + public function getProviderClassName(): string { return MyProvider::class; } - public function fetch(ImportConnector $connector) + public function fetch(ImportConnector $connector): \Iterator { $data = $connector->fetch(self::URL); @@ -398,10 +403,10 @@ class MyResource extends AbstractResource } ``` -If the data represents a repeating series, yield each record separately instead, as in the following example. +If the data represents a repeating series, remove the `SingleRecordResource` marker interface and yield each record separately instead, as in the following example. ```php -public function fetch(ImportConnector $connector) +public function fetch(ImportConnector $connector): \Iterator { $data = $connector->fetch(self::URL); @@ -411,24 +416,6 @@ public function fetch(ImportConnector $connector) } ``` -If we need to make any changes to the connector before calling fetch, such as attaching a POST body to an HTTP request, we can call `$connector->findBaseConnector()` to access the underlying connector and modify it as normal. Don't forget to check the underlying connector is of the expected type before trying to modify it. - -```php -public function fetch(ImportConnector $connector) -{ - $baseConnector = $connector->findBaseConnector(); - - if ($baseConnector instanceof HttpConnector) { - $baseConnector->getOptions() - ->setMethod('POST') - ->setContent(http_build_query(['foo' => 'bar'])) - ; - } - - // ... -} -``` - #### Exception handling Unrecoverable exceptions will be thrown and can be caught as normal, but good connector implementations will wrap their connection attempts in a retry block and throw a `RecoverableConnectorException`. 
The only way to intercept a recoverable exception is by attaching a `FetchExceptionHandler` to the `ImportConnector` by calling its `setExceptionHandler()` method. Exception handlers cannot be used for flow control because their return values are ignored, so the main application of such handlers is to re-throw recoverable exceptions as non-recoverable exceptions. @@ -436,61 +423,35 @@ Unrecoverable exceptions will be thrown and can be caught as normal, but good co Connectors ---------- -Connectors fetch remote data from a source specified at fetch time. Connectors for popular protocols are available from [Porter connectors][Porter connectors]. It might be necessary to write a new connector if dealing with uncommon or currently unsupported protocols. +Connectors fetch remote data from a source specified at fetch time. Connectors for popular protocols are available from [Porter connectors][Porter connectors]. It might be necessary to write a new connector if dealing with uncommon or currently unsupported protocols. Writing providers and resources is a common task that should be fairly easy but writing a connector is less common. ### Writing a connector -Writing providers and resources is a common task that should be fairly easy but writing a connector is slightly less common and has some specific technical considerations that must be carefully considered. A connector implements the `Connector` interface that defines one method with the following signature. +A connector implements the `Connector` interface that defines one method with the following signature. ```php -public function fetch(ConnectionContext $context, $source): mixed; +public function fetch(DataSource $source): mixed; ``` -When `fetch()` is called the connector fetches data from the specified source. Connectors may return data in any format that's convenient for resources to consume, but in general, such data should be as raw as possible and without modification. 
If multiple pieces of information are returned it is recommended to use a specialized response class, like the HTTP connector that returns the response body and headers together in an `HttpResponse`. - -#### Options +When `fetch()` is called the connector fetches data from the specified data source. Connectors may return data in any format that's convenient for resources to consume, but in general, such data should be as raw as possible and without modification. If multiple pieces of information are returned it is recommended to use a specialized object, like the `HttpResponse` returned by the HTTP connector that contains the response headers and body together. -If a connector has configurable options it must implement `ConnectorOptions` so that other parts of Porter, such as `CachingConnector`, are aware and work correctly. Any connector implementing `ConnectorOptions` must also implement a `__clone()` method to ensure all of its objects are cloned, including the `EncapsulatedOptions` instance. A minimal implementation follows. +#### Data sources -```php -class MyConnector implements Connector, ConnectorOptions -{ - private $options; +The `DataSource` interface must be implemented to supply the necessary parameters for a connector to locate a data source. For an HTTP connector, this might include URL, method, body and headers. For a database connector, this might be a SQL query. - public function getOptions() - { - return $this->options; - } +`DataSource` specifies one method with the following signature. - public function __clone() - { - $this->options = clone $this->options; - } - - // ... -} +```php +public function computeHash(): string; ``` -#### Durability - -To support Porter's durability features a connector may throw a subclass of `RecoverableConnectorException` to signal that the fetch operation can be retried. Execution will halt as normal if any other exception type is thrown. 
It is recommended to always throw a recoverable exception type unless it is certain that any number of subsequent attempts will always fail. +Data sources are required to return a unique hash for their state. If the state changes, the hash must change. If states are effectively equivalent, the hash must be the same. This is used by the cache system to determine whether the fetch operation has been seen before and thus can be served from the cache rather than fetching fresh data again. -Recoverable exceptions must be wrapped in a `ConnectionContext::retry()` closure, wherever thrown, to ensure the connection is retried up to the number of times the user requested, calling any exception handlers set by the user or resource. If the underlying client or driver does not throw exceptions, ensure error conditions are trapped and converted to exceptions. +It is important to define a canonical order for hashed inputs such that identical state presented in different orders does not create different hash values. For example, we might sort HTTP headers alphabetically before hashing because header order is not significant and reordering headers should not produce different output. -To promote ordinary exceptions to recoverable exceptions, wrap the fetch code in a try-catch block and pass the original exception into `RecoverableConnectorException` as its inner exception, as shown in the following example. +#### Durability -```php -public function fetch(ConnectionContext $context, $source) -{ - return $context->retry(function () use ($source) { - try { - return $this->client->fetch($source); - } catch (Exception $e) { - throw new RecoverableConnectorException($e->getMessage(), $e->getCode(), $e); - } - } -} -``` +To support Porter's durability features a connector may throw a subclass of `RecoverableConnectorException` to signal that the fetch operation can be retried. Execution will halt as normal if any other exception type is thrown. 
It is recommended to throw a recoverable exception type when the fetch operation is idempotent. Requirements ------------ @@ -506,12 +467,12 @@ Limitations Testing ------- -Porter is fully unit tested. Run the tests with the `composer test` command. +Porter is fully unit tested. Run the tests with the `composer test` command. Run mutation tests with the `composer mutation` command. Contributing ------------ -Everyone is welcome to contribute anything, from [ideas and issues][Issues] to [documentation and code][PRs]! For inspiration, consider the list of open [issues][Issues]. +Everyone is welcome to contribute anything, from [ideas and issues][Issues] to [documentation and code][PRs]! For inspiration, consider the list of open [issues][]. License ------- @@ -534,8 +495,6 @@ Porter is published under the open source GNU Lesser General Public License v3.0 [MSI image]: https://badge.stryker-mutator.io/github.com/ScriptFUSION/Porter/master [Coverage]: https://codecov.io/gh/ScriptFUSION/Porter [Coverage image]: https://codecov.io/gh/ScriptFUSION/Porter/branch/master/graphs/badge.svg "Test coverage" - [Style]: https://styleci.io/repos/49824895 - [Style image]: https://styleci.io/repos/49824895/shield?style=flat "Code style" [Issues]: https://github.com/ScriptFUSION/Porter/issues [PRs]: https://github.com/ScriptFUSION/Porter/pulls @@ -546,6 +505,7 @@ Porter is published under the open source GNU Lesser General Public License v3.0 [Stripe provider]: https://github.com/Provider/Stripe [ECB provider]: https://github.com/Provider/European-Central-Bank [Steam provider]: https://github.com/Provider/Steam + [HttpConnector]: https://github.com/Porter-connectors/HttpConnector [MappingTransformer]: https://github.com/Porter-transformers/MappingTransformer [Sub-imports]: https://github.com/Porter-transformers/MappingTransformer#sub-imports [Mapper]: https://github.com/ScriptFUSION/Mapper @@ -560,3 +520,5 @@ Porter is published under the open source GNU Lesser General Public License 
v3.0 [ECB]: https://github.com/Provider/European-Central-Bank [CurrencyRecords]: https://github.com/Provider/European-Central-Bank/blob/master/src/Records/CurrencyRecords.php [ECB test]: https://github.com/Provider/European-Central-Bank/blob/master/test/DailyForexRatesTest.php + [Amp]: https://amphp.org + [Amp documentation]: https://amphp.org/amp/ diff --git a/composer.json b/composer.json index a85742a..e1f6e75 100644 --- a/composer.json +++ b/composer.json @@ -20,7 +20,7 @@ "require-dev": { "amphp/phpunit-util": "^1.1", "infection/infection": "^0.13", - "mockery/mockery": "^1.1", + "mockery/mockery": "^1.3", "phpunit/phpunit": "^7.1.3" }, "suggest" : { diff --git a/src/Transform/FilterTransformer.php b/src/Transform/FilterTransformer.php index d16e433..7449b78 100644 --- a/src/Transform/FilterTransformer.php +++ b/src/Transform/FilterTransformer.php @@ -31,7 +31,7 @@ public function __construct(callable $filter) public function transform(RecordCollection $records, $context): RecordCollection { - $filter = static function ($predicate) use ($records, $context) { + $filter = static function ($predicate) use ($records, $context): \Generator { foreach ($records as $record) { if ($predicate($record, $context)) { yield $record; @@ -45,7 +45,7 @@ public function transform(RecordCollection $records, $context): RecordCollection public function transformAsync(AsyncRecordCollection $records, $context): AsyncRecordCollection { return new AsyncFilteredRecords( - new Producer(function (\Closure $emit) use ($records) { + new Producer(function (\Closure $emit) use ($records): \Generator { while (yield $records->advance()) { if (($this->filter)($record = $records->getCurrent())) { yield $emit($record);