README: correct
ebonnal committed Dec 17, 2023
1 parent accd30e commit 1490c1c
Showing 2 changed files with 12 additions and 12 deletions.
22 changes: 11 additions & 11 deletions README.md
@@ -3,7 +3,7 @@

[![Actions Status](https://github.com/bonnal-enzo/kioss/workflows/test/badge.svg)](https://github.com/bonnal-enzo/kioss/actions) [![Actions Status](https://github.com/bonnal-enzo/kioss/workflows/PyPI/badge.svg)](https://github.com/bonnal-enzo/kioss/actions)

Ease the **development of ETL/EL/ReverseETL** jobs.
Library to **develop ETL/EL/ReverseETL** scripts.

## 1. install

@@ -194,23 +194,23 @@ You can additionally provide a `when` argument: a function that takes the parent
# ***Typical use case for `kioss` in Data Engineering***
![](./img/dataeng.gif)

As a data engineer, you often need to write python scripts to do **ETL** (extract the data from some source API, apply some transformation and load it into a data warehouse) or **EL** (same with minimal transformation) or **Reverse ETL** (read data from data warehouse and post it into some destination API).
As a data engineer, you often need to write Python scripts to do **ETL** (*Extract* the data from a source API, *Transform* and *Load* it into the data warehouse) or **EL** (same but with minimal transformation) or **Reverse ETL** (read data from the data warehouse and post it into a destination API).

These scripts **do not manipulate huge volumes** of data because they are scheduled to run periodically (using orchestrators like *Airflow/DAGster/Prefect*), and only manipulates the data produced or updated during that period. At worst if you are *Amazon*-sized business you may need to process 10 millions payment transactions every 10 minutes.
These scripts **do not manipulate huge volumes** of data because they are scheduled to run periodically (using orchestrators like *Airflow/DAGster/Prefect*) and only manipulate the data produced or updated during that period. At worst, if you are an *Amazon*-sized business, you may need to process 10 million payment transactions every 10 minutes.

These scripts tend to be replaced in part by EL tools like *Airbyte*, but sometimes you still need **custom integration logic**.

These scripts are typically composed of:
- the definition of a data **source** that may use:
- a client library: e.g. the `stripe` or `google.cloud.bigquery` modules.
- a custom `Iterator` that loops over the pages of a REST API and yields `Dict[str, Any]` json responses.
- The definition of a data **source** that may use:
- A client library: e.g. the `stripe` or `google.cloud.bigquery` modules.
- A custom `Iterator` that loops over the pages of a REST API and yields `Dict[str, Any]` json responses.
- ...

- The **transformation** functions, which again may involve calling APIs.

- The function to post into a **destination** that may use:
- a client library
- the `requests` module
- A client library.
- The `requests` module.

- The logic to **batch** some records together: it will often cost less to POST several records at once to an API (see the sketch just below).
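Purely as an illustration (not part of the original README or this commit), a minimal batching helper in plain Python could be sketched like this:

```python
from itertools import islice
from typing import Any, Dict, Iterator, List


def batch(records: Iterator[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Group records into lists of at most `size` items, so they can be POSTed in fewer calls."""
    while chunk := list(islice(records, size)):
        yield chunk
```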

@@ -224,7 +224,7 @@ These scripts are typically composed of:

The ambition of `kioss` is to help us write these types of scripts in a **DRY** (Don't Repeat Yourself), **flexible**, **robust** and **readable** way.

Let's delve into an example to gain a better understanding of what a job powered by kioss entails!
Let's delve into an example to gain a better understanding of what a job using `kioss` entails!

## 1. imports
```python
@@ -236,7 +236,7 @@ from typing import Iterable, Iterator, Dict, Any
```

## 2. source
define your source `Iterable`:
Define your source `Iterable`:

```python
class PokemonCardPageSource(Iterable[List[Dict[str, Any]]]):
@@ -270,7 +270,7 @@ def raise_for_errors(dct: Dict[str, Any]) -> None:
raise RuntimeError(f"Errors occurred: {errors}")
```
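The body of `PokemonCardPageSource` is collapsed in this diff. Purely as an illustration, a paginated source of this shape could be sketched as follows; the endpoint URL and the `page`/`pageSize` query parameters are assumptions for the example, not part of this commit:

```python
import requests
from typing import Any, Dict, Iterable, Iterator, List


class PokemonCardPageSource(Iterable[List[Dict[str, Any]]]):
    # Assumed endpoint, for illustration only; the real source may differ.
    URL = "https://api.pokemontcg.io/v2/cards"

    def __init__(self, page_size: int = 100) -> None:
        self.page_size = page_size

    def __iter__(self) -> Iterator[List[Dict[str, Any]]]:
        page = 1
        while True:
            response = requests.get(
                self.URL, params={"page": page, "pageSize": self.page_size}
            )
            response.raise_for_status()
            cards: List[Dict[str, Any]] = response.json().get("data", [])
            if not cards:
                return  # no more pages
            yield cards
            page += 1
```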

also let's init a BQ client:
Let's also init a BQ client:
```python
bq_client = bigquery.Client(project)
```
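For illustration only, a destination function loading a batch of rows with the `bq_client` defined above could be sketched like this (the table id is a hypothetical placeholder):

```python
from typing import Any, Dict, List


def write_to_bigquery(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Insert a batch of JSON rows and raise if BigQuery reports any insert error."""
    errors = bq_client.insert_rows_json("my_dataset.pokemon_cards", rows)  # placeholder table id
    if errors:
        raise RuntimeError(f"Errors occurred: {errors}")
    return rows
```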
2 changes: 1 addition & 1 deletion setup.py
@@ -9,5 +9,5 @@
license='Apache 2.',
author='bonnal-enzo',
author_email='bonnal.enzo.dev@gmail.com',
description='Keep I/O Simple and Stupid: Library providing a expressive Iterator-based interface to write ETL pipelines.'
description='Keep I/O Simple and Stupid: Library to **develop ETL/EL/ReverseETL** scripts.'
)
