Add displaCy data structures to docs #12202

Closed
121 changes: 121 additions & 0 deletions website/docs/api/top-level.mdx
@@ -340,6 +340,127 @@ use with the `manual=True` argument in `displacy.render`.
| `options` | Span-specific visualisation options. ~~Dict[str, Any]~~ |
| **RETURNS** | Generated entities keyed by text (original text) and ents. ~~dict~~ |

### Visualizer data structures {id="displacy_structures"}

You can use displaCy's data format to render data manually. This is useful if
you want to visualize output from other libraries. Examples of displaCy's
different data formats are shown below.

> #### DEP example data structure
>
> ```json
> {
>   "words": [
>     { "text": "This", "tag": "DT" },
>     { "text": "is", "tag": "VBZ" },
>     { "text": "a", "tag": "DT" },
>     { "text": "sentence", "tag": "NN" }
>   ],
>   "arcs": [
>     { "start": 0, "end": 1, "label": "nsubj", "dir": "left" },
>     { "start": 2, "end": 3, "label": "det", "dir": "left" },
>     { "start": 1, "end": 3, "label": "attr", "dir": "right" }
>   ]
> }
> ```

#### Dependency Visualizer data structure {id="structure-dep"}

| Dictionary Key | Description |
| -------------- | ----------------------------------------------------------------------------------------------------------- |
| `words` | List of dictionaries describing a word token (see structure below). ~~List[Dict[str, Any]]~~ |
| `arcs` | List of dictionaries describing the relations between words (see structure below). ~~List[Dict[str, Any]]~~ |
| `settings` | Dependency Visualizer options (see [here](/api/top-level#displacy_options)). ~~Dict[str, Any]~~ |

<Accordion title="Words data structure">

| Dictionary Key | Description |
| -------------- | ---------------------------------------- |
| `text` | Text content of the word. ~~str~~ |
| `tag` | Fine-grained part-of-speech. ~~str~~ |
| `lemma` | Base form of the word. ~~Optional[str]~~ |

</Accordion>

<Accordion title="Arcs data structure">

| Dictionary Key | Description |
| -------------- | ---------------------------------------------------- |
| `start` | The index of the starting token. ~~int~~ |
| `end` | The index of the ending token. ~~int~~ |
| `label` | The type of dependency relation. ~~str~~ |
| `dir` | Direction of the relation (`left`, `right`). ~~str~~ |

</Accordion>
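
As a minimal sketch of how the dependency structure fits together, the snippet below builds the example data as a Python dictionary and runs a basic sanity check on the arcs before rendering. The helpers `check_dep_data` and `render_dep` are hypothetical names introduced for illustration and are not part of spaCy; only `displacy.render` with `style="dep"` and `manual=True` is the library's own API.

```python
from typing import Any, Dict, List


def check_dep_data(data: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: list problems found in a dependency structure."""
    problems = []
    n_words = len(data["words"])
    for i, arc in enumerate(data["arcs"]):
        # "start" and "end" are indices into the "words" list.
        if not (0 <= arc["start"] < n_words and 0 <= arc["end"] < n_words):
            problems.append(f"arc {i}: token index out of range")
        if arc["dir"] not in ("left", "right"):
            problems.append(f"arc {i}: dir must be 'left' or 'right'")
    return problems


def render_dep(data: Dict[str, Any]) -> str:
    """Hypothetical wrapper: render the structure to markup with displaCy."""
    from spacy import displacy  # imported lazily so the check runs without spaCy

    return displacy.render(data, style="dep", manual=True, jupyter=False)


dep_data = {
    "words": [
        {"text": "This", "tag": "DT"},
        {"text": "is", "tag": "VBZ"},
        {"text": "a", "tag": "DT"},
        {"text": "sentence", "tag": "NN"},
    ],
    "arcs": [
        {"start": 0, "end": 1, "label": "nsubj", "dir": "left"},
        {"start": 2, "end": 3, "label": "det", "dir": "left"},
        {"start": 1, "end": 3, "label": "attr", "dir": "right"},
    ],
}

problems = check_dep_data(dep_data)
```

If `problems` is empty, calling `render_dep(dep_data)` should return the rendered markup as a string, assuming spaCy is installed.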

> #### ENT example data structure
>
> ```json
> {
>   "text": "But Google is starting from behind.",
>   "ents": [{ "start": 4, "end": 10, "label": "ORG" }],
>   "title": null
> }
> ```

#### Named Entity Recognition data structure {id="structure-ent"}

| Dictionary Key | Description |
| -------------- | ------------------------------------------------------------------------------------------- |
| `text` | String representation of the document text. ~~str~~ |
| `ents` | List of dictionaries describing entities (see structure below). ~~List[Dict[str, Any]]~~ |
| `title` | Title of the visualization. ~~str~~ |
| `settings` | Entity Visualizer options (see [here](/api/top-level#displacy_options)). ~~Dict[str, Any]~~ |

<Accordion title="Ents data structure">

| Dictionary Key | Description                                                  |
| -------------- | ------------------------------------------------------------ |
| `start`        | The index of the first character of the entity. ~~int~~      |
| `end`          | The index one past the last character of the entity. ~~int~~ |
| `label`        | Label attached to the entity. ~~str~~                        |
| `kb_id`        | `KnowledgeBase` ID. ~~str~~                                  |
| `kb_url`       | `KnowledgeBase` URL. ~~str~~                                 |

</Accordion>
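
In the ENT example, the values 4 and 10 line up with character offsets into `text`: slicing `text[4:10]` yields "Google". The sketch below derives such offsets from a substring rather than counting by hand. `make_ent` is a hypothetical helper introduced for illustration, and it assumes the phrase occurs in the text (it only finds the first occurrence).

```python
from typing import Any, Dict


def make_ent(text: str, phrase: str, label: str) -> Dict[str, Any]:
    """Hypothetical helper: build an entity dict from the first match of `phrase`."""
    start = text.index(phrase)  # raises ValueError if the phrase is absent
    return {"start": start, "end": start + len(phrase), "label": label}


text = "But Google is starting from behind."
ent_data = {
    "text": text,
    "ents": [make_ent(text, "Google", "ORG")],
    "title": None,  # Python's None corresponds to JSON null
}
```

With spaCy installed, the dictionary can then be passed to `displacy.render(ent_data, style="ent", manual=True)`.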

> #### SPAN example data structure
>
> ```json
> {
>   "text": "Welcome to the Bank of China.",
>   "spans": [
>     { "start_token": 3, "end_token": 6, "label": "ORG" },
>     { "start_token": 5, "end_token": 6, "label": "GPE" }
>   ],
>   "tokens": ["Welcome", "to", "the", "Bank", "of", "China", "."]
> }
> ```

#### Span Classification data structure {id="structure-span"}

| Dictionary Key | Description |
| -------------- | ----------------------------------------------------------------------------------------- |
| `text` | String representation of the document text. ~~str~~ |
| `spans` | List of dictionaries describing spans (see structure below). ~~List[Dict[str, Any]]~~ |
| `title` | Title of the visualization. ~~str~~ |
| `tokens` | List of word tokens. ~~List[str]~~ |
| `settings` | Span Visualizer options (see [here](/api/top-level#displacy_options)). ~~Dict[str, Any]~~ |

<Accordion title="Spans data structure">

| Dictionary Key | Description                                                        |
| -------------- | ------------------------------------------------------------------ |
| `start`        | The index of the first character of the span. ~~int~~              |
| `end`          | The index one past the last character of the span. ~~int~~         |
| `start_token`  | The index of the first token of the span in `tokens`. ~~int~~      |
| `end_token`    | The index one past the last token of the span in `tokens`. ~~int~~ |
| `label`        | Label attached to the span. ~~str~~                                |
| `kb_id`        | `KnowledgeBase` ID. ~~str~~                                        |
| `kb_url`       | `KnowledgeBase` URL. ~~str~~                                       |

</Accordion>
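
In the SPAN example, "Bank of China" covers tokens 3 to 5 and is given as `start_token` 3 and `end_token` 6, so the token range behaves like a Python slice of `tokens`. The sketch below computes such ranges by searching the token list. `find_token_span` is a hypothetical helper introduced for illustration, and it assumes an exact match on token strings.

```python
from typing import Any, Dict, List


def find_token_span(
    tokens: List[str], phrase_tokens: List[str], label: str
) -> Dict[str, Any]:
    """Hypothetical helper: locate `phrase_tokens` in `tokens` and build a span dict."""
    n = len(phrase_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i : i + n] == phrase_tokens:
            # end_token is exclusive, so tokens[start_token:end_token] is the span.
            return {"start_token": i, "end_token": i + n, "label": label}
    raise ValueError(f"{phrase_tokens!r} not found in tokens")


tokens = ["Welcome", "to", "the", "Bank", "of", "China", "."]
span_data = {
    "text": "Welcome to the Bank of China.",
    "spans": [
        find_token_span(tokens, ["Bank", "of", "China"], "ORG"),
        find_token_span(tokens, ["China"], "GPE"),
    ],
    "tokens": tokens,
}
```

With spaCy installed, the dictionary can then be passed to `displacy.render(span_data, style="span", manual=True)`.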

### Visualizer options {id="displacy_options"}

The `options` argument lets you specify additional settings for each visualizer.
3 changes: 2 additions & 1 deletion website/docs/usage/visualizers.mdx
@@ -344,7 +344,8 @@ or
[SyntaxNet](https://github.com/tensorflow/models/tree/master/research/syntaxnet).
If you set `manual=True` on either `render()` or `serve()`, you can pass in data
in displaCy's format as a dictionary (instead of `Doc` objects). There are
-helper functions for converting `Doc` objects to displaCy's format for use with
+helper functions for converting `Doc` objects to
+[displaCy's format](/api/top-level#displacy_structures) for use with
`manual=True`: [`displacy.parse_deps`](/api/top-level#displacy.parse_deps),
[`displacy.parse_ents`](/api/top-level#displacy.parse_ents), and
[`displacy.parse_spans`](/api/top-level#displacy.parse_spans).