Use IEV OpenData interface rather than scraping #5

Open
ronaldtse opened this issue Aug 21, 2018 · 3 comments
Labels
enhancement New feature or request

Comments

@ronaldtse
Member

ronaldtse commented Aug 21, 2018

Load the IEV areas (the response is JSON):

curl --header "Content-Type: application/json"  https://opendata-api.iec.ch/v1/opendata/areas

=>

[
  {"dateCreated":"","id":"101","description":"Mathematics"}, 
  {"dateCreated":"","id":"102","description":"Mathematics - General concepts and linear algebra"},
  ...
]
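
As a rough sketch of consuming that response, the Python snippet below turns the areas array into an id-to-description lookup. The sample payload only repeats the two entries shown above, and `index_areas` is a hypothetical helper name, not part of any IEC client.

```python
import json

# Sample payload: only the two entries quoted above (the real list is longer).
SAMPLE_AREAS = """[
  {"dateCreated":"","id":"101","description":"Mathematics"},
  {"dateCreated":"","id":"102","description":"Mathematics - General concepts and linear algebra"}
]"""


def index_areas(payload: str) -> dict[str, str]:
    """Map each area id to its description (hypothetical helper)."""
    return {area["id"]: area["description"] for area in json.loads(payload)}


areas = index_areas(SAMPLE_AREAS)
print(areas["101"])
```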

Load area 101, Mathematics (the response is XML):

curl --header "Content-Type: application/xml"  https://opendata-api.iec.ch/v1/opendata/iev/101/{yourKey}

=>

<?xml version="1.0" encoding="UTF-8"?>
<subjectarea version="1.0beta" id="101">
<concept ievref="101-12-01">
<lang-set lang-id="en">
<term-name>information</term-name>
<definition>knowledge concerning objects, such as facts, events, things, processes, or ideas, including concepts, that within a certain context has a particular meaning</definition>
<pubdate>1998-04</pubdate>
<source>ISO/IEC 2382-1, 01.01.01, 701-01-01 MOD</source>
...
ronaldtse added the enhancement label Aug 21, 2018
@ronaldtse
Member Author

The IEV OpenData API does not work out of the box for our use case, where we import only one term at a time. The reasons are listed below (also submitted as feedback to the IEC Terminology team).

  1. Inconsistent response formats.

The /areas endpoint returns JSON, but term results are returned in XML. In particular, the term results endpoint does not support JSON (it returns XML regardless of the format requested).

  2. Inability to load a single term entry.

When using IEV terms in a standard document, it is most convenient to refer to a term by a “unique ID” (i.e. the IEV term ID, like 101-12-09).

Currently, the OpenData API only provides a method to load all entries within an area, such as:

https://opendata-api.iec.ch/v1/opendata/iev/101/{yourKey}

This request returns every term under the 101 area. The response is very long, and most of it is useless to a user who needs a single term.

We hope there will be an additional endpoint like https://opendata-api.iec.ch/v1/opendata/iev/101/12-09/{yourKey} that returns a single term, either in all its associated languages or, optionally, in only one language.
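
Until such an endpoint exists, a client would have to fetch the whole area and filter it locally. A minimal sketch of that workaround in Python, assuming the `<concept>` elements sit directly under `<subjectarea>` as in the sample response; `find_term` is a hypothetical helper, the definitions are omitted for brevity, and the 101-12-09 entry is a made-up placeholder rather than real IEV data:

```python
import xml.etree.ElementTree as ET

# Trimmed sample of an area response. The 101-12-09 entry is a placeholder
# invented for this sketch, not real IEV content.
AREA_XML = """<subjectarea version="1.0beta" id="101">
<concept ievref="101-12-01">
<lang-set lang-id="en"><term-name>information</term-name></lang-set>
</concept>
<concept ievref="101-12-01">
<lang-set lang-id="fr"><term-name>information</term-name></lang-set>
</concept>
<concept ievref="101-12-09">
<lang-set lang-id="en"><term-name>(placeholder)</term-name></lang-set>
</concept>
</subjectarea>"""


def find_term(area_xml, ievref, lang=None):
    """Return (lang-id, term-name) pairs for one ievref, optionally one language."""
    hits = []
    for concept in ET.fromstring(area_xml).iter("concept"):
        if concept.get("ievref") != ievref:
            continue
        for lang_set in concept.iter("lang-set"):
            if lang is None or lang_set.get("lang-id") == lang:
                hits.append((lang_set.get("lang-id"), lang_set.findtext("term-name")))
    return hits


all_langs = find_term(AREA_XML, "101-12-01")
only_fr = find_term(AREA_XML, "101-12-01", lang="fr")
print(all_langs, only_fr)
```

This still downloads and parses the full area on every lookup, which is the cost a single-term endpoint would remove.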

  3. Grouping of concepts

Each “concept” is currently returned separately per language. However, all language versions of a term should be grouped under the same “concept”.

Currently it is:

<concept ievref="101-12-01">
<lang-set lang-id="en">
<term-name>information</term-name>
<definition>knowledge concerning objects, such as facts, events, things, processes, or ideas, including concepts, that within a certain context has a particular meaning</definition>
<pubdate>1998-04</pubdate>
<source>ISO/IEC 2382-1, 01.01.01, 701-01-01 MOD</source>
</lang-set>
</concept>

<concept ievref="101-12-01">
<lang-set lang-id="fr">
<term-name>information</term-name>
<attribute>f</attribute>
<definition>connaissance concernant un objet tel qu'un fait, un événement, une chose, un processus ou une idée, y compris une notion, et qui, dans un contexte déterminé, a une signification particulière</definition>
<pubdate>1998-04</pubdate>
<source>ISO/CEI 2382-1, 01.01.01, 701-01-01 MOD</source>
</lang-set>
</concept>
…

It would be better as:

<concept ievref="101-12-01">

<lang-set lang-id="en">
<term-name>information</term-name>
<definition>knowledge concerning objects, such as facts, events, things, processes, or ideas, including concepts, that within a certain context has a particular meaning</definition>
<pubdate>1998-04</pubdate>
<source>ISO/IEC 2382-1, 01.01.01, 701-01-01 MOD</source>
</lang-set>

<lang-set lang-id="fr">
<term-name>information</term-name>
<attribute>f</attribute>
<definition>connaissance concernant un objet tel qu'un fait, un événement, une chose, un processus ou une idée, y compris une notion, et qui, dans un contexte déterminé, a une signification particulière</definition>
<pubdate>1998-04</pubdate>
<source>ISO/CEI 2382-1, 01.01.01, 701-01-01 MOD</source>
</lang-set>

</concept>

…
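
The regrouping can also be done on the client side. The following Python sketch merges `<concept>` elements that share an `ievref` into the first one seen, again assuming the concepts are direct children of `<subjectarea>`; `group_concepts` is a hypothetical helper and the sample omits the definitions for brevity:

```python
import xml.etree.ElementTree as ET

# Trimmed version of the "currently" sample above (definitions omitted).
SAMPLE = """<subjectarea version="1.0beta" id="101">
<concept ievref="101-12-01">
<lang-set lang-id="en"><term-name>information</term-name></lang-set>
</concept>
<concept ievref="101-12-01">
<lang-set lang-id="fr"><term-name>information</term-name><attribute>f</attribute></lang-set>
</concept>
</subjectarea>"""


def group_concepts(root):
    """Move every lang-set into the first concept that carries its ievref."""
    first_seen = {}
    for concept in list(root.findall("concept")):
        ref = concept.get("ievref")
        if ref not in first_seen:
            first_seen[ref] = concept
            continue
        # Duplicate ievref: append its lang-sets to the first concept, then drop it.
        first_seen[ref].extend(concept.findall("lang-set"))
        root.remove(concept)
    return root


root = group_concepts(ET.fromstring(SAMPLE))
langs = [ls.get("lang-id") for ls in root.find("concept").findall("lang-set")]
print(len(root.findall("concept")), langs)
```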

@ronaldtse
Member Author

The IEV database structure is defined in IEC Directives Supplement Annex SK
(http://www.iec.ch/members_experts/refdocs/iec/isoiecdir-iecsup%7Bed11.0%7Den.pdf)

In the following descriptions, references are provided to the IEC Supplement, Annex SK, which gives the rules for the structure and content of the Electropedia data, e.g. "[SK.3.1.2]".
version is the version of the XML schema
subject area is the title of the subject area (or IEV part)
concept is a container for one language version of the concept
id is the number of the subject area (or IEV part) [SK.2.1.3; SK.2.1.5]
lang-id is the ISO alpha-2 language code [SK.2.1.4]
ievref is the reference of the concept in the Electropedia [SK.2.1.5]
<term-name> is the preferred term designating the concept [SK.3.1.3]
<attribute> contains any attributes to the term [SK.3.1.3.4.2, SK.3.1.3.5.5, SK.3.1.3.5.6, SK.3.1.3.6]
<symbol> contains any symbols representing the concept [SK.3.1.2, SK.3.1.3]
<synonyms> is a container; a concept can contain up to 3 synonyms. Each synonym has an id, and is defined by its name, its attribute and a status (Preferred, Admitted or Deprecated) [SK.3.1.3.4]
<definition> is the definition of the concept [SK.3.1.4]
<example> contains an example of the concept; it has an id, a label and content [SK.3.1.6]
<note> contains additional information that supplements the terminological data (e.g. information regarding the units applicable to a quantity, provisions relating to the use of a term, an explanation of the reasons for selecting an abbreviated form as preferred term). It has an id, a label, and content [SK.3.1.7]
<source> contains the source reference from which a concept has been repeated, together with information about any modifications made [SK.3.1.8]
<pubdate> is the publication date of the concept
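
For illustration, the simple text fields above could be read into a small record like this. It is only a sketch: the `LangSet` class and `parse_lang_set` function are hypothetical names, only elements that appear in the sample responses are covered, and `<synonyms>`, `<example>` and `<note>` are left out because their inner structure is not shown in this thread.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class LangSet:
    """One language version of a concept (hypothetical record type)."""
    lang_id: str
    term_name: Optional[str]
    attribute: Optional[str]
    definition: Optional[str]
    source: Optional[str]
    pubdate: Optional[str]


def parse_lang_set(el: ET.Element) -> LangSet:
    # findtext returns None when the child element is absent.
    return LangSet(
        lang_id=el.get("lang-id"),
        term_name=el.findtext("term-name"),
        attribute=el.findtext("attribute"),
        definition=el.findtext("definition"),
        source=el.findtext("source"),
        pubdate=el.findtext("pubdate"),
    )


# French lang-set from the sample response, with the definition omitted for brevity.
FR_XML = """<lang-set lang-id="fr">
<term-name>information</term-name>
<attribute>f</attribute>
<pubdate>1998-04</pubdate>
<source>ISO/CEI 2382-1, 01.01.01, 701-01-01 MOD</source>
</lang-set>"""

fr = parse_lang_set(ET.fromstring(FR_XML))
print(fr.lang_id, fr.term_name, fr.attribute, fr.pubdate)
```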

@ronaldtse
Member Author

The "opendata-api.iec.ch" host is gone. We need to ask IEC for an alternative.
