Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improves validation performance of types defined in the spec #286

Merged
merged 1 commit into from
Sep 25, 2023

Conversation

popematt
Copy link
Contributor

@popematt popematt commented Sep 22, 2023

Issue #, if available:

None

Description of changes:

I discovered recently that the types defined in the spec are (subjectively) slow-ish for validation. I tried an alternative implementation, and ended with a pretty good improvement. Serendipitously, I discovered that this fixes #284 and partially #169.

This change:

  • Completely rewrites TypeBuiltin
  • Adds BuiltInTypes, an object for holding all TypeBuiltin instances
  • Removes, SchemaCore, TypeBuiltInImpl, TypeIon, and TypeCore.
  • Adds tests for TypeNullable

My informal performance testing revealed some really big improvements. I chose some representative spec types that corresponded to 0, 1, "many", and "all" Ion types, and I created test cases that used the spec types to validate some sample data (one value of each Ion type except for clob and sexp). These results were measured by running warmups until the time was stable, using an "autorange" approach to determine the number of invocations in each sample, and then collecting and averaging the ops/s for 30 samples.

Type Before (ops/s) After (ops/s) Difference
nothing 134126 14928349 +11030%
any 153279 44165490 +28713%
$any 135234 57456489 +42386%
number 657438 11096099 +1587%
$number 620417 12594188 +1930%
int 2043462 6880997 +236%
$int 1957613 9165562 +368%

I tested with both ISL 1.0 and 2.0, and the results were basically the same, but this is especially impactful for ISL 1.0, because any user type definition that does not have the type constraint has an implied type: any.

Edit:

This PR also removes some unnecessary noise in the violations messages. E.g. when testing null.symbol against type any, the current message is a lot of noise, but the important thing is "not applicable for null values", which is rather cryptic.

Validation failed:
- expected type any
  - value matches none of the types
    - expected type blob, found symbol
    - expected type bool, found symbol
    - expected type clob, found symbol
    - expected type decimal, found symbol
    - expected type document, found symbol
    - expected type float, found symbol
    - expected type int, found symbol
    - expected type string, found symbol
    - not applicable for null values
    - expected type timestamp, found symbol
    - expected type list, found symbol
    - expected type sexp, found symbol
    - expected type struct, found symbol

The new violation message is a one liner that mentions the null-ness and the Ion type of the value.

Validation failed:
- expected type any, found null symbol

Related PRs in ion-schema, ion-schema-tests, ion-schema-schemas:

None.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@@ -64,7 +64,6 @@ class TypeTest {
val iter = violations.iterator()
val violation = iter.next()
assertEquals("type_mismatch", violation.code)
assertEquals("type", violation.constraint?.fieldName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Question) Why was this removed? I couldn't find any changes in Violation structure for this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old way of creating the spec-defined types involved reading an Ion struct and then extracting the symbol from the struct. This meant that the symbols had the field name "type". See

private const val CORE_TYPES =
"""
{ type: blob }
{ type: bool }
{ type: clob }
{ type: decimal }
{ type: document }
{ type: float }
{ type: int }
{ type: string }
{ type: symbol }
{ type: timestamp }
{ type: list }
{ type: sexp }
{ type: struct }
"""
private const val ION_TYPES =
"""
{ type: ${'$'}blob }
{ type: ${'$'}bool }
{ type: ${'$'}clob }
{ type: ${'$'}decimal }
{ type: ${'$'}float }
{ type: ${'$'}int }
{ type: ${'$'}null }
{ type: ${'$'}string }
{ type: ${'$'}symbol }
{ type: ${'$'}timestamp }
{ type: ${'$'}list }
{ type: ${'$'}sexp }
{ type: ${'$'}struct }
"""

Now we create the symbol directly instead, which means that the symbol no longer has a field name. (This whole thing is a weird consequence of the unfortunate decision that an IonValue knows its own field name.) Anyway, this lack of field name causes no difference in the violation messages, and the violation constraint field name is not actually used anywhere in the library.

@popematt popematt merged commit bf5ecd8 into amazon-ion:master Sep 25, 2023
4 of 5 checks passed
@popematt popematt deleted the hard_coded_types branch September 25, 2023 20:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

StackOverflow when validating 'null.string' but not null.
2 participants