gen: Allow custom go types via annotations #462

Open
0xjac opened this issue Sep 27, 2024 · 3 comments

@0xjac

0xjac commented Sep 27, 2024

Somewhat similar to #429, but more flexible and keeping the type annotations within the Avro schema: it would be great to generate Go structs with custom types using annotations.

Specifically: add support for dedicated annotations (go-type, go-key-type), similar to Java's but for Go. The Go types would be expected to implement encoding.TextMarshaler/encoding.TextUnmarshaler (taking advantage of #68 and #327).

Example

Schema

record MyRecord {
  @go-type("math/big.Float") string value;
  @go-key-type("go.custom.com/ident.ID4") map<@go-type("math/big.Float") string> balances;
  array<@go-type("math/big.Float") string> values;
  @go-type("github.com/google/btree.BTreeG[int]") array<string> totals;
}
Which results in the following JSON schema:
{
  "type" : "record",
  "name" : "MyRecord",
  "fields" : [ {
    "name" : "value",
    "type" : {
      "type" : "string",
      "go-type" : "math/big.Float"
    }
  }, {
    "name" : "balances",
    "type" : {
      "type" : "map",
      "values" : {
        "type" : "string",
        "go-type" : "math/big.Float"
      },
      "go-key-type" : "go.custom.com/ident.ID4"
    }
  }, {
    "name" : "values",
    "type" : {
      "type" : "array",
      "items" : {
        "type" : "string",
        "go-type" : "math/big.Float"
      }
    }
  }, {
    "name" : "totals",
    "type" : {
      "type" : "array",
      "items" : "string",
      "go-type" : "github.com/google/btree.BTreeG[int]"
    }
  } ]
}

Gen cmd

avrogen -p main MyRecord.avsc

Actual Output

package main

// Code generated by avro/gen. DO NOT EDIT.

// MyRecord is a generated struct.
type MyRecord struct {
        Value    string            `avro:"value"`
        Balances map[string]string `avro:"balances"`
        Values   []string          `avro:"values"`
        Totals   []string          `avro:"totals"`
}

Desired Output

package main

import (
	"math/big"

	"github.com/google/btree"
	"go.custom.com/ident"
)

// Code generated by avro/gen. DO NOT EDIT.

// MyRecord is a generated struct.
type MyRecord struct {
	Value    big.Float               `avro:"value" json:"value"`
	Balances map[ident.ID4]big.Float `avro:"balances" json:"balances"`
	Values   []big.Float             `avro:"values" json:"values"`
	Totals   btree.BTreeG[int]       `avro:"totals" json:"totals"`
}

Notes

Potential pain points are:

  1. Extract the actual type and package to import from the fully qualified type in the go-type annotation.
    For complex cases, extra annotations should be considered. SQLC, which also allows Go type overrides, handles this decently. Taking inspiration from its documentation, this would define:
    1. go-type-import: Import path of the package.
    2. go-type-pkg: Package name if it doesn't match the import path.
    3. go-type-name: The actual Go type name
    4. go-type-ptr: Whether to use a pointer or the type directly.

      (This could also be done with a ["null", T] union, but there may be cases where a union is undesirable in the Avro schema, yet a pointer is useful in Go.)

    5. Correspondingly, for the map key type annotation: go-key-type-import, go-key-type-pkg, go-key-type-name and go-key-type-ptr.
  2. Generating clean import statements (without duplicates, in order and formatted correctly). This could be alleviated by running goimports.
  3. The last annotation in the example (@go-type("github.com/google/btree.BTreeG[int]") array<string> totals;) is trickier, as it overrides the array type itself. Marshaling to and from that type is not supported, since it is not a string type implementing encoding.TextMarshaler/encoding.TextUnmarshaler. Overrides of array (and map!) types should be ignored (potentially with a warning or error) until marshaling can be handled for arrays, maps or even arbitrary types.
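
The explicit annotations from point 1 and the import handling from point 2 could combine roughly as follows. `GoTypeOverride`, `Expr` and `importBlock` are hypothetical names, not avro/gen APIs; the sketch only covers deduplication, sorting and aliasing, leaving grouping and final formatting to gofmt or goimports as suggested above.

```go
package main

import (
	"fmt"
	"path"
	"sort"
	"strings"
)

// GoTypeOverride mirrors the proposed go-type-import, go-type-pkg,
// go-type-name and go-type-ptr annotations (hypothetical type).
type GoTypeOverride struct {
	Import string // import path, e.g. "math/big"; empty for builtin types
	Pkg    string // package name; only needed if it differs from path.Base(Import)
	Name   string // type name, possibly with type arguments, e.g. "BTreeG[int]"
	Ptr    bool   // emit *T instead of T
}

// pkgName returns the identifier that qualifies the type in generated code.
func (o GoTypeOverride) pkgName() string {
	if o.Pkg != "" {
		return o.Pkg
	}
	return path.Base(o.Import)
}

// Expr renders the Go type expression for the generated struct field.
func (o GoTypeOverride) Expr() string {
	expr := o.Name
	if o.Import != "" {
		expr = o.pkgName() + "." + expr
	}
	if o.Ptr {
		expr = "*" + expr
	}
	return expr
}

// importBlock returns a deduplicated import declaration sorted by path.
// An explicit package name becomes an import alias only when it differs from
// the last path element. If two overrides disagree on the name for the same
// path, the last one wins in this sketch.
func importBlock(overrides []GoTypeOverride) string {
	aliases := map[string]string{} // import path -> alias ("" for none)
	for _, o := range overrides {
		if o.Import == "" {
			continue
		}
		alias := ""
		if o.Pkg != "" && o.Pkg != path.Base(o.Import) {
			alias = o.Pkg
		}
		aliases[o.Import] = alias
	}
	if len(aliases) == 0 {
		return ""
	}
	paths := make([]string, 0, len(aliases))
	for p := range aliases {
		paths = append(paths, p)
	}
	sort.Strings(paths)

	var b strings.Builder
	b.WriteString("import (\n")
	for _, p := range paths {
		b.WriteString("\t")
		if alias := aliases[p]; alias != "" {
			b.WriteString(alias + " ")
		}
		fmt.Fprintf(&b, "%q\n", p)
	}
	b.WriteString(")\n")
	return b.String()
}

func main() {
	bigFloat := GoTypeOverride{Import: "math/big", Name: "Float"}
	overrides := []GoTypeOverride{
		bigFloat, // value
		{Import: "go.custom.com/ident", Name: "ID4"}, // balances key
		bigFloat, // balances values: same import, emitted once
		{Import: "math/big", Name: "Int", Ptr: true}, // pointer via go-type-ptr
	}
	for _, o := range overrides {
		fmt.Println(o.Expr())
	}
	fmt.Print(importBlock(overrides))
}
```

Using a map keyed by import path is what removes duplicates; sorting the keys gives stable output, which matters for generated files checked into version control.
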
@nrwiersma
Member

This is an interesting concept. I wonder if any other Go lib has implemented this, so we could compare the proposed annotations against it. I also wonder if the number of annotations could be reduced to just go-type and go-type-pkg by putting the import, name and pointer into go-type, e.g. @go-type("github.com/hamba/avro/*Schema")?

@0xjac
Author

0xjac commented Oct 8, 2024

I'm not aware of other libs doing type annotations for Avro. However, I did not come up with it: I took the Java example from the Avro spec and adapted it for Go.

Regarding reducing the number of annotations: that works in most cases, but there are caveats that require the extra annotations for edge cases.

According to the Go spec, the interpretation of an import path is implementation-dependent, and it can contain almost any character. This makes it complicated to separate the import path from the rest without coming up with a mini markup language, so I find it easier to have separate annotations.

However, a compiler "may also exclude the characters !"#$%&'()*,:;<=>?[\]^`{|} and the Unicode replacement character U+FFFD". I'm not sure whether Go actually does this, but in practice I have never seen a third-party package whose import path was not a URL, so for most cases we can use simple logic to split the import path, package, type and pointer out of a @go-type annotation, similar to SQLC.

This would look like: an optional * to indicate a pointer, then the full import path (which must end with the package name), a ., and the type. For example:

  • @go-type("*math/big.Int")
  • @go-type("github.com/hamba/avro.Schema")
  • @go-type("*github.com/hamba/avro.Schema")
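
A sketch of that single-annotation split, with one assumption beyond the proposal: the type name starts after the last `.` that precedes any `[`, so dots in domains and inside type arguments do not interfere. `GoType` and `parseGoType` are hypothetical names, and the package name is assumed to be the last path element, as the rule above requires.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// GoType is the result of splitting a shorthand go-type annotation.
type GoType struct {
	Ptr        bool
	ImportPath string // "" for builtin or same-package types such as "int64"
	Pkg        string // assumed to equal the last element of ImportPath
	Name       string // may carry type arguments, e.g. "BTreeG[int]"
}

// parseGoType splits "[*]import/path/pkg.Type[args]" into its parts.
func parseGoType(s string) (GoType, error) {
	var gt GoType
	if strings.HasPrefix(s, "*") {
		gt.Ptr = true
		s = s[1:]
	}
	if s == "" {
		return gt, fmt.Errorf("empty go-type")
	}
	// Ignore type arguments when looking for the separator, so that
	// "btree.BTreeG[pkg.T]" is split at "btree." and not inside the brackets.
	base := s
	if i := strings.IndexByte(s, '['); i >= 0 {
		base = s[:i]
	}
	dot := strings.LastIndexByte(base, '.')
	if dot < 0 {
		if strings.Contains(base, "/") {
			return gt, fmt.Errorf("go-type %q: missing '.' before the type name", s)
		}
		gt.Name = s // no qualifier: no import needed
		return gt, nil
	}
	// A '/' after the last '.' means the pkg.Type separator is missing.
	if dot == 0 || dot == len(base)-1 || strings.Contains(base[dot+1:], "/") {
		return gt, fmt.Errorf("go-type %q: want import/path/pkg.Type", s)
	}
	gt.ImportPath = s[:dot]
	gt.Pkg = path.Base(gt.ImportPath)
	gt.Name = s[dot+1:]
	return gt, nil
}

func main() {
	for _, s := range []string{
		"*math/big.Int",
		"github.com/google/btree.BTreeG[int]",
		"go.custom.com/ident.ID4",
		"int64",
	} {
		gt, err := parseGoType(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("ptr=%t import=%q pkg=%q name=%q\n", gt.Ptr, gt.ImportPath, gt.Pkg, gt.Name)
	}
}
```

Because the package name is simply the last path element here, `gopkg.in/yaml.v3.Node` would yield the package name `yaml.v3`, which is not a valid identifier; that is the kind of case the explicit annotations are meant to cover.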

This should work in most cases. However, if the package name is not the suffix of the import path, if we need an import alias (for example, when using data types from two "avro" libs), or if any other edge case comes up, we need to be able to specify everything (import path, package name, type, alias, pointer) explicitly.

@nrwiersma
Member

The schema proposed for @go-type seems quite good. Personally, I also always prefer starting simple and dealing with edge cases as they arise and become concrete. I think it is clear that a second annotation like @go-type-pkg will be needed, and it is not uncommon for the package name and import path to differ.
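
As a sketch of how a second @go-type-pkg annotation might interact with the default: the explicit value wins when present; otherwise the last path element is used, and it is rejected if it is not a Go identifier. `resolvePkg` is a hypothetical helper, not part of avro/gen.

```go
package main

import (
	"fmt"
	"go/token"
	"path"
)

// resolvePkg picks the package identifier for an import path. pkgOverride
// stands in for the value of a hypothetical go-type-pkg annotation.
func resolvePkg(importPath, pkgOverride string) (string, error) {
	if pkgOverride != "" {
		if !token.IsIdentifier(pkgOverride) {
			return "", fmt.Errorf("go-type-pkg %q is not a valid Go identifier", pkgOverride)
		}
		return pkgOverride, nil
	}
	// Default: the last path element, e.g. "big" for "math/big".
	name := path.Base(importPath)
	if !token.IsIdentifier(name) {
		return "", fmt.Errorf("cannot derive a package name from %q; set go-type-pkg", importPath)
	}
	return name, nil
}

func main() {
	cases := []struct{ importPath, override string }{
		{"math/big", ""},
		{"gopkg.in/yaml.v3", ""},     // last element "yaml.v3" is not an identifier
		{"gopkg.in/yaml.v3", "yaml"}, // explicit package name fixes it
	}
	for _, c := range cases {
		name, err := resolvePkg(c.importPath, c.override)
		fmt.Println(c.importPath, c.override, "->", name, err)
	}
}
```

The check cannot catch every mismatch: a major-version suffix such as `example.com/lib/v2` (a hypothetical path) yields `v2`, a valid but wrong name, so only an explicit go-type-pkg can resolve that case.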
