include uniqueness filter in normalizeTokens #202

Open
phanos opened this issue Dec 2, 2019 · 0 comments

function normalizeTokens(tokens) {

I was looking at issue #183 regarding starts-with logic. That behavior seems to be a result of how the Ngram tokenizer works. I was experimenting with a tokenizer that outputs interword tokens, which generates a lot more tokens. While doing so, I noticed that normalizeTokens does not return a unique list of tokens. It seems getIntersection would be more efficient if it were fed a unique list. Would there be any downsides to this?
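To make the proposal concrete, here is a minimal sketch of what a uniqueness filter could look like. The body of normalizeTokens is not shown in this issue, so `normalizeTokensUnique` below is an assumed stand-in (trim, lowercase, drop empties), and `uniqueTokens` is a hypothetical helper name, not something the library exposes.

```javascript
// Hypothetical helper (not part of the library): removes duplicate tokens
// while keeping the order in which each token first appears.
function uniqueTokens(tokens) {
  // A Set iterates in insertion order, so spreading it back into an array
  // preserves first-occurrence order.
  return [...new Set(tokens)];
}

// Assumed stand-in for normalizeTokens; the real normalization steps may differ.
function normalizeTokensUnique(tokens) {
  const normalized = tokens
    .map((token) => token.trim().toLowerCase())
    .filter((token) => token.length > 0);
  return uniqueTokens(normalized);
}

const example = normalizeTokensUnique(['Foo', 'foo ', 'bar', '', 'BAR', 'baz']);
console.log(example); // [ 'foo', 'bar', 'baz' ]
```

One possible downside to check: if any caller relies on repeated tokens, for example to weight matches by how often a token occurs, deduplicating here would change that behavior. Whether that applies to this codebase is not established in this issue.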
