Finding urls having semantic similarity #10028

Are you trying to find out whether the URL strings themselves are similar, or do you mean the contents of the pages at those URLs?

If you want to decide whether two things are similar just by looking at the URLs, I think that's going to be impossible pretty often. Can a human even do that? For example, what can you do with these:

https://my.bsomusic.org/overview/16895
https://tickets.coloradosymphony.org/5176

I can tell you they're about music from the domain, but not more than that.

I can see how you can get real words out of URLs, but even if you preprocess things, the criteria you outlined seem pretty unclear. Also note that spaCy is mainly designed with complete sentences or longer documents in mind, and it can deal with …
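
To make the preprocessing point concrete, here's a rough sketch of pulling word-like tokens out of the two example URLs using only the Python standard library. The helper names `url_tokens` and `jaccard` are made up for illustration, and plain token overlap is an assumption standing in for a real similarity measure; it is not something spaCy provides.

```python
import re
from urllib.parse import urlsplit


def url_tokens(url):
    """Return lowercase alphabetic tokens from a URL's host and path.

    Hypothetical helper: digits (such as numeric IDs) are dropped because
    they carry no meaning without the page contents.
    """
    parts = urlsplit(url)
    text = f"{parts.netloc} {parts.path}".lower()
    return re.findall(r"[a-z]+", text)


def jaccard(tokens_a, tokens_b):
    """Token-set overlap in [0, 1]; a crude stand-in, not semantic similarity."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    a = url_tokens("https://my.bsomusic.org/overview/16895")
    b = url_tokens("https://tickets.coloradosymphony.org/5176")
    print(a)
    print(b)
    print(jaccard(a, b))
```

For these two URLs the only shared token is `org`, and names like `bsomusic` or `coloradosymphony` stay glued together unless you add some kind of word segmentation. That's roughly why the URL string alone gives so little to work with.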

Answer selected by polm
Labels: feat / vectors (Feature: Word vectors and similarity)
2 participants