Hi,

I am the maintainer of another spaCy pipeline sentiment library, sentimental-onix, and I am trying to figure out how to benchmark spaCy sentiment models fairly. I have written a benchmark here: https://github.com/sloev/sentimental-onix/tree/main/benchmark. It uses this dataset as its foundation: https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences

My issue is that both spacytextblob and my library output floating-point scores, but in order to validate against the test dataset I threshold our values into the discrete labels neg, neu, and pos. Whether that makes for a fair comparison is hard for me to evaluate.

The results as they stand (my model uses an ONNX-based sentiment model and a default threshold of neg < -0.7 < neu < 0.7 < pos) are:
| library          | result |
|------------------|--------|
| spacytextblob    | 58.9%  |
| sentimental_onix | 69%    |
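For concreteness, the thresholding I describe above can be sketched like this (a minimal sketch; the helper name `to_label` is mine, not part of either library's API):

```python
def to_label(score, low=-0.7, high=0.7):
    """Map a continuous sentiment score to a discrete label.

    Implements the default threshold neg < -0.7 < neu < 0.7 < pos.
    """
    if score < low:
        return "neg"
    if score < high:
        return "neu"
    return "pos"


print(to_label(-0.9), to_label(0.0), to_label(0.8))
```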
kind regards
Thank you for sharing! It could be fairer to compare accuracy as a function of the floating-point score. For example, when the prediction is 0.9 we would hope it is almost always correct; when it is 0.4 we would expect it to be wrong more often.

A plot like that could be a fairer comparison, showing how good the models are at different thresholds.
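To make that idea concrete, here is a minimal sketch of computing accuracy at several neutral-band thresholds, which could then be plotted. The function name `accuracy_at` and the toy scores/labels are mine, purely illustrative; in the real benchmark the scores and gold labels would come from the models and the UCI dataset:

```python
def accuracy_at(threshold, scores, gold):
    """Accuracy when scores in (-threshold, threshold) are labelled 'neu'."""
    def lab(s):
        if s < -threshold:
            return "neg"
        if s < threshold:
            return "neu"
        return "pos"
    hits = sum(lab(s) == g for s, g in zip(scores, gold))
    return hits / len(gold)


# Toy data, made up for illustration.
scores = [-0.9, -0.3, 0.1, 0.5, 0.95]
gold = ["neg", "neu", "neu", "pos", "pos"]

for t in (0.3, 0.5, 0.7, 0.9):
    print(f"threshold={t}: accuracy={accuracy_at(t, scores, gold):.2f}")
```

Sweeping the threshold like this and plotting accuracy against it shows how sensitive each model's headline number is to the choice of cutoff, rather than comparing them at a single arbitrary point.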
Hi @SamEdwardes, that is an AWESOME idea :-) I will definitely try that out and report back to you! I might ask for clarification if I run my head into the wall ;-)