Improve benchmarking and performance measurements #13

@creswick

Description

The benchmark suite and evaluation tools haven't been used in a while. It would be nice to have both run-time timing results and performance metrics collected as part of the chatter release process, so we can look back over time and see if/how the classifiers change as we tweak the implementations and change the training data.
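For the run-time side, a minimal sketch of what the per-release timing harness could look like, assuming criterion is the benchmarking library used (the `tagSentence` function below is a hypothetical stand-in for whichever chatter classifier is under test, not chatter's actual API):

```haskell
module Main (main) where

import Criterion.Main (bench, bgroup, defaultMain, whnf)
import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical placeholder: swap in the real chatter tagger/chunker here.
tagSentence :: Text -> [(Text, Text)]
tagSentence = map (\w -> (w, "NN")) . T.words

main :: IO ()
main = defaultMain
  [ bgroup "pos-tagging"
      [ bench "short-sentence" $
          -- whnf avoids requiring NFData instances; nf would be preferable
          -- if the result types support it.
          whnf tagSentence (T.pack "The quick brown fox jumps over the lazy dog.")
      ]
  ]
```

Running the resulting executable on each release and archiving criterion's report output would give the over-time comparison described above.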

This ticket is to build the infrastructure so that it's easy to add a new classifier for an existing task (e.g. POS tagging, chunking) as well as to add new tasks (e.g. Named Entity Recognition), and to generate clear results that show false positives, false negatives, and true positives in a way that matches the behavior of NLTK. That gives a clear point of comparison: someone should be able to roughly compare chatter's result numbers with those of other toolkits. I feel no particular attachment to NLTK's evaluation details, but I see no reason to invent our own.
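As a rough sketch of the NLTK-style scoring this describes (set-based: items in both the gold and guessed sets are true positives, guessed-only items are false positives, gold-only items are false negatives), where the `Ord a` item type would be whatever a task's unit of evaluation is (e.g. a (span, label) pair for chunking or NER) and nothing here is tied to chatter's actual types:

```haskell
module Eval (Score(..), score, precision, recall, fMeasure) where

import qualified Data.Set as Set

data Score = Score
  { truePositives  :: Int
  , falsePositives :: Int
  , falseNegatives :: Int
  } deriving (Show)

-- Compare a gold-standard set against a classifier's guesses.
score :: Ord a => Set.Set a -> Set.Set a -> Score
score gold guessed = Score
  { truePositives  = Set.size (Set.intersection gold guessed)
  , falsePositives = Set.size (guessed Set.\\ gold)
  , falseNegatives = Set.size (gold Set.\\ guessed)
  }

-- Standard derived metrics; callers should guard against empty sets
-- (division by zero) as appropriate for the task.
precision, recall, fMeasure :: Score -> Double
precision s = fromIntegral (truePositives s)
            / fromIntegral (truePositives s + falsePositives s)
recall s    = fromIntegral (truePositives s)
            / fromIntegral (truePositives s + falseNegatives s)
fMeasure s  = 2 * p * r / (p + r)
  where p = precision s
        r = recall s
```

Reporting the raw TP/FP/FN counts alongside precision/recall/F-measure keeps the numbers roughly comparable with NLTK and other toolkits without committing to any one toolkit's internals.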
