
feat: add new metric: Semantic R-precision #3

Open
shamira-venturini wants to merge 12 commits into uclanlp:main from shamira-venturini:main

Conversation

@shamira-venturini

This pull request introduces Semantic R-Precision (SemR-p), a novel keyphrase evaluation metric designed to jointly assess semantic relevance and ranking quality. The metric, its motivation, and comprehensive evaluation are detailed in our paper: https://www.researchgate.net/publication/391552955_Meaning_in_Order_Order_in_Meaning_Semantic_R-precision_for_Keyphrase_Evaluation.

Summary of Changes:

  • Implemented Semantic R-Precision (SemR-p):
    • Added the core logic for SemR-p calculation within metrics/semantic_matching_metric.py.
    • SemR-p builds on the R-Precision framework and on SemP, SemR, and SemF1, combining exact stem matching with semantic similarity scoring (using Sentence Transformers and averaging over the top_k references).
    • It is calculated when the semantic_matching metric group is run and uses the new top_k parameter (defaulting to 3) in the .gin config for SemanticMatchingMetric.
  • Added Data Retrieval Utility:
    • Included a new script, doc_retriever.py, to facilitate loading the source, target, and prediction data for a specific document example by specifying the dataset, model, and doc_id, aiding qualitative analysis.
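The exact SemR-p formula is defined in the paper; purely as an illustrative sketch, the computation described above (cut the ranked predictions at R, score them by semantic similarity, average over top_k references) might look roughly like this. The function name, the precomputed-embedding interface, and the top-k aggregation choice are assumptions here, not KPEval's actual API:

```python
import numpy as np

def cosine_sim(a, b):
    """Row-wise cosine similarity between two embedding matrices."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def semantic_r_precision(pred_emb, ref_emb, top_k=3):
    """Hypothetical sketch: score the top-R ranked predictions
    (R = number of references) by the mean similarity to each
    prediction's top_k closest references."""
    R = ref_emb.shape[0]
    top_preds = pred_emb[:R]               # keep only the top-R predictions
    sims = cosine_sim(top_preds, ref_emb)  # (n_preds, n_refs) similarity matrix
    k = min(top_k, R)
    # for each prediction, average its k highest reference similarities
    topk_mean = np.sort(sims, axis=1)[:, -k:].mean(axis=1)
    return float(topk_mean.mean())
```

In the actual implementation the embeddings would come from the shared Sentence Transformer model rather than being passed in precomputed.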

Design Rationale for SemR-p Integration:

SemR-p was integrated into metrics/semantic_matching_metric.py to efficiently reuse the existing Sentence Transformer embedding model infrastructure and to align with KPEval's current approach of grouping metrics that share the same underlying semantic similarity calculation.

How to Use SemR-p (in this fork):

  • When running run_evaluation.py with metric_id='semantic_matching', SemR-p scores are output under the key semantic_r_precision.
  • The top_k parameter for SemR-p can be configured in the .gin file via the SemanticMatchingMetric.top_k setting.
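For example, assuming the fork keeps gin's standard `Class.parameter` binding syntax, the configuration line would look like:

```gin
# Hypothetical excerpt; the exact file layout follows the fork's .gin config.
SemanticMatchingMetric.top_k = 3
```

With this binding in place, the metric averages semantic similarity over the 3 closest references when computing semantic_r_precision.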

We believe these additions will be valuable to the KPEval toolkit and the broader keyphrase evaluation community.

