
Enhancing TextClassificationEvaluator to Support Averaged Metrics #596

Open
wants to merge 2 commits into base: main

Conversation


@ilyesdjerfaf ilyesdjerfaf commented Jun 5, 2024

Description

Previously, when creating an Evaluator object and calling its compute method, there was no way to pass the average parameter to the selected metrics.

As a result, the compute method of TextClassificationEvaluator was usable only for binary classification scenarios.

To address this, we reworked the Evaluator and TextClassificationEvaluator classes so that these parameters can be passed in, while preserving the existing behaviour of the classes that inherit from Evaluator.

Originally, the Evaluator class defined a static METRIC_KWARGS attribute that was initialized as an empty dict and unpacked inside compute_metric; since it was never populated, it offered no real control over metric configurations. Here is a snapshot of the previous implementation:

class Evaluator(ABC):
    ...
    METRIC_KWARGS = {}

    def compute_metric(...):
        result = metric.compute(metric_inputs, **self.METRIC_KWARGS)

Modifications

We introduced a new argument, metrics_kwargs, which is used in prepare_metric and compute_metric, and finally exposed in compute.

This enhancement allows for the dynamic specification of metric parameters in the compute method of an EvaluationModule.
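As a rough illustration of the idea (a minimal, standalone sketch; the helper name compute_with_kwargs and its signature are invented here and are not the actual code of this PR), a per-metric kwargs dictionary can be applied so that a list of kwargs dicts runs the metric once per configuration:

import evaluate

def compute_with_kwargs(metric_name, metric_inputs, metrics_kwargs):
    # Illustrative helper only; not part of this PR.
    # A list of kwargs dicts means "run the metric once per configuration",
    # e.g. f1 with average set to macro, weighted and micro.
    metric = evaluate.load(metric_name)
    kwargs = metrics_kwargs.get(metric_name, {})
    configs = kwargs if isinstance(kwargs, list) else [kwargs]
    results = {}
    for kw in configs:
        out = metric.compute(**metric_inputs, **kw)
        # Suffix result keys with the kwargs values, e.g. "f1_macro".
        suffix = "_" + "_".join(str(v) for v in kw.values()) if kw else ""
        results.update({f"{key}{suffix}": value for key, value in out.items()})
    return results

The key point is that a plain dict applies a single configuration, while a list of dicts yields one result per configuration, distinguished by a suffix derived from the kwargs values.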

Now, multi-class classification can be evaluated as follows:

metrics_kwargs = {
    "f1": [{"average": "macro"}, {"average": "weighted"}, {"average": "micro"}],
    "precision": [{"average": "macro"}, {"average": "weighted"}, {"average": "micro"}],
    "recall": [{"average": "macro"}, {"average": "weighted"}, {"average": "micro"}],
    "accuracy": {},
    "confusion_matrix": {}
}

from evaluate import evaluator
task_evaluator = evaluator("text-classification")
...
eval_results = task_evaluator.compute(# old params,
                                      metrics_kwargs=metrics_kwargs)

The code has been reviewed and is fully functional; it was developed in compliance with flake8 to keep it clean and maintainable.

Contributors

ilanaliouchouche and others added 2 commits May 17, 2024 20:58
Co-authored-by: Ilan Aliouchouche <ilan.aliouchouche@universite-paris-saclay.fr>