Precision, Recall, and F-score for sequence labeling tasks?

05/09/2020

What’s the matter with “normal” evaluation?

By “normal” evaluation, I mean the evaluation used for binary classification tasks.

In sequence labeling tasks, True Positives, True Negatives, False Positives, and False Negatives are counted differently than in binary classification tasks.

So you need to calculate the performance scores differently as well.

How can I do it easily?

Don’t worry. There is a very easy-to-use Python package for calculating these scores: seqeval.

Just run “pip install seqeval” and follow the sample code in the seqeval README. That’s all!

So … what’s the definition of TP / FP / FN / TN?

You can find the definitions in the seqeval source code on GitHub.

They look like this:

    performace_dict['TP'] = sum(y_t == y_p for y_t, y_p in zip(y_true, y_pred)
                                if ((y_t != 'O') or (y_p != 'O')))
    performace_dict['FP'] = sum(y_t != y_p for y_t, y_p in zip(y_true, y_pred))
    performace_dict['FN'] = sum(((y_t != 'O') and (y_p == 'O'))
                                for y_t, y_p in zip(y_true, y_pred))
    performace_dict['TN'] = sum((y_t == y_p == 'O')
                                for y_t, y_p in zip(y_true, y_pred))

Here is what the code above means.

“Gold” means the label annotated by a human, and “Prediction” means the label predicted by a sequence labeling model.

TP: if Gold and Prediction are the same label and that label is not “O”, count +1
FP: if Gold is NOT the same as Prediction, count +1 (as written, this also counts every token that is counted as FN)
FN: if Gold is NOT “O” and Prediction is “O”, count +1
TN: if both Gold and Prediction are “O”, count +1
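To make the four rules concrete, here is a small sketch that wraps the quoted expressions in a helper and runs them on one toy sentence. The function name `count_outcomes` and the tag sequences are made up for illustration, and it assumes flat lists of tags with “O” as the outside label.

```python
def count_outcomes(y_true, y_pred):
    """Count TP/FP/FN/TN per token, using the same expressions as quoted above."""
    pairs = list(zip(y_true, y_pred))
    return {
        # Same label, and not the "O" label.
        "TP": sum(y_t == y_p for y_t, y_p in pairs if (y_t != "O") or (y_p != "O")),
        # Any mismatch between Gold and Prediction.
        "FP": sum(y_t != y_p for y_t, y_p in pairs),
        # Gold is a real label, but the model predicted "O".
        "FN": sum((y_t != "O") and (y_p == "O") for y_t, y_p in pairs),
        # Both are "O".
        "TN": sum(y_t == y_p == "O" for y_t, y_p in pairs),
    }


# Toy example (hypothetical tags, not from a real dataset).
y_true = ["O", "B-PER", "I-PER", "O", "B-LOC"]
y_pred = ["O", "B-PER", "O", "O", "B-ORG"]

counts = count_outcomes(y_true, y_pred)
print(counts)  # {'TP': 1, 'FP': 2, 'FN': 1, 'TN': 2}
```

The four counts add up to 6 even though there are only 5 tokens. The third token (Gold “I-PER”, Prediction “O”) is counted once as FP and once as FN under these definitions.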

Posted by blog_author