

Sunday, 21 August 2011

Micro- and Macro-average of Precision, Recall and F-Score

I have posted several articles explaining how precision and recall are calculated, and how F-Score is their equally weighted harmonic mean. I was wondering: how do you calculate the average precision, recall and F-Score of a system when it is applied to several sets of data?

Tricky, but I found this very interesting. There are two methods by which you can get such average statistics in information retrieval and classification.

1. Micro-average Method

In the Micro-average method, you sum up the individual true positives, false positives, and false negatives of the system over the different sets and then compute the statistics from those totals. For example, on one set of data, the system scores

True positive (TP1)= 12
False positive (FP1)=9
False negative (FN1)=3

Then the precision (P1) and recall (R1) will be 57.14% and 80%,

and on a different set of data, the system scores


True positive (TP2)= 50
False positive (FP2)=23
False negative (FN2)=9

Then the precision (P2) and recall (R2) will be 68.49% and 84.75%.

Now, the average precision and recall of the system using the Micro-average method is

Micro-average of precision = (TP1+TP2)/(TP1+TP2+FP1+FP2) = (12+50)/(12+50+9+23) = 65.96%
Micro-average of recall = (TP1+TP2)/(TP1+TP2+FN1+FN2) = (12+50)/(12+50+3+9) = 83.78%

The Micro-average F-Score will be simply the harmonic mean of these two figures.
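The micro-average computation above can be sketched in Python; the counts are the ones from the two example sets:

```python
# Micro-averaging: pool the raw counts across all data sets,
# then compute precision and recall once from the pooled totals.
tp = 12 + 50   # true positives from set 1 and set 2
fp = 9 + 23    # false positives
fn = 3 + 9     # false negatives

micro_precision = tp / (tp + fp)   # 62/94 ≈ 0.6596
micro_recall    = tp / (tp + fn)   # 62/74 ≈ 0.8378

# Micro-average F-Score: harmonic mean of micro precision and recall.
micro_f = 2 * micro_precision * micro_recall / (micro_precision + micro_recall)

print(f"{micro_precision:.2%} {micro_recall:.2%} {micro_f:.2%}")
# → 65.96% 83.78% 73.81%
```

Note that for a single pooled contingency table, the harmonic-mean formula simplifies to 2·TP/(2·TP+FP+FN), which gives the same 73.81%.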

2. Macro-average Method

The method is straightforward: just take the average of the precision and recall of the system over the different sets. For example, the macro-average precision and recall of the system for the given example are

Macro-average precision = (P1+P2)/2 = (57.14+68.49)/2 = 62.82%
Macro-average recall = (R1+R2)/2 = (80+84.75)/2 = 82.38%

The Macro-average F-Score will be simply the harmonic mean of these two figures.
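The macro-average computation can be sketched the same way. Averaging the exact fractions gives 82.37% for recall; averaging the already-rounded figures 80 and 84.75 gives 82.38%, a harmless rounding difference:

```python
# Macro-averaging: compute precision and recall per set first,
# then take the unweighted mean of the per-set figures.
p1, r1 = 12 / (12 + 9), 12 / (12 + 3)    # set 1: ≈57.14%, 80%
p2, r2 = 50 / (50 + 23), 50 / (50 + 9)   # set 2: ≈68.49%, ≈84.75%

macro_precision = (p1 + p2) / 2          # ≈ 0.6282
macro_recall    = (r1 + r2) / 2          # ≈ 0.8237

# Macro-average F-Score: harmonic mean of the two macro figures.
macro_f = 2 * macro_precision * macro_recall / (macro_precision + macro_recall)

print(f"{macro_precision:.2%} {macro_recall:.2%}")
# → 62.82% 82.37%
```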

Suitability
The macro-average method can be used when you want to know how the system performs overall across the sets of data. Since every set counts equally regardless of its size, a few small sets can dominate the figure, so you should be careful about drawing fine-grained conclusions from this average.

On the other hand, the micro-average can be a useful measure when your sets vary in size: it weights every individual instance equally, so larger sets influence the result more.
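The difference shows up clearly when set sizes are very unequal. Here is a sketch with hypothetical counts: one large set where the system does well and one tiny set where it does poorly. Micro-averaging is dominated by the large set, while macro-averaging treats both sets equally:

```python
# Hypothetical counts (not from the post's example).
sets = [
    {"tp": 90, "fp": 10, "fn": 10},  # large set: per-set precision 0.90
    {"tp": 1,  "fp": 9,  "fn": 9},   # small set: per-set precision 0.10
]

# Micro: pool the counts, then divide once.
micro_p = sum(s["tp"] for s in sets) / sum(s["tp"] + s["fp"] for s in sets)

# Macro: divide per set, then average.
macro_p = sum(s["tp"] / (s["tp"] + s["fp"]) for s in sets) / len(sets)

print(f"micro {micro_p:.2f}  macro {macro_p:.2f}")
# → micro 0.83  macro 0.50
```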

Thursday, 17 March 2011

Precision and Recall

These are very confusing terms: precision and recall. You have to understand them completely before moving forward.

Say you have 10 balls (6 white and 4 red) in a box. I know you are not colorblind, but still somebody asked you to pick the red balls out of the box. What you did is this: you thought 7 balls were red, picked them from the box and put them in a tray. Among these 7 balls, 2 were actually red and 5 were white (but you thought all of them were red).

Your precision in picking red balls is the number of correct pick-ups / (number of correct pick-ups + number of wrong pick-ups), which is 2/(2+5) = 2/7 ≈ 29% in this case. Notice that the denominator is simply the total number of pick-ups.

Your recall in picking red balls is the number of correct pick-ups / (number of correct pick-ups + number of red balls that you missed), which is 2/(2+2) = 2/4 = 50% in this case.
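The ball story maps directly onto code; the counts are the ones given above (2 correct pick-ups, 5 wrong pick-ups, 2 missed red balls):

```python
correct_pickups = 2   # red balls you picked (they really were red)
wrong_pickups   = 5   # white balls you picked, thinking they were red
missed_reds     = 2   # red balls you left behind in the box

# Precision: of everything you picked, how much was actually red?
precision = correct_pickups / (correct_pickups + wrong_pickups)  # 2/7

# Recall: of all the red balls, how many did you find?
recall = correct_pickups / (correct_pickups + missed_reds)       # 2/4

print(f"precision {precision:.0%}, recall {recall:.0%}")
# → precision 29%, recall 50%
```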

Now, what do they mean? Precision says how exact you were in your pick-ups: of the balls you picked as red, about 29% actually were red. Recall says how complete you were: you identified 50% of all the red balls.

We learn at this point- Precision describes exactness and Recall describes completeness.

Using the same example, we will now see how the standard terminology maps onto these simple numbers.

The number of correct pick-ups can be called "true positives": they were red balls that you picked up, and you were asked to pick the red ones. The balls you picked as red but which are actually white can be called "false positives": you thought they were positive, but they are not.

So, if we modify this formula of precision with these terms, it turns into-

Precision = true positives / (true positives + false positives)

Again, the red balls you missed were missed because you thought they were white. So they can be called "false negatives", which means you thought they were not red balls, but they are.

So, if we modify this formula of recall with this new terminology, it turns into-

Recall = true positives / (true positives + false negatives)

So, from this rewritten version of the recall formula, one thing pops up: this is the "true positive rate". In other words, of all the red balls, what percentage did you grab? You had 4 red balls but got 2 and missed 2, which means you took 50% of the red balls!

THIS IS HOW YOU WILL FIND PRECISION AND RECALL IN THE REALM OF CLASSIFICATION PROBLEMS.

Now, we will move to a different realm: INFORMATION RETRIEVAL. It involves sets, so we will change our scenario a little.

You have 10 files in a folder, 4 of which are about games and 6 of which are about the weather. Now, somebody asked you to copy only the game files to another location. What you did is copy 7 files (thinking all 7 were about games) and put them in a different location. But you actually copied only 2 game files and 5 weather files (while believing they were all game files).

REMEMBER, THE ANALOGY HERE IS THE SAME AS THE RED BALL-WHITE BALL PROBLEM.

Now, your precision of copying games files will be:

Precision = number of game files in the new location / number of files you copied.

In this case, that is 2/7 ≈ 29% (the same as in our previous example).

Now, your recall of copying games files will be:

Recall = number of game files in the new location / total number of game files.

In this case, that is 2/4 = 50% (the same as in our previous example).
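The retrieval framing can also be phrased with sets: precision is |relevant ∩ retrieved| / |retrieved|, and recall is |relevant ∩ retrieved| / |relevant|. A sketch with hypothetical file names:

```python
# Hypothetical names for the 4 game files and the files you copied.
relevant  = {"g1", "g2", "g3", "g4"}                    # all game files
retrieved = {"g1", "g2", "w1", "w2", "w3", "w4", "w5"}  # the 7 copied files

hits = relevant & retrieved              # game files that actually got copied

precision = len(hits) / len(retrieved)   # 2/7 ≈ 0.29
recall    = len(hits) / len(relevant)    # 2/4 = 0.5

print(f"precision {precision:.0%}, recall {recall:.0%}")
# → precision 29%, recall 50%
```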

Some more equivalent definitions or explanations of these two terms:

Precision
- A measure of the ability of a system to present only relevant items
- The fraction of correct instances among all instances that the algorithm believes to belong to the relevant set
- It is a measure of exactness or fidelity
- It tells how well a system weeds out what you don't want (i.e., how well it avoids false positives)
- Says nothing about the number of false negatives


Recall
- A measure of the ability of a system to present all relevant items
- The fraction of correct instances among all instances that actually belong to the relevant set
- It is a measure of completeness
- It tells how well a system performs to get what you want
- Says nothing about the number of false positives