
Thursday 17 March 2011

Precision and Recall

Precision and recall are confusing terms. You should understand them completely before moving forward.

Say you have 10 balls (6 white and 4 red) in a box. I know you are not colorblind, but suppose somebody asked you to pick out the red balls. You decided that 7 of the balls were red, picked them from the box and put them in a tray. Of these 7 balls, only 2 were actually red and 5 were white (but you thought all of them were red).

Your precision in picking red balls is number of correct pick-ups / (number of correct pick-ups + number of wrong pick-ups), which is 2/(2+5) = 2/7, or about 28.6%, in this case. Notice that the denominator is simply the total number of pick-ups.

Your recall in picking red balls is number of correct pick-ups / (number of correct pick-ups + number of red balls that you missed), which is 2/(2+2) = 2/4 = 50% in this case.
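As a quick check of the arithmetic above, here is a minimal Python sketch. The variable names are made up for illustration and do not come from any library.

```python
# Counts from the ball example (variable names are illustrative only).
correct_pickups = 2   # red balls that ended up in the tray
wrong_pickups = 5     # white balls that ended up in the tray
missed_red = 2        # red balls left behind in the box

# Precision: correct pick-ups out of everything you picked up.
precision = correct_pickups / (correct_pickups + wrong_pickups)

# Recall: correct pick-ups out of all the red balls that existed.
recall = correct_pickups / (correct_pickups + missed_red)

print(f"precision = {precision:.3f}")  # 0.286
print(f"recall = {recall:.3f}")        # 0.500
```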

Now, what do they mean? Precision says how exact you were among your pick-ups: of the balls you picked up as red, about 28.6% really were red. Recall says how complete your pick-ups were with respect to all the red balls: you found 50% of the red balls that were in the box.

What we learn at this point: precision describes exactness and recall describes completeness.

Using the same example, we will now see how the standard terminology maps onto it.

The correct pick-ups can be called "true positives": they were red balls, and you picked them up as red. The balls you picked as red that turned out to be white can be called "false positives": you thought they were positive, but they were not.

So, if we rewrite the precision formula with these terms, it becomes:

Precision = true positives / (true positives + false positives)

Likewise, the red balls you missed were left behind because you took them for white. They can be called "false negatives": you thought they were not red balls, but they were.

So, if we rewrite the recall formula with this new terminology, it becomes:

Recall = true positives / (true positives + false negatives)
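To see where the true positives, false positives and false negatives come from, here is a rough sketch that counts them from per-ball labels. The ordering of the balls and the function name `precision_recall` are assumptions made for illustration; only the counts match the example. It also assumes at least one ball was picked as red and at least one ball is actually red, so neither denominator is zero.

```python
# True colour of each ball and the colour you guessed. The ordering is
# hypothetical; it just reproduces the counts from the example
# (4 red, 6 white, 7 picked as red of which 2 really are red).
actual = ["red"] * 4 + ["white"] * 6
guessed = ["red", "red", "white", "white"] + ["red"] * 5 + ["white"]

def precision_recall(actual, guessed, positive="red"):
    """Return (precision, recall) for the class named by `positive`."""
    pairs = list(zip(actual, guessed))
    tp = sum(a == positive and g == positive for a, g in pairs)  # true positives
    fp = sum(a != positive and g == positive for a, g in pairs)  # false positives
    fn = sum(a == positive and g != positive for a, g in pairs)  # false negatives
    return tp / (tp + fp), tp / (tp + fn)

p, r = precision_recall(actual, guessed)
print(f"precision = {p:.3f}, recall = {r:.3f}")
```

With these labels the counts are 2 true positives, 5 false positives and 2 false negatives, which gives the same 2/7 and 2/4 as before.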

One thing pops out of this rewritten recall formula: it is the "true positive rate". In other words, of all the red balls, what percentage did you grab? You had 4 red balls, got 2 and missed 2, which means you picked up 50% of the red balls.

THIS IS HOW YOU WILL FIND PRECISION AND RECALL DEFINED IN THE REALM OF CLASSIFICATION PROBLEMS.

Now, we will move to a different realm: INFORMATION RETRIEVAL. It will require sets, so we will change our scenario a little.

You have 10 files in a folder, 4 of which are about games and 6 of which are about weather. Somebody asked you to copy only the game files to another location. You copied 7 files (thinking all 7 were about games) and put them in the new location. But only 2 of them were game files and 5 were weather files.

REMEMBER, THE ANALOGY HERE IS THE SAME AS THE RED BALL-WHITE BALL PROBLEM.

Now, your precision in copying game files will be:

Precision = number of game files that are in the new location / number of files you copied

In this case, that is 2/7, or about 28.6% (the same as our previous example).

Now, your recall in copying game files will be:

Recall = number of game files that are in the new location / total number of game files

In this case, that is 2/4 = 50% (the same as our previous example).
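Since the information retrieval view is phrased in terms of sets, the same numbers can be sketched with Python sets. The file names below are hypothetical; only the counts (4 game files, 7 copied files, 2 in common) come from the example.

```python
# Hypothetical file names; only the counts match the example.
game_files = {"g1", "g2", "g3", "g4"}                      # the relevant files
copied_files = {"g1", "g2", "w1", "w2", "w3", "w4", "w5"}  # the files you copied

# Game files that actually made it to the new location.
hits = game_files & copied_files

precision = len(hits) / len(copied_files)  # 2 / 7
recall = len(hits) / len(game_files)       # 2 / 4

print(f"precision = {precision:.3f}, recall = {recall:.3f}")
```

The intersection plays the role of the true positives, the copied set is everything you retrieved, and the set of game files is everything you should have retrieved.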

Some more equivalent definitions and explanations of these two terms:

Precision
- A measure of the ability of a system to present only relevant items
- The fraction of correct instances among all instances that the algorithm believes to belong to the relevant set
- It is a measure of exactness or fidelity
- It tells how well a system weeds out what you don't want (Confused about this, but it is written in a document)
- Says nothing about the number of false negatives


Recall
- A measure of the ability of a system to present all relevant items
- The fraction of correct instances among all instances that actually belong to the relevant set
- It is a measure of completeness
- It tells how well a system performs to get what you want
- Says nothing about the number of false positives

3 comments:

  1. I have been going round and round. Thank you for explaining it with red and white balls. It made my life so simple.

  2. Excellent explanation, gives great clarity.

  3. i didn't comment at any forum or blog but your explanation force me to comment. its just owsm. thanks
