
Thursday 24 March 2011

True Negatives and Accuracy

Hello folks! Welcome back! We have talked about Precision and Recall, and along the way three terms appeared: true positives, false positives and false negatives. Today we will talk about accuracy, and for that we need one more term: true negatives.

Let's go back to our old example.

You were given 10 balls in a box, 6 of which are white and 4 red. You were asked to pick only the red balls from the box, and you picked 7 balls: 2 of them are really red, but 5 are white balls that fooled you.

So we have 2 true positives (red balls you picked), 5 false positives (white balls you picked) and 2 false negatives (red balls you left in the box).

A true negative is something you judged negative that really is negative. In our case, it is a ball you left in the box as white that really is white. Since every ball falls into exactly one of the four groups, the number of true negatives for our example is:

True negatives = Total Balls - (True positives + False Positives + False Negatives)
= 10 - (2 + 5 + 2) = 1.
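The same bookkeeping can be sketched in a few lines of Python. The variable names are just illustrative labels for the counts in this post; the only assumption is that every ball lands in exactly one of the four groups.

```python
# Counts from the red-ball example in this post
total_balls = 10
true_positives = 2   # red balls you picked
false_positives = 5  # white balls you picked, thinking they were red
false_negatives = 2  # red balls you left in the box

# Every ball falls into exactly one of the four groups,
# so the true negatives are whatever is left over.
true_negatives = total_balls - (true_positives + false_positives + false_negatives)
print(true_negatives)  # 1
```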

Why are we eager to know the number of true negatives? Because if we want to measure how accurate you were in picking out red balls, we have to count every correct decision you made, including the white balls you correctly left in the box.

The formula for accuracy is

Accuracy = (True positives + True negatives) / (True positives + True negatives + False positives + False negatives)

In your case, your accuracy is equal to (2 + 1) / 10 = 30%.
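Here is a minimal Python sketch of the accuracy formula, using a hypothetical helper `accuracy` that takes the four counts. It raises an error on empty input rather than dividing by zero; that is a choice of this sketch, not part of the formula.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all decisions (picked or left behind) that were correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no decisions to evaluate")
    return (tp + tn) / total

# Red-ball example: 2 TP, 1 TN, 5 FP, 2 FN
acc = accuracy(tp=2, tn=1, fp=5, fn=2)
print(f"{acc:.0%}")  # 30%
```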

Saturday 19 March 2011

Efficiency and Effectiveness

Do you use these terms as synonyms? If so, you are wrong!

Efficiency, scientifically, is output / input. It is about maximizing output with minimum resources; it refers to doing things the right way. Say you produce 10 kg of potatoes with 1 kg of fertilizer. To be more efficient, you need a higher ratio: more than 10 kg of potatoes from 1 kg of fertilizer, or the same 10 kg from less fertilizer.
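A tiny Python sketch of efficiency as output / input. The `efficiency` helper and the 12 kg and 0.5 kg figures are hypothetical, chosen only to show that either more output or less input raises the ratio.

```python
def efficiency(output_kg: float, input_kg: float) -> float:
    """Output per unit of input: kg of potatoes per kg of fertilizer."""
    return output_kg / input_kg

baseline = efficiency(10, 1)    # 10 kg potatoes from 1 kg fertilizer
improved = efficiency(12, 1)    # hypothetical: more potatoes, same fertilizer
leaner = efficiency(10, 0.5)    # hypothetical: same potatoes, less fertilizer
print(baseline, improved, leaner)  # 10.0 12.0 20.0
```

Note that this ratio says nothing about whether the potatoes are good; that is the effectiveness question below.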

Effectiveness, on the other hand, means doing the right thing. With the same 10 kg of potatoes from 1 kg of fertilizer, to be effective as a farmer you need to produce more good potatoes than bad ones; you don't have to bother about your fertilizer.

Proactive and Reactive Research

Sometimes your supervisor tells you, "You have to be proactive in research", and sometimes they ask you to be reactive. What do they mean?

Proactive activity means you are cautious about your future: you plan ahead to face problems, or you are prepared for something "bad". Saving money to face troubles in the future is a proactive activity. So proactive research means that if you know a "situation" may occur in the future, you prepare yourself for it. To understand which "situations" may occur, you need to dig into your problem and find all the possible "situations" ahead.

So, you are a proactive researcher when you analyze your problem and formulate solutions before those situations arise.

Reactive activity implies a little carelessness: I will provide solutions when a "situation" occurs, or I will react when the time comes. So reactive researchers do not bother about the future; they just do what they are meant to do, and when they face problems, they try to find a solution.

In my opinion, you mostly need a blend of both to succeed in research: at times you need to be proactive, and at other times being reactive will bring you success.

Thursday 17 March 2011

Precision and Recall

Precision and recall are very confusing terms. You have to understand them completely before moving forward.

Say you have 10 balls (6 white and 4 red) in a box. I know you are not colorblind, but suppose somebody asks you to pick out the red balls. You decide that 7 balls are red, take them out of the box and put them in a tray. Among these 7 balls, 2 are red and 5 are white (but you thought all of them were red).

Your precision in picking red balls is number of correct pick-ups / (number of correct pick-ups + number of wrong pick-ups), which is 2/(2+5) = 2/7 ≈ 28.6% in this case. Look carefully: the denominator is simply your total number of pick-ups.
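To make the two equivalent denominators concrete, here is a short Python sketch that models the tray as a list of colours (the list representation is just an assumption for illustration).

```python
# The tray holds the 7 balls you picked, believing they were all red.
tray = ["red"] * 2 + ["white"] * 5

correct_pickups = tray.count("red")    # 2
wrong_pickups = tray.count("white")    # 5
total_pickups = len(tray)              # 7

precision = correct_pickups / (correct_pickups + wrong_pickups)
# Same value with the simpler denominator: total pick-ups
assert precision == correct_pickups / total_pickups
print(f"precision = {precision:.1%}")  # precision = 28.6%
```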

Your recall in picking red balls is number of correct pick-ups / (number of correct pick-ups + number of red balls that you missed), which is 2/(2+2) = 2/4 = 50% in this case.
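A matching Python sketch for recall, again assuming the box and the tray are modelled as simple lists of colours. The missed red balls are whatever red balls did not make it into the tray.

```python
# What was in the box before picking, and what ended up in the tray.
box = ["red"] * 4 + ["white"] * 6
tray = ["red"] * 2 + ["white"] * 5

correct_pickups = tray.count("red")                 # 2 red balls picked
missed_reds = box.count("red") - correct_pickups    # 2 red balls left behind

recall = correct_pickups / (correct_pickups + missed_reds)
print(f"recall = {recall:.0%}")  # recall = 50%
```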

Now, what do they mean? Precision says how exact you were among your pick-ups: of the balls you picked as red, about 28.6% really were red. Recall says how complete you were: of all the red balls in the box, you identified 50%.

What we learn at this point: precision describes exactness and recall describes completeness.

Using the same example, we will now see how the standard terminology maps onto these quantities.

The correct pick-ups can be called "true positives": they are red balls, you picked them up, and you were asked to pick the red ones. The balls you picked as red that turned out to be white can be called "false positives": you thought they were positive, but they are not.

So, if we rewrite the precision formula with these terms, it becomes:

Precision = true positives / (true positives + false positives)

Next, the red balls you missed are the ones you left behind, thinking they were white. So they can be called "false negatives": you thought they were not red balls, but they are.

So, if we rewrite the recall formula with this new terminology, it becomes:

Recall = true positives / (true positives + false negatives)

One thing pops out of this rewritten recall formula: it is the "true positive rate". In other words, of all the red balls, what percentage did you grab? You had 4 red balls, got 2 and missed 2, which means you took 50% of the red balls!
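Putting the terminology together, the Python sketch below labels each ball by its actual colour and by what you believed, then counts true positives, false positives and false negatives and applies the two formulas. The `outcome` helper and the (actual, believed) pair representation are hypothetical; "red" is treated as the positive class. True negatives are counted too, but neither formula needs them.

```python
from collections import Counter

# One entry per ball: (actual colour, what you believed).
balls = (
    [("red", "red")] * 2          # red, picked as red      -> true positive
    + [("white", "red")] * 5      # white, picked as red    -> false positive
    + [("red", "white")] * 2      # red, left as white      -> false negative
    + [("white", "white")] * 1    # white, left as white    -> true negative
)

def outcome(actual: str, predicted: str, positive: str = "red") -> str:
    """Classify one decision as TP, FP, FN or TN."""
    if predicted == positive:
        return "TP" if actual == positive else "FP"
    return "FN" if actual == positive else "TN"

counts = Counter(outcome(a, p) for a, p in balls)
tp, fp, fn = counts["TP"], counts["FP"], counts["FN"]

precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(tp, fp, fn, counts["TN"])     # 2 5 2 1
print(round(precision, 3), recall)  # 0.286 0.5
```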

THIS IS HOW YOU WILL FIND PRECISION AND RECALL DEFINED IN THE REALM OF CLASSIFICATION PROBLEMS.

Now we will move to a different realm: INFORMATION RETRIEVAL. It requires sets, so we will change our scenario a little.

You have 10 files in a folder, 4 of which are about games and 6 about weather. Somebody asks you to copy only the game files to another location. You copy 7 files to the new location, thinking all 7 are about games. But only 2 of them are game files; the other 5 are weather files.

REMEMBER, THE ANALOGY HERE IS THE SAME AS THE RED BALL-WHITE BALL PROBLEM.

Now, your precision in copying game files will be:

Precision = number of game files among the files you copied / number of files you copied

In this case, that is 2/7 ≈ 28.6% (the same as in our previous example).
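In set terms, the relevant set is the game files and the retrieved set is what you copied, so precision is the size of their intersection divided by the size of the retrieved set. A minimal Python sketch with hypothetical file names (g1 to g4 for games, w1 to w6 for weather):

```python
# Hypothetical file names: g* are game files, w* are weather files.
game_files = {"g1", "g2", "g3", "g4"}                   # relevant set
copied = {"g1", "g2", "w1", "w2", "w3", "w4", "w5"}     # retrieved set

# Game files that ended up among the copies, over everything copied
precision = len(game_files & copied) / len(copied)
print(f"{precision:.1%}")  # 28.6%
```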

And your recall in copying game files will be:

Recall = number of game files among the files you copied / total number of game files

In this case, that is 2/4 = 50% (the same as in our previous example).
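The recall version of the same set-based sketch (same hypothetical file names) only changes the denominator, from the files you copied to all the game files:

```python
# Hypothetical file names: g* are game files, w* are weather files.
game_files = {"g1", "g2", "g3", "g4"}                   # relevant set
copied = {"g1", "g2", "w1", "w2", "w3", "w4", "w5"}     # retrieved set

# Game files that ended up among the copies, over all game files
recall = len(game_files & copied) / len(game_files)
print(f"{recall:.0%}")  # 50%
```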

Some more equivalent definitions or explanations of these two terms:

Precision
- A measure of the ability of a system to present only relevant items
- The fraction of correct instances among all instances that the algorithm believes to belong to the relevant set
- It is a measure of exactness or fidelity
- It tells how well a system weeds out what you don't want (the fewer false positives, the higher the precision)
- Says nothing about the number of false negatives


Recall
- A measure of the ability of a system to present all relevant items
- The fraction of correct instances among all instances that actually belong to the relevant set
- It is a measure of completeness
- It tells how well a system gets what you want
- Says nothing about the number of false positives
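The last bullet of each list can be checked with a small experiment. This Python sketch uses hypothetical variants of the red-ball example: one with a bigger box where 20 red balls are left behind, and one where 50 white balls are picked. The helper names are illustrative; each accepts all three counts so the unused one is visibly ignored.

```python
def precision(tp: int, fp: int, fn: int) -> float:
    # fn is accepted but deliberately unused: precision ignores it.
    return tp / (tp + fp)

def recall(tp: int, fp: int, fn: int) -> float:
    # fp is accepted but deliberately unused: recall ignores it.
    return tp / (tp + fn)

original = dict(tp=2, fp=5, fn=2)      # the red-ball example
many_missed = dict(tp=2, fp=5, fn=20)  # hypothetical: 20 red balls left behind
many_wrong = dict(tp=2, fp=50, fn=2)   # hypothetical: 50 white balls picked

# Precision is blind to false negatives ...
assert precision(**original) == precision(**many_missed)
# ... and recall is blind to false positives.
assert recall(**original) == recall(**many_wrong)

for name, c in [("original", original), ("many_missed", many_missed), ("many_wrong", many_wrong)]:
    print(name, round(precision(**c), 3), round(recall(**c), 3))
# original 0.286 0.5
# many_missed 0.286 0.091
# many_wrong 0.038 0.5
```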