Hello everyone! This is going to be a series of blog posts on the language of scientific research papers. When you write a paper, there are many dos and don'ts to follow. If you practice, your papers will eventually be free of grammatical and typographical errors, and error-free language is one of the strengths of a research paper. However, when you are learning to write scientific papers and submit them to your supervisor for proofreading, you may have heard something like this: "Your paper is well written; I just need to change the STYLISTICS/RHETORIC and/or modify some of the language." So what kind of changes do they make? Have you ever noticed? If not, this is the right place for you. From my experience, I am going to give some tips on how to improve the stylistics or rhetoric, and hence the language, of your paper. Remember, language is the second line of strength of your paper. The most important strength is your work.

As are

A very good term to start with. "As are" can be used in place of a "which" clause.

For example, "The non-character tokens, which are any tokens that do not contain letters, are deleted" is a fine, strong sentence. But the following is even stronger: "The non-character tokens are deleted, as are any tokens that do not contain letters".

Giveaway

The most common use of this term is to replace "disadvantage". For example, "The disadvantage of the algorithm is that it picks up some garbage characters" can be rewritten more strongly as "The giveaway of the algorithm is that it picks up some garbage characters".

Comparably

It is quite possible that your algorithm does not beat other benchmarks but performs very close to them (or perhaps sometimes better and sometimes worse). This is the right term to use in such cases: "Our approach performs comparably to the state-of-the-art."

It goes on

I use this when I describe my own methods or other people's methods, or when I refer to other people's work: "The paper by X et al. [1] goes on to report their performance against the gold standard."

Because

You know this word for sure! But I did not know that it can have a powerful effect when you rearrange the sentence. Compare "We could not achieve a better F-score because the dataset was small" with "Because the dataset was small, we could not achieve a better F-score."

As well as/As good as

This is a synonym of "comparably", which we have already learned, as in "Our approach performs as well as the benchmarks" or "Our approach is as good as the benchmarks".

A Priori

A substitute for "beforehand". Sometimes you may want to write something like "There is no way to answer questions like this before calculating the crop factor". The sentence sounds more professional when you use a priori: "There is no way to answer questions like this a priori, that is, without first calculating the crop factor".

Among them

Used when you give examples. "The algorithm uses many parameters, among them x and y, but not z" means that the algorithm uses parameters such as x and y, but not z.

Couch in

Used to express "formulated in the same way". For example: "Couched in the same terms as the arithmetic mean, the geometric mean can be expressed in a different way."

As well as

Simply put, this works like the conjunction "and". But it can also be used at the beginning of a sentence. "It includes a wide range of processing tools and a variety of algorithms" can be written either as "It includes a wide range of processing tools as well as a variety of algorithms" or as "As well as a variety of algorithms, it includes a wide range of processing tools".

Getting to know

This is a good phrase to pick when you explain why you follow a method. "As we know that the data is a part of the work, we developed data visualization tools" can be written as "Getting to know that the data is a part of the work, we developed data visualization tools".

Way

When you explain the ways of doing something, you can follow this pattern: "One way to use the tools is .... Another is .... A third is ......"

Close second

An excellent phrase for introducing something important, though slightly less important than the thing you mentioned just before it. For example, "This is the most valuable tool in the package. A close second is its visualization capabilities".

(To be continued)

# I am Learning

## Monday, 23 January 2012

## Monday, 26 September 2011

### Statistical Significance: FAQ

Say you have collected humidity readings for 5 days in two adjacent cities, A and B:

City A = {40,45,42,50,42}

City B = {38,45,40,48,52}

The average humidity of City A is 43.8 and that of City B is 44.6.
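You can reproduce the two averages with Python's standard `statistics` module. The sketch also prints the sample standard deviations, which the original does not give. They show that the day-to-day spread within each city is much larger than the 0.8-point gap between the means.

```python
from statistics import mean, stdev

# Humidity readings for five days, as listed above.
city_a = [40, 45, 42, 50, 42]
city_b = [38, 45, 40, 48, 52]

print(mean(city_a), mean(city_b))  # 43.8 44.6

# Sample standard deviations: the spread within each city is several
# times larger than the difference between the two averages.
print(round(stdev(city_a), 2), round(stdev(city_b), 2))
```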

Most of the time, we conclude that City B is more humid than City A just by looking at the average humidity. This is inappropriate. Before we can say so, we need to investigate whether or not the difference is "statistically significant". An average is a tool for forming an assumption, not for making a claim.

To say that two classes differ significantly from each other, we need to test their "statistical significance". There are a number of tools for this. I am not discussing them here because they are easy to look up. What I answer here can be seen as FAQs.

What are parametric and non-parametric tests?

If you know that the data of your two classes follow a normal distribution, you can choose from several parametric significance tests. If they don't, choose a non-parametric test.


How do I know that my data of two classes follow normal distribution?

A novice approach is to divide the data into class intervals and count how often values fall into each interval. Then plot the result with the class intervals on the x-axis and the frequencies on the y-axis. If the plot forms a bell-shaped curve, your data roughly follow a normal distribution.
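Here is a minimal Python sketch of this class-interval approach. It assumes numeric readings and an interval width you choose yourself (5 here, a hypothetical choice). `frequency_table` and `print_histogram` are illustrative helper names. The histogram is drawn sideways: intervals run down the side and frequencies run across. With only five readings, the shape cannot really confirm normality, so treat this as a quick visual check only.

```python
from collections import Counter

def frequency_table(values, width):
    """Group values into class intervals [lo, lo + width) starting at min(values)."""
    start = min(values)
    # Index of the class interval each value falls into.
    counts = Counter(int((v - start) // width) for v in values)
    n_bins = max(counts) + 1
    # Keep empty intervals too, so gaps in the shape remain visible.
    return [(start + k * width, start + (k + 1) * width, counts.get(k, 0))
            for k in range(n_bins)]

def print_histogram(table):
    """Print one row per class interval, with one '*' per value."""
    for lo, hi, count in table:
        print(f"[{lo}, {hi})  {'*' * count}")

city_a = [40, 45, 42, 50, 42]
table = frequency_table(city_a, width=5)
print_histogram(table)
```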


How do I determine whether I need a parametric test or non-parametric test?

1. If you know that your data follow a normal distribution, use a parametric test; otherwise, use a non-parametric test.

2. Some values may be extremely low or high (outliers) even when the rest of the data follow a normal distribution. Use a non-parametric test in this case.

3. If you are unsure about the distribution of your sample, try to look at the whole dataset rather than the sample.

4. Try to find out the sources that cause the data to scatter. If the scatter comes from many independent sources, the data most probably follow a normal distribution.

5. If you have a large dataset, you can try either kind of test; in practice, both perform well on large datasets. In contrast, both are weak on small datasets.

Last but not least, many people choose parametric tests because they are not sure that the data deviate from a normal distribution. Many others choose non-parametric tests because they are not sure that the data meet the requirements of a normal distribution.

I have seen paired and unpaired tests. Which is appropriate?

If the values in your two datasets are matched with each other (for example, the same five days measured in both cities, or the same subjects measured twice), use a paired test; otherwise, use an unpaired test.
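When the two datasets are matched, a paired analysis works on the per-pair differences rather than on the two groups separately. Here the pairs are the same five days in both cities, assuming the readings in each list are in the same day order. The sketch below is an exact sign-flip permutation test. It is one non-parametric paired test, chosen for illustration; the post does not prescribe a particular test. `paired_sign_flip_test` is a hypothetical helper name.

```python
from itertools import product
from statistics import mean

def paired_sign_flip_test(a, b):
    """Exact two-sided sign-flip permutation test for paired data.

    Under the null hypothesis each per-pair difference is equally likely to
    be positive or negative, so every combination of signs is enumerated.
    """
    diffs = [y - x for x, y in zip(a, b)]
    observed = abs(mean(diffs))
    eps = 1e-9  # tolerance so that exact ties are counted despite rounding
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        flipped = [s * d for s, d in zip(signs, diffs)]
        total += 1
        if abs(mean(flipped)) >= observed - eps:
            extreme += 1
    return diffs, extreme / total

city_a = [40, 45, 42, 50, 42]
city_b = [38, 45, 40, 48, 52]
diffs, p_two = paired_sign_flip_test(city_a, city_b)
print(diffs, p_two)  # [-2, 0, -2, -2, 10] 1.0
```

On this tiny example, the single 10-point day dominates. Every sign pattern is at least as extreme as the observed one, so the test returns P = 1.0, meaning no evidence of a difference.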

Good. I have also seen one-sided and two-sided P values. Can you tell me about them?

First, tell me if you know what a null hypothesis is.

No, what is a null hypothesis?

A null hypothesis states that there is no statistically significant difference between the two datasets: if their averages differ, they differ by chance only.

Oh, okay. Now tell me about one-sided and two-sided P values.

Suppose the null hypothesis is true, so the means of the overall populations are actually equal. The one-sided P value is then the probability that, just by chance, the two averages would differ at least as much as was observed, in the direction specified by the hypothesis (see the example: they differ, don't they?). The two-sided P value also includes the probability that the sample means would differ that much in the opposite direction (i.e., the other group has the larger mean). When the test statistic has a symmetric distribution, as with the t-test, the two-sided P value is twice the one-sided P value.
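To make the two definitions concrete, here is a minimal sketch of an exact two-sample permutation test. This non-parametric test is used for illustration only; the post does not name a specific test. The sketch treats the cities as unpaired and assumes the one-sided hypothesis "City B has the larger mean". It counts how many of the possible relabelings of the ten readings give a difference at least as large as the observed one. With equal group sizes the permutation distribution is symmetric, so here the two-sided value comes out at exactly twice the one-sided value. `permutation_test` is a hypothetical helper name.

```python
from itertools import combinations
from statistics import mean

def permutation_test(a, b):
    """Exact two-sample permutation test on mean(b) - mean(a).

    Returns (one_sided_p, two_sided_p). The one-sided alternative assumed
    here is "b has the larger mean".
    """
    pooled = list(a) + list(b)
    observed = mean(b) - mean(a)
    eps = 1e-9  # tolerance so that exact ties are counted despite rounding
    at_least = 0     # relabelings with a difference >= observed
    as_extreme = 0   # relabelings with |difference| >= |observed|
    total = 0
    # Under the null hypothesis the labels are exchangeable, so every way of
    # choosing len(b) readings to call "b" is equally likely.
    for idx in combinations(range(len(pooled)), len(b)):
        chosen = set(idx)
        group_b = [pooled[i] for i in idx]
        group_a = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = mean(group_b) - mean(group_a)
        total += 1
        if diff >= observed - eps:
            at_least += 1
        if abs(diff) >= abs(observed) - eps:
            as_extreme += 1
    return at_least / total, as_extreme / total

city_a = [40, 45, 42, 50, 42]
city_b = [38, 45, 40, 48, 52]
p_one, p_two = permutation_test(city_a, city_b)
print(f"one-sided P = {p_one:.3f}, two-sided P = {p_two:.3f}")
```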

So, when should I use them?

Use a one-sided P value only when you can state with certainty, before collecting any data, one of two things. Either there will be no difference between the means, or the difference will go in a direction you specify in advance (i.e., you have specified which group will have the larger mean). Otherwise, use a two-sided P value.

1. If you select a one-sided test, you should do so before collecting any data.

2. You need to state the direction of your experimental hypothesis.

3. If the data go in the "wrong" direction, then you should use a two-sided P value.

It is recommended that you always calculate a two-sided P value.


## Sunday, 21 August 2011

### Micro- and Macro-average of Precision, Recall and F-Score

I have posted several articles explaining how precision and recall are calculated; the F-score is the equally weighted harmonic mean of the two. I was wondering how to calculate the average precision, recall, and F-score of a system that is applied to several sets of data.

Tricky, but I found this very interesting. There are two methods for obtaining such average statistics in information retrieval and classification.

1. Micro-average Method

In the micro-average method, you sum up the system's individual true positives, false positives, and false negatives over the different sets. Then you use the sums to compute the statistics. For example, for one set of data, the system's

True positives (TP1) = 12

False positives (FP1) = 9

False negatives (FN1) = 3

Then precision (P1) and recall (R1) are 57.14 and 80.

and for another set of data, the system's

True positives (TP2) = 50

False positives (FP2) = 23

False negatives (FN2) = 9

Then precision (P2) and recall (R2) are 68.49 and 84.75.
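The per-set figures follow from the standard definitions: precision = TP / (TP + FP) and recall = TP / (TP + FN). A small Python helper reproduces them as percentages, as in the post. `precision_recall` is a hypothetical name used only for illustration.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) as percentages."""
    precision = 100 * tp / (tp + fp)
    recall = 100 * tp / (tp + fn)
    return precision, recall

p1, r1 = precision_recall(tp=12, fp=9, fn=3)
p2, r2 = precision_recall(tp=50, fp=23, fn=9)
print(f"{p1:.2f} {r1:.2f}")  # 57.14 80.00
print(f"{p2:.2f} {r2:.2f}")  # 68.49 84.75
```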

Now, the average precision and recall of the system using the micro-average method are

Micro-average of precision = (TP1+TP2)/(TP1+TP2+FP1+FP2) = (12+50)/(12+50+9+23) = 65.96

Micro-average of recall = (TP1+TP2)/(TP1+TP2+FN1+FN2) = (12+50)/(12+50+3+9) = 83.78

The micro-average F-score is simply the harmonic mean of these two figures.
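Here is a minimal sketch of the micro-average in Python. It pools the TP, FP, and FN counts across the sets before computing precision and recall, and then takes their harmonic mean for the F-score. `micro_average` and its `(tp, fp, fn)` tuple input are illustrative assumptions, not a library API.

```python
def micro_average(counts):
    """Micro-average over several datasets.

    counts: list of (tp, fp, fn) tuples, one per dataset.
    Returns (precision, recall, f_score) as percentages.
    """
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    precision = 100 * tp / (tp + fp)
    recall = 100 * tp / (tp + fn)
    # Equally weighted harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

p, r, f = micro_average([(12, 9, 3), (50, 23, 9)])
print(f"{p:.2f} {r:.2f} {f:.2f}")  # 65.96 83.78 73.81
```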

2. Macro-average Method

The method is straightforward. Just take the average of the system's precision and recall over the different sets. For example, the macro-average precision and recall of the system in the example above are

Macro-average precision = (P1+P2)/2 = (57.14+68.49)/2 = 62.82

Macro-average recall = (R1+R2)/2 = (80+84.75)/2 = 82.38

The macro-average F-score is simply the harmonic mean of these two figures.
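Here is a matching sketch of the macro-average. It computes precision and recall per set, averages them with equal weight, and then takes the harmonic mean, following the definition in this post. Working from unrounded per-set values gives about 62.82 and 82.37. Some tools define the macro F-score differently: scikit-learn's `f1_score(..., average="macro")`, for instance, averages per-class F-scores, which can give a different number. `macro_average` is a hypothetical helper name.

```python
def macro_average(counts):
    """Macro-average: compute precision/recall per dataset, then average them.

    counts: list of (tp, fp, fn) tuples, one per dataset.
    Returns (precision, recall, f_score) as percentages.
    """
    precisions = [100 * tp / (tp + fp) for tp, fp, fn in counts]
    recalls = [100 * tp / (tp + fn) for tp, fp, fn in counts]
    precision = sum(precisions) / len(precisions)
    recall = sum(recalls) / len(recalls)
    # Harmonic mean of the averaged precision and recall, as in the post.
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

p, r, f = macro_average([(12, 9, 3), (50, 23, 9)])
print(f"{p:.2f} {r:.2f} {f:.2f}")  # 62.82 82.37 71.28
```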

Suitability

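The macro-average method gives every set of data the same weight, regardless of its size, so it tells you how the system performs on average across the sets. You should not draw conclusions about any specific set from this average.

On the other hand, the micro-average pools the counts of all sets, so larger sets contribute more to the result. This makes it a useful measure when your datasets vary in size.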


## Friday, 12 August 2011

### Research Writing: That or Which?

A very simple explanation of "that" and "which". I would say this is the best clarification I have found on the web so far. I am now confident enough to use the two correctly, which matters because the papers I review mostly mix them up.

[Originally from Mignon Fogarty]

"

Restrictive Clause--That

A restrictive clause is just part of a sentence that you can't get rid of because it specifically restricts some other part of the sentence. Here's an example:

Gems that sparkle often elicit forgiveness.

The words "that sparkle" restrict the kind of gems you're talking about. Without them, the meaning of the sentence would change. Without them, you'd be saying that all gems elicit forgiveness, not just the gems that sparkle. (And note that you don't need commas around the words "that sparkle".)

Nonrestrictive Clause--Which

A nonrestrictive clause is something that can be left off without changing the meaning of the sentence. You can think of a nonrestrictive clause as simply additional information. Here's an example:

Diamonds, which are expensive, often elicit forgiveness.

Leaving out the words "which are expensive" doesn't change the meaning of the sentence. (Also note that the phrase is surrounded by commas. Nonrestrictive clauses are usually surrounded by, or preceded by, commas.)

"


## Tuesday, 9 August 2011

### He as well as Me or He as well as I?

I found a very useful explanation on a language forum. I don't know the poster's name, but I acknowledge him/her with deepest thanks.

"as well as" functions as a conjunction in #1, not a preposition:

#1. She was into drama and took part in many youth theater productions as well as [took part in] singing in choirs.

"as well as" has two functions:

conjunction: courageous as well as strong.

preposition: The rhetoric, as well as the reasoning, is appreciated.

Notice the commas on each side of the prepositional phrase. They set off or bar the grammar from counting it as part of the subject. That's why the verb is singular "is", and not plural "are". Take the commas away and the prepositional phrase changes identity. It becomes a conjunction + noun phrase that's counted as part of the subject:

conjunction: The rhetoric as well as the reasoning are appreciated.

Below in #2a, there aren't any commas setting off "as well as" from the grammar, so it's counted as part of the subject. "He as well as I" is a compound subject so the verb should be plural "are" (#2b), not singular "is":

#2a. He as well as I is satisfied with the result.

#2b. He as well as I are satisfied with the result.

Subject-verb agreement is also a problem for #3a. "He as well as me" is a compound subject; the verb should be plural:

#3a. He as well as me is satisfied with the result.

#3b. He as well as me are satisfied with the result.

Now, add in the commas and "as well as" functions as a preposition,

#2c. He, as well as I, is satisfied with the result.

#3c. He, as well as me, is satisfied with the result.

As a conjunction, "as well as" joins two like forms, e.g., courageous as well as strong; you as well as Sam. But in #3b, below, "as well as" joins two unlike forms: the subject pronoun "He" and the object pronoun "me".

#3b. He as well as me are satisfied with the result.

#3d. He as well as I are satisfied with the result.

Now, "as well as me" is non-standard English, but nevertheless speakers will use "me" as well as "myself" as a way of placing the other person above them. It's a way of humbling oneself.


## Thursday, 4 August 2011

### Excel Graph to EPS (MS Office 2007)

Here is how I convert an Excel (MS Office 2007) graph to EPS so that I can use it in a TeX file.

1. Open MS Excel. Copy the graph to a new sheet and make sure it fits on one page (you can check this in the print preview).

2. Go to File -> Print -> Properties -> Advanced -> Postscript Option and select EPS.

3. The output will be saved as EPS, so give the file the extension .eps.

4. Open GSview and open the .eps file you just saved. Go to File and select PS to EPS. Give the new file an .eps extension as well. This is the final EPS file that you can insert in your TeX code.


### From Tex to PDF

There are several ways to generate a PDF from a TeX file. Here are the four most popular ones.

Method 1

If you do not have a bibliography file

% latex myfile (to generate myfile.dvi from myfile.tex)

% dvips myfile (to generate myfile.ps from myfile.dvi)

% ps2pdf myfile.ps (to generate the file myfile.pdf)

Method 2

If you have a bibliography file

% latex myfile (to generate myfile.dvi and myfile.aux from myfile.tex)

% bibtex myfile (uses the .aux file to extract cited publications from the database in the .bib file, formats them according to the indicated style, and puts the results into a .bbl file)

% latex myfile (to include the bibliography from the .bbl file)

% latex myfile (to resolve the citation references)

% dvips myfile (to generate myfile.ps from myfile.dvi)

% ps2pdf myfile.ps (to generate the file myfile.pdf)

Method 3

If you want to convert a TeX file directly to PDF and do not have a bibliography file

% pdflatex myfile

N.B. If you have images in EPS format, you need to convert them to PDF with the following command:

% epstopdf image.eps

Method 4

If you want to convert a TeX file directly to PDF and have a bibliography file

% pdflatex myfile

% bibtex myfile

% pdflatex myfile

% pdflatex myfile

N.B. If you have images in EPS format, you need to convert them to PDF with the following command:

% epstopdf image.eps
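The four methods differ only in which commands run and in what order, so they can be scripted. Below is a minimal Python sketch that is not part of the original workflow. `build_commands` and `run` are hypothetical helper names, and `myfile` is the same placeholder name used above. The sketch assumes the TeX tools (`latex`, `pdflatex`, `bibtex`, `dvips`, `ps2pdf`) are on your PATH. It stops at the first command that exits with a non-zero status (`check=True`) and leaves EPS conversion with `epstopdf` as a separate step.

```python
import shutil
import subprocess

def build_commands(stem, direct_pdf=True, has_bib=False):
    """Return the command sequence for one of the four methods above.

    direct_pdf=False, has_bib=False -> Method 1 (latex, dvips, ps2pdf)
    direct_pdf=False, has_bib=True  -> Method 2 (adds bibtex + two latex runs)
    direct_pdf=True,  has_bib=False -> Method 3 (pdflatex)
    direct_pdf=True,  has_bib=True  -> Method 4 (pdflatex, bibtex, pdflatex x2)
    """
    engine = "pdflatex" if direct_pdf else "latex"
    cmds = [[engine, stem]]
    if has_bib:
        # Two more runs pull in the .bbl file and resolve citation labels.
        cmds += [["bibtex", stem], [engine, stem], [engine, stem]]
    if not direct_pdf:
        # Explicit -o so dvips writes stem.ps instead of sending to a printer.
        cmds += [["dvips", "-o", stem + ".ps", stem], ["ps2pdf", stem + ".ps"]]
    return cmds

def run(cmds):
    """Run each command in order, stopping at the first failure."""
    for cmd in cmds:
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found on PATH")
        subprocess.run(cmd, check=True)
```

For example, `run(build_commands("myfile", direct_pdf=True, has_bib=True))` would reproduce Method 4.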

