How to Extract BibTeX Info from MathSciNet by Using curl

The web service mrlookup gives access to a database in which almost any scientific paper related to mathematics can be found. For a desired paper it returns the reference together with its BibTeX entry.

The search can be carried out by entering the following fields:

Author \(\rightarrow\) \(\tt au\)

Title \(\rightarrow\) \(\tt ti\)

Journal \(\rightarrow\) \(\tt jrnl\)

First page \(\rightarrow\) \(\tt ipage\)

Last page \(\rightarrow\) \(\tt fpage\)

Year \(\rightarrow\) \(\tt year\)

curl is a tool for transferring data from or to a server. Since we are interested in extracting data from a server, it is a perfect tool for the job.

For example, to retrieve the record of a 2010 paper by M. Phillips whose title contains the word "polynomial" and which starts at page 42, and to save the answer in a file named bblfile.txt, one needs to run the following command in the shell:

curl -d "year=2010&au=phillips, M.&ti=polynomial&ipage=42" > bblfile.txt
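The same request can also be written with one option per field, which lets curl encode the space and the comma in the author name. The following is only a sketch under stated assumptions: the address of the mrlookup form is not given in the text, so it is read from a hypothetical environment variable MRLOOKUP_URL, and mr_fetch is a name made up for this illustration. The option --data-urlencode is a standard curl option that sends the data as a POST request, like -d.

```shell
#!/bin/sh
# Address of the mrlookup search form. It is not given in the text,
# so the reader has to supply it (MRLOOKUP_URL is a hypothetical name).
MRLOOKUP_URL="${MRLOOKUP_URL:-}"

# Fetch the HTML answer for one query.
# Arguments: year, author, word in the title, first page.
mr_fetch() {
  curl -s \
    --data-urlencode "year=$1" \
    --data-urlencode "au=$2" \
    --data-urlencode "ti=$3" \
    --data-urlencode "ipage=$4" \
    "$MRLOOKUP_URL"
}

# Only contact the server when an address has been supplied.
if [ -n "$MRLOOKUP_URL" ]; then
  mr_fetch 2010 "phillips, M." polynomial 42 > bblfile.txt
else
  echo "set MRLOOKUP_URL to the address of the mrlookup form" >&2
fi
```

Keeping each field in its own option also makes it easy to drop or add a field, for instance fpage, without touching the rest of the command.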

Part of the returned HTML file looks like:
Retrieved all documents ' '

@article {MR2765898,
    AUTHOR = {Iglesias, Emma M. and Phillips, Garry D. A.},
     TITLE = {The bias to order {$T^{-2}$} for the general {$k$}-class
              estimator in a simultaneous equation model},
   JOURNAL = {Econom. Lett.},
  FJOURNAL = {Economics Letters},
    VOLUME = {109},
      YEAR = {2010},
    NUMBER = {1},
     PAGES = {42--45},
      ISSN = {0165-1765},
   MRCLASS = {62F10},
  MRNUMBER = {2765898},
       DOI = {10.1016/j.econlet.2010.07.011},
       URL = {},

We only want the part of the file between the HTML tags <pre> and </pre>.

To extract the relevant part from the file and write it to a new file, reference.bbl, we need to run the following pipeline, which ends in an awk command:

cat bblfile.txt | tr -d \\n | tr -s ' ' | awk -v FS="(<pre>|</pre>)" '{print $2}' > reference.bbl

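The command tr -d \\n removes the newline characters, so the whole file becomes a single line.
The command tr -s ' ' squeezes every run of spaces into a single space.
The awk command then splits that line at <pre> and </pre> and prints the second field, that is, the text between the first pair of tags.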
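
The pipeline can be tried without any network access. The sketch below feeds it a made-up HTML fragment instead of a real answer from the server. The variable sample, the function extract_pre and the second <pre> block are hypothetical and exist only for this demonstration; the first record is a shortened copy of the one shown above.

```shell
#!/bin/sh
# Made-up HTML standing in for the file returned by the server.
sample='<html><body>
<pre>
@article {MR2765898,
    AUTHOR = {Iglesias, Emma M. and Phillips, Garry D. A.},
      YEAR = {2010},
}
</pre>
<pre>second record, ignored</pre>
</body></html>'

# Same steps as in the text: join the lines, squeeze the spaces,
# then print the text between the first <pre> and </pre>.
extract_pre() {
  tr -d '\n' | tr -s ' ' | awk -v FS='(<pre>|</pre>)' '{print $2}'
}

result=$(printf '%s\n' "$sample" | extract_pre)
printf '%s\n' "$result"
```

Because only the second field is printed, the content of the second <pre> block is dropped, and the record comes out as one long line since all newlines were removed.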

Observation: Sometimes the process fails and no reference is returned; in such a case use Ti= instead of ti=. Note also that this process stores only the first record retrieved by the system.
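
The fallback from ti= to Ti= can be automated. The following sketch only illustrates the control flow and does not contact any server: fetch is a hypothetical stub that imitates the failure by returning nothing for ti and a shortened record for Ti, and lookup_with_fallback is a name chosen for this example. In real use, fetch would run the curl request and the extraction step with the given field name.

```shell
#!/bin/sh
# Hypothetical stand-in for the real request: empty answer for ti,
# one shortened record for Ti.
fetch() {
  case "$1" in
    Ti) printf '%s\n' '@article {MR2765898, YEAR = {2010},}' ;;
    *)  : ;;
  esac
}

# Try ti first, then Ti; print the first non-empty answer.
lookup_with_fallback() {
  for field in ti Ti; do
    record=$(fetch "$field")
    if [ -n "$record" ]; then
      printf '%s\n' "$record"
      return 0
    fi
  done
  return 1
}

reference=$(lookup_with_fallback) || echo "no reference found" >&2
printf '%s\n' "$reference"
```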