How to clean and organize your BBL file

If you want to build a database from all your BBL entries, it is useful to clean, organize and compress them first. To do so, we need to follow some basic steps:

  1. We must convert the file to Unicode encoding (UTF-8).
  2. Some symbols must be deleted, such as double braces, extra spaces, etc.
  3. We must make sure every BBL element is "complete" according to its BibTeX type.
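
Step 3 can be sketched as a small lookup of required fields per entry type. The table `REQUIRED_FIELDS` and the function `missing_fields` below are hypothetical names introduced for illustration. The table is deliberately simplified (for instance, BibTeX accepts an editor instead of an author for a book), so it should be adapted to the bibliography style in use.

```python
# Hypothetical, simplified table of required fields per entry type.
REQUIRED_FIELDS = {
    "ARTICLE": {"AUTHOR", "TITLE", "JOURNAL", "YEAR"},
    "BOOK": {"TITLE", "PUBLISHER", "YEAR"},
}


def missing_fields(entry_type, fields):
    """Return the required fields that are absent from an entry, sorted by name."""
    required = REQUIRED_FIELDS.get(entry_type.upper(), set())
    present = {name.upper() for name in fields}
    return sorted(required - present)


# Made-up sample entry: an article without a journal.
entry = {"author": "J. Doe", "title": "A title", "year": "2020"}
print(missing_fields("article", entry))
```

Entry types that are not in the table are treated as having no required fields, so an unknown type never reports a problem.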

If we download the references from an official source such as mrlookup or Zentralblatt MATH, among other sites, UTF-8 is almost guaranteed. With some other sites, or when we write our own BBL references, we are sometimes not careful with the formatting.
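
When the encoding of a file is in doubt, one possible approach is to try UTF-8 first and fall back to a legacy encoding. The helper `to_utf8` below is a hypothetical name, and the choice of Latin-1 as the fallback is an assumption. A file from another source may need a different fallback.

```python
def to_utf8(raw: bytes, fallback: str = "latin-1") -> str:
    """Decode bytes as UTF-8, falling back to another encoding if that fails."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is in the fallback encoding.
        return raw.decode(fallback)


# Made-up sample data in two different encodings.
modern = "Erdős".encode("utf-8")
legacy = "Poincaré".encode("latin-1")
print(to_utf8(modern))
print(to_utf8(legacy))
```

The decoded text can then be written back with `open(path, 'w', encoding='utf-8')`, so that the cleaned file is always UTF-8.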

We are going to present a Python program that puts each BBL reference on a single line, in the cleanest and most compact form possible, so that from the resulting file we can generate a database in which all our data can be managed in a simple way.

We start by importing the regular expressions library, opening the BBL file we are going to read, and creating the file into which we will write the information from the original BBL file.

import re
with open('references.bib', 'r') as file:
    with open('newreferences.bib', 'w') as writer:

Once the files are open, we set some initial values:

        entities = 0          # number of entities found in the whole file
        lincom = 0            # number of commented lines
        fiam = 0              # amount of fields in the current item
        startentitie = False  # True while an entity has been started and not yet written

The next step is to go through the references.bib file line by line, remove the extra spaces, and check whether the line is a comment.

        for line in file:
            line = line.strip()              # delete leading and trailing whitespace
            line = line.replace(' = ', '=')  # delete the spaces around the equals sign
            if re.search(r'^[\%]{1,}$', line) is not None:  # lines that are comments
                lincom += 1

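Observation: It is useful to know how to use regular expressions such as '^[\%]{1,}$' properly. As written, this pattern marks the lines that consist only of one or more % characters. To mark every line that starts with %, the pattern '^%' would be used instead.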
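
A quick way to see which lines a pattern marks is to try it on a few samples. The sketch below compares the pattern from the text with the shorter alternative '^%'. The sample lines are made up for illustration.

```python
import re

only_percent = re.compile(r'^[\%]{1,}$')  # the pattern used in the text
starts_with_percent = re.compile(r'^%')   # alternative: any line beginning with %

# Made-up sample lines: a separator, a comment with text, and a field.
for line in ['%%%', '% a note', 'author={Doe}']:
    marked_by_first = only_percent.search(line) is not None
    marked_by_second = starts_with_percent.search(line) is not None
    print(repr(line), marked_by_first, marked_by_second)
```

Only the second pattern marks a comment line that carries text after the % sign.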

The next step is to determine the type of publication of each entity in the BBL file, i.e. whether it is an article, a book, or something else.

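            if re.search(r'^\@[ a-zA-Z]{1,}\{[ a-zA-Z]{1,}[0-9]{0,}\,$', line) is not None:
                typeent = line.upper()[1:line.find("{")].replace(' ', '')       # we get the type of the entity
                nameent = line.upper()[line.find("{") + 1:-1].replace(' ', '')  # we get the name of the entity
                stri = '@' + typeent + '{' + nameent + ','  # assumption: the listing never initializes stri, so the one-line record starts here
                entities += 1
                fiam = 0             # we reset the amount of fields of the item
                startentitie = True  # we start a new entity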
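
To check what the header pattern accepts, it can be wrapped in a small function and tried on sample lines. `parse_header` is a hypothetical name, and the sample keys are made up. The slicing is the same as in the fragment above.

```python
import re

HEADER = re.compile(r'^\@[ a-zA-Z]{1,}\{[ a-zA-Z]{1,}[0-9]{0,}\,$')


def parse_header(line):
    """Return (type, name) in upper case, or None if the line is not a header."""
    line = line.strip()
    if HEADER.search(line) is None:
        return None
    brace = line.find('{')
    typeent = line.upper()[1:brace].replace(' ', '')
    nameent = line.upper()[brace + 1:-1].replace(' ', '')
    return typeent, nameent


print(parse_header('@article{doe2020,'))  # letters followed by digits: accepted
print(parse_header('@book{doe-2020,'))    # a hyphen in the name: rejected
```

The pattern only accepts names made of letters followed by optional digits, so an entity whose name contains a hyphen, a colon or an underscore is not recognized as the start of a new entity.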

Now we extract the fields of the entity that are well formatted. Remember that a field sometimes spans several lines, and those strings need to be joined into a single field. The following code appends each well-formatted field to the one-line string of the current entity.

            if re.search(r'[ a-zA-Z]{1,}\=[ a-zA-Z0-9]{1,},$', line) is not None:  # fields without braces, ending with a comma
                datatoinsert = line[:-1]                                  # drop the trailing comma
                dataname = datatoinsert.upper()[:datatoinsert.find("=")]  # field name in upper case
                datastr = datatoinsert[datatoinsert.find("="):]           # the equals sign and the value
                datatoinsert = dataname + datastr + ','
                stri = stri + datatoinsert
                fiam += 1  # we add the field to the item
            if re.search(r'[ a-zA-Z]{1,}\=\{[ .,@\a-zA-Z0-9]{1,}\},$', line) is not None:  # fields in braces, ending with a comma
                datatoinsert = line
                dataname = datatoinsert.upper()[:datatoinsert.find("=") + 1]  # field name in upper case, with the equals sign
                datastr = datatoinsert[datatoinsert.find("=") + 2:-2]         # the value without '{' and '},'
                datatoinsert = dataname + datastr + ','
                stri = stri + datatoinsert
                fiam += 1  # we add the field to the item
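
Fields that are spread over several lines can be merged before the patterns are applied. The following is only a sketch of one way to do it, by counting braces. `join_continued_lines` is a hypothetical helper, and it assumes that values are delimited by braces, with no quoted values or escaped braces.

```python
def join_continued_lines(lines):
    """Merge physical lines until the braces of the current field are balanced."""
    joined, buffer, depth = [], '', 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith('@') and not buffer:
            # Header of an entity: its brace is only closed at the end of the entity.
            joined.append(line)
            continue
        buffer = f'{buffer} {line}' if buffer else line
        depth += line.count('{') - line.count('}')
        if depth <= 0:
            joined.append(buffer)
            buffer, depth = '', 0
    if buffer:
        joined.append(buffer)  # unbalanced leftover, kept so that nothing is lost
    return joined


# Made-up sample: the title is split over two lines.
sample = ['@article{doe2020,', 'title={A long', 'title},', 'year=2020,', '}']
print(join_continued_lines(sample))
```

After this step every field sits on one line, so the two patterns above have a chance to match it.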

Finally, when the closing brace is reached, we complete the current entity and save it into the new, cleaned BBL file.

            if re.search(r'\}$', line) is not None and startentitie:
                stri = stri + '}' + '%' + str(entities) + '---' + str(fiam) + '-\n'
                writer.write(stri)    # save the info into the output file
                startentitie = False  # we are ready to seek a new entity
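
Each line of the output file therefore ends with a trailer of the form `%N---M-`, where N is the number of the entity and M is its amount of fields. When the file is read back, the trailer can be separated from the entity. `split_record` is a hypothetical helper, and the sample line is made up.

```python
import re

TRAILER = re.compile(r'%(\d+)---(\d+)-$')


def split_record(record):
    """Split a cleaned line into (entity_text, entity_number, field_count)."""
    record = record.rstrip('\n')
    match = TRAILER.search(record)
    if match is None:
        raise ValueError('the line has no %N---M- trailer')
    return record[:match.start()], int(match.group(1)), int(match.group(2))


print(split_record('@ARTICLE{DOE2020,YEAR=2020,}%1---1-\n'))
```

Because the trailer starts with %, it can also be left in place as a trailing comment when the line is only going to be inspected by eye.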

At the end, we print a short report with the values gathered while parsing the file:

print(f'Entities: {entities} and commented lines: {lincom}')
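
Putting the fragments together, the whole process can be sketched as a single function that works on a list of lines instead of on files, which makes it easy to try on small samples. This is a condensed sketch, not the program attached to this page. `clean_entries` is a hypothetical name, the tests are chained with `elif`, the trailing newline is left out, and the prefix of each record (`@TYPE{NAME,`) is an assumption.

```python
import re


def clean_entries(lines):
    """Return one compact string per entity, following the steps in the text."""
    records, stri = [], ''
    entities = fiam = 0
    started = False
    for line in lines:
        line = line.strip().replace(' = ', '=')
        if re.search(r'^\@[ a-zA-Z]{1,}\{[ a-zA-Z]{1,}[0-9]{0,}\,$', line):
            brace = line.find('{')
            typeent = line.upper()[1:brace].replace(' ', '')
            nameent = line.upper()[brace + 1:-1].replace(' ', '')
            stri = f'@{typeent}{{{nameent},'  # assumed prefix of the record
            entities += 1
            fiam = 0
            started = True
        elif re.search(r'[ a-zA-Z]{1,}\=[ a-zA-Z0-9]{1,},$', line):
            eq = line.find('=')
            stri += line[:eq].upper() + line[eq:]  # NAME=value,
            fiam += 1
        elif re.search(r'[ a-zA-Z]{1,}\=\{[ .,@\a-zA-Z0-9]{1,}\},$', line):
            eq = line.find('=')
            stri += line[:eq + 1].upper() + line[eq + 2:-2] + ','  # braces removed
            fiam += 1
        elif re.search(r'\}$', line) and started:
            records.append(f'{stri}}}%{entities}---{fiam}-')
            started = False
    return records


# Made-up sample entity.
sample = ['@article{doe2020,', '  author = {Doe, J.},', '  year = 2020,', '}']
for record in clean_entries(sample):
    print(record)
```

Lines that match none of the patterns are silently skipped, which is why the field count in the trailer is useful for spotting entities that lost information.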

Note: This process can fail if the BBL file contains entities that are not well formatted.

