Manipulation of LaTeX bib files using regular expressions in Vim
Being able to submit LaTeX source files is great, but it is often convenient to clean those files up first.
In particular, here is an example of how to edit bibliography-related fields within a *.bib file using vim.
Matches are made with regular expressions (aka regex), so it pays to first check how consistently the fields are written in the document (i.e. what kind of format is used? What kind of separators?).
Please don't forget to use :help to find out more about matches and patterns in vim (:help regex, :h search, etc.).
Preliminaries
A *.bib file has the following structure:
@article{keyentry,
author = {author},
abstract = {some text},
annote = {some text},
url = {someurlinwww},
title = {some text},
journal = {some text},
year = {some year},
}
Alternative forms may omit the braces or use quotation marks instead. For example,
@article{keyentry,
author = "author",
abstract = "some text",
annote = "some text",
url = "someurlinwww",
title = "some text",
journal = "some text",
year = some year,
}
The search + replace commands implemented in vim follow the structure: :range s[ubstitute]/pattern/string/cgiI
where c asks for confirmation before each substitution; g replaces all occurrences in the line (without g, only the first); i makes the pattern case-insensitive; and I makes it case-sensitive.
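As a rough illustration of the g and i flags, the sketch below runs two substitutions non-interactively on a one-line file. The file names and the sample line are made up for the demo, and the options -N -u NONE -es (ignore user settings, silent Ex mode) are simply one way to run vim without a screen, not something the commands above require.

```shell
# One line containing the same word in two different cases (made-up content).
printf 'URL = {a}, url = {b}\n' > flags.txt

# No flags: only the first case-sensitive match on the line is replaced.
vim -N -u NONE -es -c '%s/url/link/' -c 'w! flags_first.txt' -c 'q!' flags.txt

# g and i together: every match on the line, ignoring case.
vim -N -u NONE -es -c '%s/url/link/gi' -c 'w! flags_all.txt' -c 'q!' flags.txt

cat flags_first.txt   # URL = {a}, link = {b}
cat flags_all.txt     # link = {a}, link = {b}
```

The c flag is left out here because it needs an interactive session to answer the prompts.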
Example
Here is the bibliography file used for the example - Sample.bib
In normal mode, / is used as a preliminary search to check what the pattern matches. For instance, the following command searches for fields containing url.
/ url\s*=
Here \s means “any whitespace character (space or tab)” and * means “any number of occurrences, including none”.
To display all occurrences in the file, the following command can be used:
:g/ url\s*= {.*}
where .* means “any number of characters, excluding the newline”. Therefore, if the bib entry is expected to be on a single line, this expression is sufficient; however, for an entry spanning multiple lines, such as the abstract field, the command needs something more. The following command uses \_. which matches any character including the newline, followed by \{-} which stands for “zero or more matches of the preceding atom, as few as possible”. As a result, the match is non-greedy: it stops at the first closing bracket followed by a comma, so everything contained between the brackets is found.
:g/ url\s*= {\_.\{-}},
The final search + replace command is then:
:%s/ url\s*= {\_.\{-}},/url = {},/g
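To see the multi-line behaviour in action, here is a minimal sketch that applies the same kind of pattern to an abstract spanning two lines. The file multiline.bib and its contents are hypothetical, and the options -N -u NONE -es are only an assumption to make the run non-interactive and independent of any vimrc.

```shell
# Hypothetical demo file: the abstract spans two lines.
cat > multiline.bib <<'EOF'
@article{keyentry,
 author = {author},
 abstract = {first line
second line},
 url = {someurlinwww},
}
EOF

# -N -u NONE: ignore user settings; -es: silent Ex mode (no screen needed).
# \_. matches any character including a newline;
# \{-} keeps the match as short as possible.
# The e flag suppresses the error when a pattern is not found.
vim -N -u NONE -es \
  -c '%s/ abstract\s*= {\_.\{-}},/ abstract = {},/e' \
  -c '%s/ url\s*= {\_.\{-}},/ url = {},/e' \
  -c 'w! multiline_clean.bib' \
  -c 'q!' \
  multiline.bib

cat multiline_clean.bib
```

One caveat of the non-greedy match: it stops at the first }, it meets, so a field whose text itself contains a closing bracket followed by a comma would be cut short.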
A similar search + replace command can then be used for other fields; however, if more than one field must be changed, a set of vim instructions may be passed from the terminal using vim -c. The following command (i) clears the information contained in 4 fields, (ii) saves the edited document to a new file, and (iii) exits without saving changes to the original file.
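# Single quotes keep the shell from touching \ and !
# The e flag avoids an error when a field is absent from the file.
vim -c '%s/ url\s*= {\_.\{-}}/ url = {}/ge' \
-c '%s/ annote\s*= {\_.\{-}}/ annote = {}/ge' \
-c '%s/ abstract\s*= {\_.\{-}}/ abstract = {}/ge' \
-c '%s/ file\s*= {\_.\{-}}/ file = {}/ge' \
-c 'w temp.bib' \
-c 'q!' \
sample.bib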
Of course, as is common practice in vim, such a command might be replaced by a more sophisticated regex which reduces the number of instructions by matching several fields with one pattern. By the way, a maximum of 10 instructions is supported in vim -c mode.
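One possible way to cut the number of instructions is to match all four field names with a single alternation and reuse the captured name in the replacement. The sketch below is only an illustration under assumptions: the demo file and its contents are made up, fields are assumed to start on their own line and end with }, and the options -N -u NONE -es are my choice for a non-interactive run.

```shell
# Hypothetical demo file containing the four fields to be cleared.
cat > fields.bib <<'EOF'
@article{keyentry,
  author = {author},
  abstract = {first line
second line},
  annote = {some text},
  url = {someurlinwww},
  file = {paper.pdf},
  title = {some text},
}
EOF

# \(url\|annote\|abstract\|file\) captures whichever field name matched;
# \1 puts that name back in front of the emptied brackets.
vim -N -u NONE -es \
  -c '%s/^\s*\(url\|annote\|abstract\|file\)\s*=\s*{\_.\{-}},/  \1 = {},/e' \
  -c 'w! fields_clean.bib' \
  -c 'q!' \
  fields.bib

cat fields_clean.bib
```

If more commands are needed than -c allows, they can also be written to a file and sourced with vim -S, which counts as a single instruction.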
To sum up, cleaning a *.bib file with vim can be done in a single command. Pretty cool!
I hope this has been useful.