A brief introduction to pandoc

pandoc is a handy tool for many bioinformaticians because it can convert between many different document formats. For example, if you want to turn a plain text file into a Word document or a PDF file, you can do it from the command line. It is especially useful when you need to generate documents or reports on the fly as part of a bioinformatics workflow or pipeline.

In Assignment 1, I asked you to generate a PDF document from a Markdown text file. To do this, you first need to install pandoc through conda. You can check which versions are available with conda search:

%%bash
conda search pandoc
Loading channels: ...working... done
# Name                       Version           Build  Channel
pandoc                      1.16.0.2               0  conda-forge
pandoc                      1.17.0.1               0  conda-forge
pandoc                      1.17.0.2               0  conda-forge
pandoc                        1.17.1               0  conda-forge
pandoc                        1.17.2               0  conda-forge
pandoc                          1.18               0  conda-forge
pandoc                          1.19               0  conda-forge
pandoc                        1.19.1               0  conda-forge
pandoc                        1.19.2               0  conda-forge
pandoc                      1.19.2.1      ha5e8f32_1  pkgs/main
pandoc                       2.0.0.1               0  conda-forge
pandoc                       2.0.0.1               1  conda-forge
pandoc                         2.0.3               0  conda-forge
pandoc                         2.0.4               0  conda-forge
pandoc                         2.0.5               0  conda-forge
pandoc                           2.1               0  conda-forge
pandoc                         2.1.1               0  conda-forge
pandoc                         2.1.2               0  conda-forge
pandoc                         2.1.3               0  conda-forge
pandoc                           2.2      hde52d81_0  conda-forge
pandoc                         2.2.1      h1a437c5_0  pkgs/main
pandoc                         2.2.1      hde52d81_0  conda-forge
pandoc                         2.2.2      hde52d81_0  conda-forge
pandoc                         2.2.2      hde52d81_1  conda-forge
pandoc                       2.2.3.2               0  pkgs/main
pandoc                           2.3               0  conda-forge
pandoc                         2.3.1               0  conda-forge
pandoc                           2.4               0  conda-forge
pandoc                           2.5               0  conda-forge
pandoc                           2.5               1  conda-forge
pandoc                           2.6               0  conda-forge
pandoc                           2.6               1  conda-forge
pandoc                         2.7.1               0  conda-forge
pandoc                         2.7.2               0  conda-forge
pandoc                         2.7.3               0  conda-forge
pandoc                           2.8               0  conda-forge
pandoc                       2.8.0.1               0  conda-forge
pandoc                         2.8.1               0  conda-forge
pandoc                           2.9               0  conda-forge
pandoc                         2.9.1               0  conda-forge
pandoc                       2.9.1.1               0  conda-forge
pandoc                         2.9.2               0  conda-forge
pandoc                       2.9.2.1               0  conda-forge
pandoc                       2.9.2.1               0  pkgs/main
pandoc                          2.10               0  conda-forge
pandoc                          2.10               0  pkgs/main
pandoc                          2.10      h1de35cc_0  conda-forge
pandoc                        2.10.1               0  pkgs/main
pandoc                        2.10.1      haf1e3a3_0  conda-forge

To install pandoc, just type conda install pandoc, and conda will install the latest version available in your configured channels.
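
After installing, it is worth confirming that your shell can actually find pandoc. The sketch below uses only `command -v` and `pandoc --version`; `check_pandoc` is a hypothetical helper name, not part of pandoc or conda.

```shell
#!/usr/bin/env bash
# check_pandoc: hypothetical helper that reports whether pandoc is on your PATH
check_pandoc() {
    if command -v pandoc >/dev/null 2>&1; then
        # The first line of `pandoc --version` contains the version number
        pandoc --version | head -n 1
    else
        echo "pandoc not found; install it with: conda install pandoc" >&2
        return 1
    fi
}

# Usage: run check_pandoc right after conda install pandoc
```

If it reports that pandoc is missing even after installation, make sure the conda environment you installed it into is the one currently activated.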

To convert the Markdown file into a PDF file, I gave the following example commands to type in your terminal:

## For Mac
pandoc assignment_01.md \
        -V geometry:margin=1in \
        -V fontsize:11pt \
        --variable mainfont="PT Serif" \
        --variable sansfont="Arial" \
        --variable monofont="Menlo" \
        --pdf-engine=xelatex  \
        -o assignment_01.pdf

## For Windows/WSL
pandoc assignment_01.md \
        -V geometry:margin=1in \
        -V fontsize:11pt \
        --variable mainfont="Liberation Serif" \
        --variable sansfont="Liberation Sans" \
        --variable monofont="Liberation Mono" \
        --pdf-engine=xelatex  \
        -o assignment_01.pdf
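
If you switch between a Mac and Windows/WSL, you could pick the font set automatically instead of keeping two copies of the command. This sketch assumes that `uname -s` reports `Darwin` on macOS and `Linux` under WSL; the helper name `build_pdf` is hypothetical, and the font names are the same ones used in the two commands above.

```shell
#!/usr/bin/env bash
# Choose the same font sets as the two commands above, based on the operating system
case "$(uname -s)" in
    Darwin)  # macOS
        mainfont="PT Serif"; sansfont="Arial"; monofont="Menlo" ;;
    *)       # Linux, including WSL
        mainfont="Liberation Serif"; sansfont="Liberation Sans"; monofont="Liberation Mono" ;;
esac

# build_pdf: hypothetical wrapper around the pandoc command shown above
build_pdf() {
    pandoc "$1" \
        -V geometry:margin=1in \
        -V fontsize:11pt \
        --variable mainfont="$mainfont" \
        --variable sansfont="$sansfont" \
        --variable monofont="$monofont" \
        --pdf-engine=xelatex \
        -o "$2"
}

echo "Using fonts: $mainfont / $sansfont / $monofont"
# Usage: build_pdf assignment_01.md assignment_01.pdf
```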

You might find this way of typing commands a bit confusing, but don’t be afraid! All of the options spread over several lines could be written on a single line, but that line would be very long and hard to read in a terminal window. So I end each line with a “\” (backslash), which tells the shell (your Unix environment) that the command is not finished and continues on the next line. The backslash must be the very last character on the line; a space after it breaks the continuation. The backslash is a special character in Unix (not to be confused with the forward slash “/” used in file paths), and programmers use it as an “escape” character that changes the meaning of the character right after it. Here, it escapes the end of the line, so the shell treats the next line as part of the same command.
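
To see that the shell really joins the lines, you can pass the same options to a small function that just counts its arguments. `count_args` is a hypothetical helper for this demonstration only; both calls below print 7.

```shell
#!/usr/bin/env bash
# count_args: hypothetical helper that prints how many arguments it received
count_args() { echo "$#"; }

# All options on a single line
count_args assignment_01.md -V geometry:margin=1in -V fontsize:11pt -o assignment_01.pdf

# The same options split across lines; each backslash must be the LAST character
# on its line, so the shell joins the lines before running the command
count_args assignment_01.md \
    -V geometry:margin=1in \
    -V fontsize:11pt \
    -o assignment_01.pdf
```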

Likewise, “\n” stands for a line break (newline). For example, running echo -e "My\nName" in your terminal prints:

My
Name
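
echo -e works in bash, but in some shells (for example dash, which is /bin/sh on Ubuntu) echo prints the -e itself. printf interprets \n the same way everywhere, so it is a more portable way to produce the output above. Both lines in this sketch print My and Name on separate lines.

```shell
#!/usr/bin/env bash
# printf always interprets \n (newline) and \t (tab) in its format string,
# so it is a portable alternative to `echo -e`
printf 'My\nName\n'

# %s placeholders keep the data separate from the formatting;
# printf reuses the format for each remaining argument
printf '%s\n' "My" "Name"
```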

Similarly, “\t” stands for a tab character (the one produced by the Tab key, just above Caps Lock on most keyboards). You will often see it in awk commands that read or write tab-separated files.
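
Here is a small sketch of \t in awk, using a made-up two-column table (the gene names and lengths are hypothetical). The first awk command prints 1200 and 850; the second swaps the columns and joins them with a tab.

```shell
#!/usr/bin/env bash
# table: hypothetical tab-separated data (gene name, length), generated with printf
table() { printf 'geneA\t1200\ngeneB\t850\n'; }

# -F '\t' tells awk that input fields are separated by tabs; print the second column
table | awk -F '\t' '{ print $2 }'

# OFS = "\t" makes awk separate OUTPUT fields with a tab; swap the two columns
table | awk -F '\t' 'BEGIN { OFS = "\t" } { print $2, $1 }'
```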

Coming back to the pandoc examples, I specified font variables such as --variable sansfont="Arial" to tell pandoc which font to use for sans-serif text in the generated document. This confuses many people, and it may not work: these font variables only take effect with the xelatex (or lualatex) engine, and xelatex stops with an error if a font you name is not installed on your computer. If that happens, simply omit the font lines; the LaTeX engine will then fall back to its default fonts.
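
Before naming a font, you can check whether it is installed. This sketch assumes the fontconfig tool fc-list is available (it usually is on Linux and WSL; on a Mac it may not be), and `font_installed` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# font_installed: hypothetical helper; succeeds if fontconfig lists the font family.
# Returns 2 when fc-list itself is not available.
font_installed() {
    command -v fc-list >/dev/null 2>&1 || return 2
    # `fc-list : family` prints the family names of installed fonts
    fc-list : family | grep -qiF "$1"
}

if font_installed "Liberation Serif"; then
    echo "Liberation Serif is available"
else
    echo "Liberation Serif not found (or fc-list missing); drop the mainfont line or pick another font"
fi
```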

One more thing you might notice is that the --pdf-engine=xelatex option only works if a LaTeX distribution that provides xelatex is installed on your computer. If it is not installed, you will need to install the required packages through conda.
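
A quick way to see which LaTeX engines your shell can find is sketched below. `check_engines` is a hypothetical helper; it only looks for the two engines used in this tutorial.

```shell
#!/usr/bin/env bash
# check_engines: hypothetical helper that reports which LaTeX engines are on your PATH
check_engines() {
    local engine
    for engine in xelatex pdflatex; do
        if command -v "$engine" >/dev/null 2>&1; then
            echo "$engine: found at $(command -v "$engine")"
        else
            echo "$engine: not found"
        fi
    done
}

check_engines
```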

%%bash
conda search texlive-core
Loading channels: ...working... done
# Name                       Version           Build  Channel
texlive-core                20160520    pl5.20.3.1_1  conda-forge
texlive-core               20160523b    pl5.20.3.1_0  conda-forge
texlive-core               20160523b    pl5.20.3.1_1  conda-forge
texlive-core               20160523b      pl5.20.3_3  conda-forge
texlive-core                20170520    pl5.22.2.1_0  conda-forge
texlive-core                20170520    pl5.22.2.1_1  conda-forge
texlive-core                20170520    pl5.22.2.1_2  conda-forge
texlive-core                20170520 pl526h2f74ec9_2  pkgs/main
texlive-core                20170520 pl526h47ed19a_1  pkgs/main
texlive-core                20170520 pl526ha3510ec_1  pkgs/main
texlive-core                20170520 pl526hc2f8f47_1  pkgs/main
texlive-core                20180414      ha09c46f_0  pkgs/main
texlive-core                20180414 pl526h0778769_1  conda-forge
texlive-core                20180414 pl526h6632d02_1  conda-forge
texlive-core                20180414 pl526hd51217d_2  conda-forge
texlive-core                20180414 pl526hd51217d_3  conda-forge
texlive-core                20180414 pl526hfbb4d6c_0  conda-forge

This package should be available through conda on both Ubuntu and macOS. Go ahead and install it by typing conda install texlive-core, then try the pandoc commands again. Try this first:

pandoc assignment_01.md \
        -V geometry:margin=1in \
        -V fontsize:11pt \
        --pdf-engine=xelatex  \
        -o assignment_01.pdf

If this fails to produce a PDF file or you get error messages, try typing this:

pandoc assignment_01.md \
        -V geometry:margin=1in \
        -V fontsize:11pt \
        --pdf-engine=pdflatex \
        -o assignment_01.pdf
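
The two commands above can be combined into one helper that tries xelatex first and falls back to pdflatex only if xelatex is missing or fails. This is a sketch; `make_pdf` is a hypothetical name, and it simply runs the same pandoc options shown above with each engine in turn.

```shell
#!/usr/bin/env bash
# make_pdf: hypothetical helper that tries xelatex first, then pdflatex
make_pdf() {
    local input="$1" output="$2" engine
    command -v pandoc >/dev/null 2>&1 || { echo "pandoc not found" >&2; return 1; }
    for engine in xelatex pdflatex; do
        # Skip engines that are not installed
        command -v "$engine" >/dev/null 2>&1 || continue
        if pandoc "$input" \
                -V geometry:margin=1in \
                -V fontsize:11pt \
                --pdf-engine="$engine" \
                -o "$output"; then
            echo "Built $output with $engine"
            return 0
        fi
        echo "$engine failed; trying the next engine" >&2
    done
    echo "No working PDF engine found; try conda install texlive-core" >&2
    return 1
}

# Usage: make_pdf assignment_01.md assignment_01.pdf
```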

Hopefully, you will get a PDF file after this command. Next, here is an example of generating a Word document with pandoc. Type:

pandoc assignment_01.md -o assignment_01.docx
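
For Word output, page layout and fonts come from a reference document rather than from -V variables such as geometry and fontsize, which are LaTeX template variables and only affect PDF output. pandoc can write out its default reference.docx, which you can open in Word, restyle, and pass back with --reference-doc (pandoc 2.0 and later; 1.x releases called this option --reference-docx). The file name custom-reference.docx and the helper `make_docx` are hypothetical.

```shell
#!/usr/bin/env bash
# make_docx: hypothetical helper; converts Markdown to Word using a custom style template
make_docx() {
    local input="$1" output="$2" ref="custom-reference.docx"
    if [ ! -f "$ref" ]; then
        # One-time step: dump pandoc's built-in reference.docx so you can edit its styles in Word
        pandoc -o "$ref" --print-default-data-file reference.docx || return 1
    fi
    pandoc "$input" --reference-doc="$ref" -o "$output"
}

# Usage: make_docx assignment_01.md assignment_01.docx
```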

Now you have converted your Markdown assignment into a Word document that can be opened with Microsoft Word. These are just two examples of what you can do with pandoc. See the pandoc website for what else you can do with it:

https://pandoc.org/

The possibilities are enormous. You can even convert your Jupyter notebook into other formats.
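
pandoc 2.6 and later can read Jupyter notebooks (.ipynb) directly and picks the output format from the output file's extension. A minimal sketch; `notebook_to` and the file names in the usage lines are hypothetical.

```shell
#!/usr/bin/env bash
# notebook_to: hypothetical helper; requires pandoc 2.6 or later for .ipynb input
notebook_to() {
    local notebook="$1" output="$2"
    # The output format is inferred from the extension of $output (.md, .docx, .html, ...)
    pandoc "$notebook" -o "$output"
}

# Usage (file names are placeholders):
#   notebook_to analysis.ipynb analysis.md     # Markdown
#   notebook_to analysis.ipynb analysis.docx   # Word
```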

Alternative ways to convert Markdown files to PDF

If you run into problems installing LaTeX and the related tools that pandoc needs for PDF output, you can still use a different tool to generate a PDF file. See here: https://superuser.com/questions/689056/how-can-i-convert-github-flavored-markdown-to-a-pdf

Basically, you install a tool called grip, which renders your Markdown file the way GitHub would and shows it in your web browser; you then print the page and save it as a PDF file (Chrome works best for this). To install grip, type:

pip install grip

Then on your terminal, type:

grip assignment_01.md

This will print a URL in your terminal. In my case, it’s http://localhost:6419/

Copy and paste this address into Chrome and you will see the rendered page. Then print it, choosing “Save as PDF” as the destination, and you get a PDF file. I will accept this as an alternative way to generate PDF files for your assignments. To stop grip, press Control (Ctrl) + C in the terminal.
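
If you would rather not keep a server running, grip can also write the rendered page to a standalone HTML file with its --export option; you can then open that file in Chrome and print it to PDF. A sketch, assuming grip was installed with pip as above; note that grip renders through GitHub’s Markdown API by default, so it needs an internet connection. `md_to_html` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# md_to_html: hypothetical helper; renders a Markdown file to standalone HTML with grip
md_to_html() {
    local input="$1"
    local output="${input%.md}.html"   # assignment_01.md -> assignment_01.html
    grip "$input" --export "$output" || return 1
    echo "Open $output in Chrome, then print it and choose 'Save as PDF'"
}

# Usage: md_to_html assignment_01.md
```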