Install tools for today’s lab

Last week, I mentioned that you should install some bioinformatic tools we will be using this week. If you did not have a chance to install them last week, please do it now. You should be able to install them all using conda.

sra-tools

Here, we will use sra-tools to download sequence data from publicly accessible databases such as NCBI. This tool allows you to download some sequence files from NCBI directly from your terminal without having to use your web browser. First, I will check if sra-tools is on conda.

[1]:
%%bash
conda search sra-tools
Loading channels: ...working... done
# Name                       Version           Build  Channel
sra-tools                      2.6.2               0  bioconda
sra-tools                      2.6.3               0  bioconda
sra-tools                      2.7.0               0  bioconda
sra-tools                      2.8.0               0  bioconda
sra-tools                      2.8.1               0  bioconda
sra-tools                      2.8.2               0  bioconda
sra-tools                      2.8.2      h550f44e_1  bioconda
sra-tools                      2.9.0      h470a237_1  bioconda
sra-tools                      2.9.1      h470a237_0  bioconda
sra-tools                    2.9.1_1      h470a237_0  bioconda
sra-tools                      2.9.6      h0a44026_0  bioconda
sra-tools                     2.10.0 pl526h6de7cb9_0  bioconda
sra-tools                     2.10.1 pl526h9f37e31_0  bioconda

Now, you can install sra-tools by typing

conda install sra-tools

This will install a suite of tools you will need to download publicly available sequences from NCBI.

FastQC

FastQC is a tool you will need to use to assess sequence quality and to see if they are suitable to be used in your analyses. Earlier days, sequences produced by these so-called “Next-gen” sequencing instruments (especially in early 2010s) produced sequences with high error rates and low qualities. And they sometimes include the adapter sequences that should be removed before they can be used. To check for sequence quality, there are several tools available but one of the most well-known (and most widely used) is FastQC. It exists in both graphical and command line versions. You can install it using conda. First, search for it.

[2]:
%%bash
conda search fastqc
Loading channels: ...working... done
# Name                       Version           Build  Channel
fastqc                        0.10.1               0  bioconda
fastqc                        0.10.1               1  bioconda
fastqc                        0.11.2               1  bioconda
fastqc                        0.11.2      pl5.22.0_0  bioconda
fastqc                        0.11.3               0  bioconda
fastqc                        0.11.3               1  bioconda
fastqc                        0.11.4               1  bioconda
fastqc                        0.11.4               2  bioconda
fastqc                        0.11.5               1  bioconda
fastqc                        0.11.5               4  bioconda
fastqc                        0.11.5      pl5.22.0_2  bioconda
fastqc                        0.11.5      pl5.22.0_3  bioconda
fastqc                        0.11.6               2  bioconda
fastqc                        0.11.6      pl5.22.0_0  bioconda
fastqc                        0.11.6      pl5.22.0_1  bioconda
fastqc                        0.11.7               4  bioconda
fastqc                        0.11.7               5  bioconda
fastqc                        0.11.7               6  bioconda
fastqc                        0.11.7      pl5.22.0_0  bioconda
fastqc                        0.11.7      pl5.22.0_2  bioconda
fastqc                        0.11.8               0  bioconda
fastqc                        0.11.8               1  bioconda
fastqc                        0.11.8               2  bioconda
fastqc                        0.11.9               0  bioconda

As you can see, there are several versions and if you just type conda install fastqc, it will install the latest version of it.

BBmap

FastQC will help you see what needs to be done to the original raw sequnce files but it doesn’t actually do anything else. To actually trim reads of poor quality or to discard sequences that do not meet specific requirement, you need to use tools like BBmap. It is just one of the many tools that does similar function.

[3]:
%%bash
conda search bbmap
Loading channels: ...working... done
# Name                       Version           Build  Channel
bbmap                          35.85               1  bioconda
bbmap                          35.85               2  bioconda
bbmap                          37.02               0  bioconda
bbmap                          37.10               0  bioconda
bbmap                          37.10               1  bioconda
bbmap                          37.17               0  bioconda
bbmap                          37.17               1  bioconda
bbmap                          37.52               0  bioconda
bbmap                          37.52               1  bioconda
bbmap                          37.62               0  bioconda
bbmap                          37.62               1  bioconda
bbmap                          37.66               0  bioconda
bbmap                          37.75               0  bioconda
bbmap                          37.77               0  bioconda
bbmap                          37.78               0  bioconda
bbmap                          37.90               0  bioconda
bbmap                          37.95               0  bioconda
bbmap                          37.96               0  bioconda
bbmap                          37.99               0  bioconda
bbmap                          37.99               1  bioconda
bbmap                          38.06               0  bioconda
bbmap                          38.06               2  bioconda
bbmap                          38.16               0  bioconda
bbmap                          38.18               0  bioconda
bbmap                          38.19      h470a237_0  bioconda
bbmap                          38.20      h470a237_0  bioconda
bbmap                          38.22      h1de35cc_1  bioconda
bbmap                          38.22      h470a237_0  bioconda
bbmap                          38.44      h1de35cc_0  bioconda
bbmap                          38.45      h1de35cc_0  bioconda
bbmap                          38.46      h1de35cc_0  bioconda
bbmap                          38.49      h1de35cc_0  bioconda
bbmap                          38.51      h1de35cc_0  bioconda
bbmap                          38.56      h1de35cc_0  bioconda
bbmap                          38.57      h1de35cc_0  bioconda
bbmap                          38.58      h01d97ff_0  bioconda
bbmap                         38.61b      h01d97ff_0  bioconda
bbmap                          38.62      h01d97ff_0  bioconda
bbmap                          38.63      h01d97ff_0  bioconda
bbmap                          38.65      h01d97ff_0  bioconda
bbmap                          38.67      h01d97ff_0  bioconda
bbmap                          38.68      h01d97ff_0  bioconda
bbmap                          38.69      h01d97ff_0  bioconda
bbmap                          38.70      h01d97ff_0  bioconda
bbmap                          38.71      h01d97ff_0  bioconda
bbmap                          38.72      h01d97ff_0  bioconda
bbmap                          38.73      h01d97ff_0  bioconda
bbmap                          38.75      h01d97ff_0  bioconda
bbmap                          38.76      h01d97ff_0  bioconda
bbmap                          38.79      h01d97ff_0  bioconda
bbmap                          38.84      h0b31af3_0  bioconda
bbmap                          38.84      hf29c6f4_1  bioconda
bbmap                          38.86      hf29c6f4_0  bioconda

This is a collection of tools that you will need to use to remove poor-quality sequence reads and to trim bad or contaminated sequences from the pool. Install it by typing:

conda install bbmap

Now you have all the tools we will need to use for today’s exercise.