Getting started

We assume you have already downloaded CRAC (if not, it is not too late!). This page explains how to install it and how to launch it, without going into detail.

Installing CRAC

From the deb package

Install the package with your distribution's package manager, or type dpkg -i package-name, where package-name is the name of the package file you downloaded. This installs crac as well as crac-index (which creates the indexes used by CRAC) on your system.

From the source code

  1. Unpack the archive
  2. Enter the directory crac-version-number
  3. Type ./configure
  4. If everything went fine, run make
  5. You may want to check everything is ok by running make check
  6. Finally, you can install the software on your system by running make install

If the configure step fails, a library may be missing. In particular, zlib is required. On Debian or a Debian-like system, you will need to install zlib1g and zlib1g-dev.
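
The build steps above can be chained into a small shell function. This is only a sketch: build_crac, run and the DRY_RUN variable are names made up for this illustration and are not part of CRAC, and the archive and directory names are whatever you actually downloaded.

```shell
# Hypothetical helper: print the command when DRY_RUN is set, otherwise run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper chaining the documented build steps.
#   $1: path to the downloaded archive
#   $2: directory created by unpacking it (crac-version-number)
# The body is a subshell, so the cd does not affect the calling shell.
# Each step runs only if the previous one succeeded.
build_crac() (
    run tar -xf "$1" &&
    run cd "$2" &&
    run ./configure &&
    run make &&
    run make check &&
    run make install    # may need root privileges for a system-wide install
)
```

Calling the function with DRY_RUN=1 set prints the six commands without executing them, which is a cheap way to check the arguments before starting a real build.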

On a Galaxy instance

Using CRAC

CRAC relies on a precomputed index of the genome, as Bowtie and BWA do. The first step is therefore to build such an index if you do not already have one. Alternatively, you can download one of the precomputed indexes.

Indexing

If you still need to index your genome, proceed as follows.

The crac-index program creates an index for a genome. To create such an index, launch a command such as this one:

crac-index index myIndex sequence1.fa sequence2.fa sequence3.fa

The first parameter (index) specifies that we want to create an index. The second parameter (myIndex) is the name of the index to be created. The following parameters are FASTA or multi-FASTA files containing the sequences to be indexed.

Creating an index always generates two files: a .ssa file and a .conf file. Note that these extensions must not be given to either crac-index or crac; pass only the index name.
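
Because an index consists of two files sharing one base name, a script can test for both before deciding whether to build the index. The function below is a minimal sketch; index_exists is a hypothetical helper name, not a CRAC command.

```shell
# Hypothetical helper: succeed only if both files of an index are present.
#   $1: index name WITHOUT extension (e.g. myIndex)
index_exists() {
    [ -f "$1.ssa" ] && [ -f "$1.conf" ]
}

# Possible use (commented out, since it requires crac-index and FASTA files):
#   index_exists myIndex || crac-index index myIndex sequence1.fa
```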

If needed, the original sequences can be recovered using the CRAC index. Therefore you can delete the original FASTA files to save space. This recovery can be done using the following command:

crac-index get sequences.fa myIndex

This writes all the sequences indexed in myIndex.ssa to the file sequences.fa.

Launching CRAC

Once an index has been built, CRAC can be used with that index. CRAC must always be launched with at least three parameters:

  • -i the name of the index (e.g. myIndex; recall that the extension must not be provided!)
  • -r the name of the FASTA or FASTQ file containing the reads (the input file may also be gzip-compressed)
  • -k the length of the k-mers to be used; we recommend setting k to 22 for the human genome for better accuracy.

The value of k is very important for the algorithm. Do not set it too low, otherwise the results will be useless. It must be chosen to ensure (as far as possible) that a k-mer has a very high probability of occurring only once in the genome.
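
To get a feeling for why a small k is harmful, one can estimate how often a given k-mer would occur purely by chance. The sketch below relies on a simplified model that is not taken from CRAC itself: in a genome of n uniformly random bases, a fixed k-mer is expected about n / 4^k times. Real genomes contain repeats, so this is an optimistic estimate. expected_kmer_hits is a hypothetical helper name, and 3000000000 is only a rough figure for the size of the human genome.

```shell
# Hypothetical helper: expected number of chance occurrences of one k-mer
# in a random genome (assumed model: n / 4^k, repeats and strands ignored).
#   $1: genome size in bases
#   $2: k-mer length
expected_kmer_hits() {
    awk -v n="$1" -v k="$2" 'BEGIN { printf "%.2e\n", n / (4 ^ k) }'
}

# Possible use:
#   expected_kmer_hits 3000000000 12
#   expected_kmer_hits 3000000000 22
```

Under these assumptions, k = 12 gives roughly 180 expected chance occurrences per k-mer, whereas k = 22 gives about 0.00017, which illustrates why a value around 22 makes a k-mer very likely to be unique in a genome of that size.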

If all reads have the same length, we strongly recommend specifying that length with the -m parameter. CRAC will then be much faster.
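
Whether the read length really is fixed can be checked before choosing a value for -m. The following sketch assumes an uncompressed FASTQ file with exactly four lines per record (sequences not wrapped over several lines); read_length_counts is a hypothetical helper name.

```shell
# Hypothetical helper: print each distinct read length and how many reads
# have it, one "length count" pair per line, sorted by length.
#   $1: uncompressed FASTQ file with four lines per record (assumption)
read_length_counts() {
    awk 'NR % 4 == 2 { count[length($0)]++ }
         END { for (len in count) print len, count[len] }' "$1" | sort -n
}

# Possible use:
#   read_length_counts reads.fastq
```

If the output contains a single line, all reads share that length and it can be passed to -m.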

You will probably also want output that tells you which reads were mapped, which were not, and where. CRAC produces a SAM file when you give the name of the output file with the --sam parameter.

As an example, CRAC can be launched with the following parameters:

crac -i myIndex -k 22 -r reads.fastq -m 200 --sam output.sam --nb-threads 10

In this example, CRAC is launched on the genome indexed in myIndex, using 22-mers, on the reads stored in reads.fastq. All reads are treated as 200 bp long (longer reads are truncated and shorter ones are ignored). The output is written to output.sam and the program runs in parallel on 10 threads.
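
Once the run has finished, a quick look at the SAM file shows how much was mapped. The sketch below assumes that the output follows the standard SAM convention in which bit 0x4 of the FLAG field (second column) marks an unmapped read. It counts alignment records rather than distinct reads, so a read reported on several lines is counted several times. sam_mapping_summary is a hypothetical helper name.

```shell
# Hypothetical helper: count mapped and unmapped alignment records.
#   $1: SAM file (e.g. output.sam)
sam_mapping_summary() {
    awk -F '\t' '
        /^@/ { next }                              # skip header lines
        int($2 / 4) % 2 == 1 { unmapped++; next }  # FLAG bit 0x4 set
        { mapped++ }
        END { printf "mapped=%d unmapped=%d\n", mapped, unmapped }
    ' "$1"
}

# Possible use:
#   sam_mapping_summary output.sam
```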