Gk-arrays are provided as a simple-to-use C++ library dedicated to queries on large collections of sequences, such as those produced by high-throughput sequencers (e.g. HiSeq 2000 from Illumina, 454 from Roche).
Gk-arrays index the k-mers of reads and allow one to answer various queries on the read collection (e.g. how many reads share this k-mer? where does this k-mer occur in the read collection?).
Gk-arrays are a space-efficient alternative to hash tables, with similar query times.
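To make the two example queries concrete, here is a minimal C++17 sketch of a sorted k-mer index: every k-mer position of the read collection is stored in one array sorted by the k-mer starting there, so both queries become binary searches. The names `SortedKmerIndex`, `Position`, `occurrences` and `readCount` are hypothetical. This is not the Gk-arrays data structure nor the libGkArrays API, and it makes no attempt at the space savings mentioned above (each position is stored as two `std::size_t` values).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical sketch: NOT the Gk-arrays structure or the libGkArrays API.
// All k-mer positions are kept in one array sorted by the k-mer they start,
// so the occurrences of any k-mer form one contiguous block.
struct Position {
    std::size_t read;    // index of the read in the collection
    std::size_t offset;  // offset of the k-mer inside that read
};

class SortedKmerIndex {
public:
    SortedKmerIndex(std::vector<std::string> reads, std::size_t k)
        : reads_(std::move(reads)), k_(k) {
        assert(k_ > 0);
        // Record the start of every k-mer in every read.
        for (std::size_t r = 0; r < reads_.size(); ++r)
            for (std::size_t o = 0; o + k_ <= reads_[r].size(); ++o)
                positions_.push_back({r, o});
        // Sort positions by the k-mer they point to (lexicographic order).
        std::sort(positions_.begin(), positions_.end(),
                  [this](const Position& a, const Position& b) {
                      return kmerAt(a) < kmerAt(b);
                  });
    }

    // "Where does this k-mer occur in the read collection?"
    std::vector<Position> occurrences(std::string_view kmer) const {
        auto [first, last] = equalRange(kmer);
        return std::vector<Position>(first, last);
    }

    // "How many reads share this k-mer?" (each read is counted once)
    std::size_t readCount(std::string_view kmer) const {
        auto [first, last] = equalRange(kmer);
        std::vector<std::size_t> ids;
        for (auto it = first; it != last; ++it) ids.push_back(it->read);
        std::sort(ids.begin(), ids.end());
        return static_cast<std::size_t>(
            std::unique(ids.begin(), ids.end()) - ids.begin());
    }

private:
    using Iter = std::vector<Position>::const_iterator;

    // View of the k characters starting at position p (no copy).
    std::string_view kmerAt(const Position& p) const {
        return std::string_view(reads_[p.read]).substr(p.offset, k_);
    }

    // Binary search for the block of positions whose k-mer equals `kmer`.
    std::pair<Iter, Iter> equalRange(std::string_view kmer) const {
        if (kmer.size() != k_) return {positions_.end(), positions_.end()};
        Iter lo = std::lower_bound(
            positions_.begin(), positions_.end(), kmer,
            [this](const Position& p, std::string_view q) { return kmerAt(p) < q; });
        Iter hi = std::upper_bound(
            lo, positions_.end(), kmer,
            [this](std::string_view q, const Position& p) { return q < kmerAt(p); });
        return {lo, hi};
    }

    std::vector<std::string> reads_;   // copy of the read collection
    std::size_t k_;                    // k-mer length
    std::vector<Position> positions_;  // k-mer positions sorted by k-mer
};
```

For instance, indexing the reads ACGTACGT, TTACGTAA and GGGGCCCC with k = 4, `readCount("ACGT")` returns 2 and `occurrences("ACGT")` returns three positions (two in the first read, one in the second). A hash table keyed by k-mer would answer the same queries with an expected constant number of probes; the sorted array trades that for a single contiguous array and logarithmic-time lookups.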
Gk-arrays is a work by Nicolas Philippe, Mikaël Salson, Thierry Lecroq, Martine Léonard, Thérèse Commes and Éric Rivals. It has been published in the BMC Bioinformatics journal. If you use this work, please don't forget to cite this paper.
Gk-arrays source code is distributed under the GPL-compliant CeCILL-C license.
A very simple test file can be downloaded from here.
Once the library is installed, you can compile the test file using, e.g.:
g++ -Wall -pedantic -O3 testGkArrays.cpp -o testGkArrays -lGkArrays
Another test file (measuring the query time) is also included in the source
code under the
The installation will create a test program (buildTables) and a library
that can be used in any of your programs.
Note: the library usage has been simplified since version 1.0.0. If needed, you can see the details for using previous versions.
make install as an administrator
ldconfig as an administrator
You can pass parameters to the configure script.
For instance, you can choose to build a static version of the library (quicker)
rather than a shared one. Typing
./configure --help will
list the available options.
You just need to install the package using your distribution's package
manager or by typing
dpkg -i package-name.
Inside the archive, you will find under the
documentation on how to use the Gk-arrays in your code with a simple
Full documentation of the library is available online or as a downloadable PDF.
email@example.com) if you find a bug in the library or if you encounter any problem.