Bam Query Index (qidx)
=====================

> Warning: this is a work in progress

`qidx` is a tool for indexing BAM alignments by query name. While
`samtools` can sort data by query name (also called
the read name), htslib does not provide built-in utilities
to retrieve alignments by query name. Such retrieval is useful
for examining multi-mapped alignments.

An earlier utility, [bri](https://github.com/jts/bri), also
indexes BAM files by query name. However, it reads all alignments into memory,
which is impractical for most human genome data.

Notes:

- `qidx` creates a disk-backed hash set using a sparse memory-mapped file. The underlying
operating system must support `mmap` and file holes.
- `qidx` doesn't currently support compression. For now, it is recommended to
use block-level compression (such as `zfs` with `zstd` compression).
- The BAM file must be sorted by query name (`samtools sort -n`) before the index is built.
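
To illustrate the first note, here is a minimal Python sketch of a disk-backed hash table stored in a sparse memory-mapped file. It is not `qidx`'s actual on-disk format: the slot layout, the hash function, the table size, and the names `open_sparse_table`, `insert`, and `lookup` are all assumptions made for the example. It only shows why `mmap` and file holes matter: the file is extended to its full size up front, and on filesystems that support holes, disk blocks are typically allocated only for the slots that get written.

```python
import hashlib
import mmap
import os
import struct

SLOT_SIZE = 16        # assumed layout: 8-byte name hash + 8-byte file offset
NUM_SLOTS = 1 << 20   # hypothetical table size; a real index would size this from the read count


def open_sparse_table(path, num_slots=NUM_SLOTS):
    """Create a file of num_slots * SLOT_SIZE bytes and map it into memory."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT)
    try:
        # Extending the file with ftruncate writes no data, so on filesystems
        # with hole support the untouched regions take up no disk blocks.
        os.ftruncate(fd, num_slots * SLOT_SIZE)
        return mmap.mmap(fd, num_slots * SLOT_SIZE)
    finally:
        os.close(fd)


def _hash(name):
    """64-bit hash of a query name; 0 is reserved to mark an empty slot."""
    h = int.from_bytes(hashlib.blake2b(name, digest_size=8).digest(), "little")
    return h or 1


def insert(table, num_slots, name, offset):
    """Store offset under name, using linear probing to resolve slot clashes."""
    h = _hash(name)
    slot = h % num_slots
    for _ in range(num_slots):
        pos = slot * SLOT_SIZE
        stored, _unused = struct.unpack_from("<QQ", table, pos)
        if stored == 0 or stored == h:
            struct.pack_into("<QQ", table, pos, h, offset)
            return
        slot = (slot + 1) % num_slots
    raise RuntimeError("hash table is full")


def lookup(table, num_slots, name):
    """Return the offset stored for name, or None if it is absent."""
    h = _hash(name)
    slot = h % num_slots
    for _ in range(num_slots):
        pos = slot * SLOT_SIZE
        stored, offset = struct.unpack_from("<QQ", table, pos)
        if stored == 0:
            return None  # reached an empty slot: the name was never inserted
        if stored == h:
            return offset
        slot = (slot + 1) % num_slots
    return None
```

In this sketch, a caller would open the table once, call `insert` with each query name and the file offset of its first alignment, and later call `lookup` to find where to seek. Only the hash is stored, so two names with the same 64-bit hash would be conflated; a real index would need to verify the name against the BAM record it points to.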