Bam Query Index (qidx)
=====================

> Warning: this is a work in progress

`qidx` is a tool for indexing BAM alignments by query name. While `samtools` can sort data by query name (also called the read name), htslib does not provide built-in utilities to retrieve alignments by query name. Retrieval by query name is useful for examining multi-mapped alignments.

An earlier utility, [bri](https://github.com/jts/bri), predates `qidx` and also indexes BAM files by query name. However, it reads all alignments into memory, which is impractical for most human genome data.

Notes:

- Currently, `qidx` is very inefficient in terms of disk space. When indexing a 33GiB BAM file (Illumina 35x), the index initially maps 1.2TiB into memory. File holes reduce this to ~120GiB on disk, and ZSTD block compression reduces it further to 22GiB. When I get a chance, I hope to look into this further.
- `qidx` creates a disk-backed hashset using a sparse memory-mapped file. The underlying operating system must support `mmap` and file holes.
- `qidx` doesn't currently support compression itself. For now, the recommendation is to use block-level compression (such as `zfs` with `zstd` compression).
- The BAM file must be sorted by query name (`samtools sort -n`) before the index is built.
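
To illustrate the idea of a disk-backed hash table on a sparse memory-mapped file, here is a minimal Python sketch. It is not `qidx`'s actual on-disk format or code: the `SparseTable` class, the 16-byte slot layout, the `blake2b` hash and linear probing are all assumptions made for this example. It only shows why a mostly empty table can have a large logical size while using little disk space on file systems that support holes.

```python
import hashlib
import mmap
import os
import struct
import tempfile

# Hypothetical slot layout: (64-bit name hash, 64-bit value). Hash 0 = empty.
SLOT = struct.Struct("<QQ")


def name_hash(name: str) -> int:
    """64-bit hash of a query name; never 0, because 0 marks an empty slot."""
    digest = hashlib.blake2b(name.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little") or 1


class SparseTable:
    """Open-addressing hash table stored in a memory-mapped file."""

    def __init__(self, path: str, n_slots: int):
        self.n_slots = n_slots
        size = n_slots * SLOT.size
        self.file = open(path, "w+b")
        # Extending the file without writing data leaves a hole where the
        # file system supports it; untouched slots read back as zeros.
        self.file.truncate(size)
        self.map = mmap.mmap(self.file.fileno(), size)

    def _probe(self, h: int):
        """Linear probing: find the slot holding h, or the first empty slot."""
        i = h % self.n_slots
        for _ in range(self.n_slots):
            offset = i * SLOT.size
            stored, value = SLOT.unpack_from(self.map, offset)
            if stored == 0 or stored == h:
                return offset, stored, value
            i = (i + 1) % self.n_slots
        raise RuntimeError("table is full")

    def put(self, name: str, value: int) -> None:
        # Simplification: two names with the same 64-bit hash are treated
        # as the same key.
        h = name_hash(name)
        offset, _, _ = self._probe(h)
        SLOT.pack_into(self.map, offset, h, value)

    def get(self, name: str):
        h = name_hash(name)
        _, stored, value = self._probe(h)
        return value if stored == h else None

    def close(self) -> None:
        self.map.flush()
        self.map.close()
        self.file.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.idx")
        table = SparseTable(path, n_slots=1 << 20)  # 16 MiB logical size
        table.put("read_0001", 4096)  # value: a made-up offset into a BAM file
        print(table.get("read_0001"), table.get("read_9999"))
        table.close()
        st = os.stat(path)
        # st_blocks is Unix-only and counts allocated 512-byte blocks; on a
        # file system with holes it is typically far below the logical size.
        print(st.st_size, getattr(st, "st_blocks", 0) * 512)
```

Only the pages that are actually written get backed by disk blocks, which is the effect behind the gap between the mapped size and the on-disk size described in the notes above. Block-level compression then shrinks the pages that were written.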