From 20e52f326cdf1b6c2ca9b2c0b5be07637d9196d2 Mon Sep 17 00:00:00 2001
From: flu0r1ne
Date: Sun, 30 Oct 2022 19:30:29 -0500
Subject: Initial commit

---
 README.md | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)
 create mode 100644 README.md

(limited to 'README.md')

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..ff14ab3
--- /dev/null
+++ b/README.md
@@ -0,0 +1,22 @@
+BAM Query Index (qidx)
+======================
+
+> Warning: this is a work in progress.
+
+`qidx` is a tool for indexing BAM alignments by query name. While
+samtools can sort data by query name (also called the read name),
+htslib does not provide built-in utilities to retrieve alignments
+by query name. Such retrieval can be advantageous for examining
+multi-mapped alignments.
+
+An earlier utility, [bri](https://github.com/jts/bri), provides the
+same functionality, but it reads all alignments into memory, which
+is impractical for most human genome data.
+
+Notes:
+
+- `qidx` creates a disk-backed index using a sparse memory-mapped file. The
+underlying operating system must support `mmap` and file holes.
+- `qidx` doesn't currently support compression. For now, block-level
+compression (such as `zfs` `zstd` compression) is recommended.
+- The BAM file must be sorted by query name (`samtools sort -n`) before the index is built.
-- 
cgit v1.2.3
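
The README's note about a disk-backed index on a sparse memory-mapped file can be illustrated with a small sketch. This is not `qidx`'s actual on-disk format, which the patch does not describe. The record layout (`RECORD`), the capacity (`SLOTS`), the file name `example.qidx`, and the helpers `create_sparse_index`, `put`, `get`, and `demo` are all hypothetical and exist only for illustration. The sketch assumes a filesystem that supports file holes, as the note requires.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical layout: one 8-byte little-endian value per slot.
RECORD = struct.Struct("<Q")
SLOTS = 1 << 20  # logical capacity: 1 Mi slots, i.e. 8 MiB of address space


def create_sparse_index(path, slots=SLOTS):
    """Create a file of the full logical size without writing any data.

    On filesystems that support holes, extending a file with truncate()
    allocates no blocks until a page is actually written.
    """
    with open(path, "wb") as f:
        f.truncate(slots * RECORD.size)


def put(mm, slot, value):
    """Store a value in a slot; only the touched page gets backed by disk."""
    RECORD.pack_into(mm, slot * RECORD.size, value)


def get(mm, slot):
    """Read a slot; untouched slots lie in a hole and read back as zero."""
    return RECORD.unpack_from(mm, slot * RECORD.size)[0]


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "example.qidx")  # hypothetical file name
        create_sparse_index(path)
        with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
            put(mm, 12345, 987654321)
            put(mm, SLOTS - 1, 42)
            mm.flush()
            values = (get(mm, 12345), get(mm, SLOTS - 1), get(mm, 0))
        st = os.stat(path)
        # st_blocks is counted in 512-byte units where the platform reports it.
        blocks = getattr(st, "st_blocks", None)
        allocated = blocks * 512 if blocks is not None else None
        return values, st.st_size, allocated


if __name__ == "__main__":
    values, logical, allocated = demo()
    print("values:", values)
    print("logical size:", logical, "bytes; allocated:", allocated, "bytes")
```

On a filesystem with hole support, the allocated size reported at the end is typically far smaller than the logical size, because only the pages that were written are backed by disk blocks. The exact figure depends on the filesystem and its block size.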