Initial commit

author: flu0r1ne <flu0r1ne@flu0r1ne.net> 2022-10-30 19:30:29 -0500
committer: flu0r1ne <flu0r1ne@flu0r1ne.net> 2022-10-30 19:30:29 -0500
commit: 20e52f326cdf1b6c2ca9b2c0b5be07637d9196d2 (patch)
tree: 1d957cfd5ca8b9ccd0a91fa5d5415599edd241d1 /README.md
download: qidx-20e52f326cdf1b6c2ca9b2c0b5be07637d9196d2.tar.xz
qidx-20e52f326cdf1b6c2ca9b2c0b5be07637d9196d2.zip
1 files changed, 22 insertions, 0 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..ff14ab3
--- /dev/null
+++ b/README.md
@@ -0,0 +1,22 @@
+BAM Query Index (qidx)
+======================
+
+> Warning: this is a work in progress
+
+`qidx` is a tool for indexing BAM alignments by query name. While
+samtools can sort data by query name (also called the read name),
+htslib does not provide built-in utilities to retrieve alignments
+by query name. Retrieval by query name can be useful for examining
+multi-mapped alignments.
+
+An earlier utility, [bri](https://github.com/jts/bri), provides the
+same functionality, but it reads all alignments into memory, which is
+impractical for most human genome data.
+
+Notes:
+
+- `qidx` creates a disk-backed index using a sparse memory-mapped file. The
+underlying operating system must support `mmap` and file holes.
+- `qidx` doesn't currently support compression. It is currently recommended to
+use block-level compression (such as `zfs` with `zstd` compression).
+- The BAM file must be sorted by query name (`samtools sort -n`) before the index is built.
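
The first note in the README relies on sparse files and `mmap`. As a rough illustration of that general technique only (this is not `qidx`'s code, and the names `write_sparse` and `read_back` are hypothetical), the following Python sketch creates a file with a large logical size, writes a small payload through a memory mapping, and reads it back. Whether the untouched regions actually remain holes depends on the operating system and filesystem.

```python
import mmap
import os
import tempfile


def write_sparse(path, logical_size, offset, payload):
    """Create a file of logical_size bytes and write payload at offset via mmap.

    Hypothetical helper for illustration; not part of qidx.
    """
    with open(path, "w+b") as f:
        # Extending an empty file with truncate() leaves a hole on
        # filesystems that support sparse files.
        f.truncate(logical_size)
        with mmap.mmap(f.fileno(), logical_size) as m:
            m[offset:offset + len(payload)] = payload
            m.flush()  # push the dirty pages back to the file


def read_back(path, offset, length):
    """Read length bytes at offset through a read-only mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[offset:offset + length]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sparse.bin")
        write_sparse(path, 64 * 1024 * 1024, 4096 * 1000, b"read_0001")
        st = os.stat(path)
        print("logical size:", st.st_size)
        # st_blocks is unavailable on some platforms (e.g. Windows).
        blocks = getattr(st, "st_blocks", None)
        if blocks is not None:
            print("bytes allocated on disk:", blocks * 512)
        print(read_back(path, 4096 * 1000, 9))
```

On a filesystem with hole support, the allocated size reported by the script is typically far smaller than the logical size, which is why a sparse layout pairs well with the block-level compression recommended in the notes.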