Update readme

author: flu0r1ne <flu0r1ne@flu0r1ne.net> 2022-10-31 17:32:08 -0500
committer: flu0r1ne <flu0r1ne@flu0r1ne.net> 2022-10-31 17:32:08 -0500
commit: 6b28db8eb2d8a2761f8bde0dcfb9d5ceefe5827b (patch)
tree: 8a8e4adf1a9427498f09d717ed94adea3ae0c661
parent: 4cff48542cf04c92106f09d5ecf83bc6ff1d8354 (diff)
download: qidx-main.tar.xz qidx-main.zip
1 file changed, 7 insertions, 3 deletions
diff --git a/README.md b/README.md
index c789e09..7d9b441 100644
--- a/README.md
+++ b/README.md
@@ -15,8 +15,12 @@ which is impractical for most human genome data.
 
 Notes:
 
+- Currently, `qidx` is very inefficient in terms of disk space. When indexing a 33GiB BAM file
+ (Illumina 35x), the index takes up 22GiB on disk when using ZSTD compression. It initially maps 1.2TiB
+ into memory. This is reduced to ~120GiB on disk due to file holes. ZSTD block compression further reduces
+ this to 22GiB. When I get a chance, I hope to look into this further.
 - `qidx` creates a disk-backed hashset using a sparse memory-mapped file. The underlying
-operating system must support `mmap` and file holes
+operating system must support `mmap` and file holes.
 - `qidx` doesn't currently support compression. it is currently recommended to
-use block-level compression (such as `zfs` `zstd` compression)
-- the bamfile must be sorted by query name before the index is built `samtools sort -n`
+use block-level compression (such as `zfs` `zstd` compression).
+- The BAM file must be sorted by query name (`samtools sort -n`) before the index is built.
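The notes say `qidx` builds a disk-backed hash set on a sparse memory-mapped file. The sketch below illustrates that general technique in Python; it is not `qidx`'s implementation. The slot layout (a 16-byte BLAKE2b digest of the key plus a 64-bit value), linear probing, and the `SparseHashSet` name are all assumptions chosen for illustration. The file is extended with `os.ftruncate`, so on a file system with hole support only pages that actually receive a write get disk blocks, which is why a mapping's logical size can be far larger than its on-disk footprint.

```python
import hashlib
import mmap
import os
import struct
import tempfile
from typing import Optional

# Hypothetical slot layout (not qidx's format): a 16-byte key digest
# followed by an unsigned 64-bit value, e.g. a byte offset into the BAM.
SLOT = struct.Struct("<16sQ")
EMPTY = bytes(16)  # an all-zero digest marks an unused slot


class SparseHashSet:
    """Open-addressing hash table stored in a sparse, memory-mapped file."""

    def __init__(self, path: str, n_slots: int) -> None:
        if n_slots <= 0:
            raise ValueError("n_slots must be positive")
        self.n_slots = n_slots
        size = n_slots * SLOT.size
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        # Extending with ftruncate writes no data: on file systems with hole
        # support, pages that are never written occupy no disk blocks.
        os.ftruncate(self.fd, size)
        self.mm = mmap.mmap(self.fd, size)

    @staticmethod
    def _digest(key: bytes) -> bytes:
        d = hashlib.blake2b(key, digest_size=16).digest()
        # Remap the (astronomically unlikely) all-zero digest so it never
        # looks like an empty slot.
        return d if d != EMPTY else b"\x01" + d[1:]

    def _probe(self, d: bytes):
        """Yield slot byte offsets in linear-probing order for digest d."""
        start = int.from_bytes(d[:8], "little") % self.n_slots
        for step in range(self.n_slots):
            yield ((start + step) % self.n_slots) * SLOT.size

    def insert(self, key: bytes, value: int) -> None:
        d = self._digest(key)
        for off in self._probe(d):
            slot_key, _ = SLOT.unpack_from(self.mm, off)
            if slot_key == EMPTY or slot_key == d:
                SLOT.pack_into(self.mm, off, d, value)
                return
        raise RuntimeError("hash table is full")

    def get(self, key: bytes) -> Optional[int]:
        d = self._digest(key)
        for off in self._probe(d):
            slot_key, value = SLOT.unpack_from(self.mm, off)
            if slot_key == EMPTY:
                return None  # reached an unused slot: key is absent
            if slot_key == d:
                return value
        return None

    def close(self) -> None:
        self.mm.flush()
        self.mm.close()
        os.close(self.fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "names.idx")
        table = SparseHashSet(path, 1 << 20)  # 24 MiB logical size
        table.insert(b"read_0001", 1234)
        print(table.get(b"read_0001"), table.get(b"read_9999"))  # prints "1234 None"
        table.close()
```

On Linux, comparing `os.stat(path).st_size` with `os.stat(path).st_blocks * 512` shows the gap between the logical size of such a file and the space actually allocated. Because hashed keys scatter writes across the whole table, many pages end up touched even when the table is sparsely filled, so the savings from holes depend heavily on load factor and page size.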
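The notes require the BAM file to be sorted by query name but do not say which property of that order the index builder relies on. Assuming it is that all records for a given read are adjacent (an assumption, not stated in the README), a pre-flight check could stream the query names and confirm that each name forms a single contiguous run. `names_are_grouped` is a hypothetical helper, not part of `qidx`, and it checks grouping only rather than reproducing any particular collation.

```python
from typing import Iterable, Optional


def names_are_grouped(names: Iterable[str]) -> bool:
    """Return True if all records sharing a query name are contiguous.

    This checks grouping only; it does not reproduce any particular
    collation (such as the one `samtools sort -n` uses).
    """
    seen = set()
    previous: Optional[str] = None
    for name in names:
        if name == previous:
            continue  # still inside the current run
        if name in seen:
            return False  # this name already appeared in an earlier run
        seen.add(name)
        previous = name
    return True
```

For example, the names could come from the first (QNAME) column of `samtools view file.bam`, passed as `names_are_grouped(line.split("\t", 1)[0] for line in sys.stdin)`. The `seen` set keeps every distinct name in memory, so for a whole-genome BAM this is a sketch rather than something to run as-is; a constant-memory variant would compare adjacent names against the exact collation instead.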