author | flu0r1ne <flu0r1ne@flu0r1ne.net> | 2022-10-31 17:32:08 -0500 |
---|---|---|
committer | flu0r1ne <flu0r1ne@flu0r1ne.net> | 2022-10-31 17:32:08 -0500 |
commit | 6b28db8eb2d8a2761f8bde0dcfb9d5ceefe5827b (patch) | |
tree | 8a8e4adf1a9427498f09d717ed94adea3ae0c661 | |
parent | 4cff48542cf04c92106f09d5ecf83bc6ff1d8354 (diff) | |
-rw-r--r-- | README.md | 10 |
1 files changed, 7 insertions, 3 deletions
@@ -15,8 +15,12 @@ which is impractical for most human genome data.
 
 Notes:
 
+- Currently, `qidx` is very inefficient in terms of disk space. When indexing a 33GiB BAM file
+  (Illumina 35x), it takes up 22GiB on disk when using STD compression. It initially maps 1.2TiB
+  into memory. This is reduced to ~120GiB due to file holes. ZSTD block compression again reduces
+  this to 22GiB. When I get a chance, I hope to look into this further.
 - `qidx` creates a disk-backed hashset using a sparse memory-mapped file. The underlying
-operating system must support `mmap` and file holes
+operating system must support `mmap` and file holes.
 - `qidx` doesn't currently support compression. it is currently recommended to
-use block-level compression (such as `zfs` `zstd` compression)
-- the bamfile must be sorted by query name before the index is built `samtools sort -n`
+use block-level compression (such as `zfs` `zstd` compression.)
+- The bamfile must be sorted by query name before the index is built `samtools sort -n`.
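
The notes in this hunk rely on file holes: a large mapped region only consumes disk blocks for the pages actually written. The following is a minimal Python sketch of that effect, not `qidx`'s own code. The function name `sparse_demo`, the 64 MiB size, and the 4096-byte slot stride are all hypothetical choices for illustration, and it assumes a Unix-like system whose filesystem supports sparse files.

```python
import mmap
import os
import tempfile


def sparse_demo(apparent_size=64 * 1024 * 1024, slot=12345, value=b"read-name"):
    """Create a sparse file, write one slot through mmap, and report its sizes.

    Returns (apparent_bytes, allocated_bytes, stored_value).
    """
    fd, path = tempfile.mkstemp()
    try:
        # Extending the file with ftruncate creates a hole: the file has a
        # large apparent size, but no data blocks are allocated yet.
        os.ftruncate(fd, apparent_size)
        with mmap.mmap(fd, apparent_size) as m:
            offset = slot * 4096  # hypothetical slot stride, one page per slot
            m[offset:offset + len(value)] = value  # touches a single page
            m.flush()
            stored = bytes(m[offset:offset + len(value)])
        st = os.fstat(fd)
        # st_blocks is counted in 512-byte units on POSIX systems.
        return st.st_size, st.st_blocks * 512, stored
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    apparent, allocated, stored = sparse_demo()
    print(f"apparent size: {apparent} bytes")
    print(f"allocated on disk: {allocated} bytes")
    print(f"stored value: {stored!r}")
```

On a filesystem with hole support, the allocated figure should be far smaller than the apparent size, which mirrors the gap between the 1.2TiB mapping and the ~120GiB on-disk footprint described above.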