Output
======

FastQ output
------------

By default, HUMID writes the deduplicated FastQ files to the current folder, using the `_dedup` suffix in the file name to distinguish them from the input FastQ files.

When the `-a` flag is specified, HUMID writes annotated FastQ files, using the `_annotated` suffix in the file name.
For each read in the annotated output, the `cluster_id` is appended to the end of the read header, using a colon (`:`) as a separator.

**Special case**: the cluster with id `0` is reserved for reads that could not be classified, for example because there were not enough bases available to create a `word`, or because the word contains one or more N bases.

Statistics
----------

Run HUMID with the `-s` flag to generate deduplication statistics.
These statistics files can be visualized using MultiQC version 1.14 or later, or inspected directly.

stats.dat
~~~~~~~~~

This is probably the most useful file; it contains the number of reads in each of the following categories.

.. list-table:: stats.dat
   :header-rows: 1

   * - Field
     - Definition
   * - total
     - Total number of input reads
   * - usable
     - Total number of reads that were usable (did not contain N)
   * - unique
     - Total number of distinct input reads
   * - clusters
     - Total number of unique reads after clustering and deduplication

neigh.dat
~~~~~~~~~

This file contains a histogram of the number of reads with a given number of neighbours.
The first number is the `number of neighbours` and the second number is how many distinct reads have this number of neighbours.

clusters.dat
~~~~~~~~~~~~

This file contains a histogram of the number of clusters of a given size.
The first number is the `cluster size` and the second number is how many clusters have that size.

counts.dat
~~~~~~~~~~

This file contains a histogram of the number of exact duplicates in the usable input reads.
The first number is the `number of exact duplicates` and the second number is how many distinct reads have this number of exact duplicates.
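
The three histogram files (neigh.dat, clusters.dat and counts.dat) share the same two-column layout, so one small parser can handle all of them.
The following is a minimal Python sketch, not part of HUMID itself.
It assumes that each non-empty line holds the two numbers separated by whitespace, and the function names `parse_histogram`, `total_entries` and `mean_value` are hypothetical.

.. code-block:: python

   def parse_histogram(text):
       """Parse histogram text into a {first_number: second_number} dict.

       Assumption: each non-empty line holds two whitespace-separated
       integers; this is not a documented HUMID file format guarantee.
       """
       histogram = {}
       for line in text.splitlines():
           fields = line.split()  # splits on any run of spaces or tabs
           if not fields:
               continue  # skip blank lines
           value, count = int(fields[0]), int(fields[1])
           # Accumulate, in case the same value appears on more than one line.
           histogram[value] = histogram.get(value, 0) + count
       return histogram


   def total_entries(histogram):
       """Sum of the second column, e.g. the number of clusters in clusters.dat."""
       return sum(histogram.values())


   def mean_value(histogram):
       """Count-weighted mean of the first column, e.g. the mean cluster size."""
       total = total_entries(histogram)
       if total == 0:
           return 0.0
       return sum(value * count for value, count in histogram.items()) / total

Applied to the contents of clusters.dat, `total_entries` gives the number of clusters and `mean_value` the mean cluster size; applied to neigh.dat, `mean_value` gives the average number of neighbours per distinct read.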