Output

FastQ output

By default, HUMID will write the deduplicated FastQ files in the current folder, using the _dedup suffix in the file name to distinguish them from the input FastQ files.

When the -a flag is specified, HUMID will output the annotated FastQ files, using the _annotated suffix in the file name. For each read in the output, the cluster_id is appended to the end of the read header, using a colon (:) as a separator.

Special case: The cluster with id 0 is reserved for reads that could not be classified, for example because there were not enough bases available to create a word, or because the word contains one or more N bases.
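As an illustration, the annotated headers can be grouped by cluster with a few lines of Python. This is a minimal sketch, not part of HUMID: it assumes the cluster_id is the last colon-separated field of the header string it is given, and the read names and the helper functions cluster_id and group_by_cluster are made up for the example.

```python
from collections import defaultdict

UNCLASSIFIED = 0  # cluster id reserved for reads that could not be classified


def cluster_id(header: str) -> int:
    """Return the cluster id, assumed to be the last colon-separated field."""
    return int(header.rstrip().rsplit(":", 1)[1])


def group_by_cluster(headers):
    """Map each cluster id to the list of headers that carry it."""
    groups = defaultdict(list)
    for header in headers:
        groups[cluster_id(header)].append(header)
    return dict(groups)


# Made-up headers, only to show the shape of the data.
headers = ["@read1:3", "@read2:3", "@read3:0"]
groups = group_by_cluster(headers)
unclassified = groups.get(UNCLASSIFIED, [])
```

Reads in cluster 0 are best treated separately, since they were never assigned to a real cluster.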

Statistics

Run HUMID with the -s flag to generate deduplication statistics. These statistics files can be visualized using MultiQC version 1.14 or later, or inspected directly.

stats.dat

This is probably the most useful file. It contains the number of reads in each of the following categories.

Field     Definition
--------  ------------------------------------------------------------
total     Total number of input reads
usable    Total number of reads that were usable (did not contain N)
unique    Total number of distinct input reads
clusters  Total number of unique reads after clustering and deduplication
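The fields can be combined into a simple summary, such as the fraction of usable reads that remain after deduplication. The sketch below is hedged: the exact layout of stats.dat is not described here, so it assumes one field per line, with the field name and an integer separated by a colon and/or whitespace. The parse_stats helper and the numbers in the example are invented for illustration.

```python
import re


def parse_stats(text: str) -> dict:
    """Parse 'name value' lines into a dict (the line layout is an assumption)."""
    stats = {}
    for line in text.splitlines():
        # Field name, then a colon/equals/whitespace separator, then an integer.
        match = re.match(r"\s*(\w+)[\s:=]+(\d+)\s*$", line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


# Invented numbers, only to show the calculation.
example = """total: 1000
usable: 990
unique: 700
clusters: 650
"""
stats = parse_stats(example)

# Fraction of usable reads that are kept after clustering and deduplication.
kept_fraction = stats["clusters"] / stats["usable"]
```

To use it on a real run, pass the contents of stats.dat to parse_stats and check that all four fields were found before computing ratios.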

neigh.dat

This file contains a histogram of the number of reads with a given number of neighbours. The first number is the number of neighbours and the second number is how many distinct reads have this number of neighbours.

clusters.dat

This file contains a histogram of the number of clusters of a given size. The first number is the cluster size and the second number is how many clusters have this size.

counts.dat

This file contains a histogram of the number of exact duplicates in the usable input reads. The first number is the number of exact duplicates and the second number is how many distinct reads have this number of exact duplicates.
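The three histogram files (neigh.dat, clusters.dat and counts.dat) share the same two-number layout, so one small reader can cover all of them. The following is a sketch under the assumption that each line holds two whitespace-separated integers; lines that do not match are skipped. The parse_histogram helper and the example values are hypothetical.

```python
def parse_histogram(text: str) -> dict:
    """Map the first number on each line to the second (layout is an assumption)."""
    hist = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep only lines that consist of exactly two non-negative integers.
        if len(fields) == 2 and all(f.isdigit() for f in fields):
            hist[int(fields[0])] = int(fields[1])
    return hist


# Invented histogram: e.g. 5 clusters of size 1, 3 of size 2, 1 of size 4.
example = "1 5\n2 3\n4 1\n"
hist = parse_histogram(example)

# Number of entries counted by the histogram (sum of the second column).
total = sum(hist.values())

# Mean of the first column, weighted by the second column.
mean = sum(value * count for value, count in hist.items()) / total
```

For clusters.dat, total is the number of clusters and mean is the average cluster size; for neigh.dat and counts.dat, total is the number of distinct reads covered by the histogram.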