Output

FastQ output

By default, HUMID will write the deduplicated FastQ files in the current folder, using the _dedup suffix in the file name to distinguish them from the input FastQ files.

When the -a flag is specified, HUMID writes the annotated FastQ files, using the _annotated suffix in the file name. For each read in the output, the cluster_id is appended to the end of the read header, using a colon (:) as a separator.

Special case: The cluster with id 0 is reserved for reads that could not be classified, for example because there were not enough bases available to create a word, or because the word contains one or more N bases.
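As an illustration of how the annotated headers could be consumed downstream, the sketch below splits the cluster_id off a header line and flags reads in the reserved cluster 0. It assumes the cluster_id is the last colon-separated field of the whole header line; the function names and the example headers are hypothetical and not part of HUMID.

```python
from typing import Tuple


def split_cluster_id(header: str) -> Tuple[str, int]:
    """Split an annotated FastQ header into (original header, cluster_id).

    Assumption: the cluster_id is the last colon-separated field of the line.
    """
    original, sep, cluster = header.rstrip("\n").rpartition(":")
    if not sep:
        raise ValueError("no ':' separator found in header: %r" % header)
    return original, int(cluster)


def is_unclassified(header: str) -> bool:
    """Return True if the read sits in the reserved cluster 0."""
    return split_cluster_id(header)[1] == 0


# Hypothetical annotated headers, for illustration only.
for header in ("@read1:7", "@read2:0"):
    name, cluster_id = split_cluster_id(header)
    print(name, cluster_id, is_unclassified(header))
```

Using rpartition rather than split keeps any colons that were already present in the original header intact.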

Statistics

Run HUMID with the -s flag to generate deduplication statistics. These statistics files can be visualized using MultiQC version 1.14 or later, or inspected directly.

stats.dat

This is probably the most useful file. It contains the number of reads in each of the following categories.

Field	Definition
total	Total number of input reads
usable	Total number of reads that were usable (did not contain N)
unique	Total number of distinct input reads
clusters	Total number of unique reads after clustering and deduplication
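For inspecting stats.dat directly, a small parser along the following lines may be enough. The exact file layout is not described here, so this sketch assumes one field per line, with the field name followed by an integer and separated by a colon and/or whitespace. The numbers in the example are made up, and retained_fraction is a hypothetical summary, not a value HUMID reports.

```python
def parse_stats(text):
    """Parse 'name value' or 'name: value' lines into a dict of integers.

    Assumption: one field per line; lines that do not match are skipped.
    """
    stats = {}
    for line in text.splitlines():
        parts = line.replace(":", " ").split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def retained_fraction(stats):
    """Fraction of usable reads that remain after deduplication."""
    return stats["clusters"] / stats["usable"]


# Made-up example content, not real HUMID output.
example = "total: 1000\nusable: 900\nunique: 600\nclusters: 450\n"
stats = parse_stats(example)
print(stats)
print(retained_fraction(stats))
```

With a real file, the text would come from reading stats.dat, for example with open("stats.dat").read().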

neigh.dat

This file contains a histogram of the number of reads with a given number of neighbours. The first number is the number of neighbours and the second number is how many distinct reads have this number of neighbours.

clusters.dat

This file contains a histogram of the number of clusters of a given size. The first number is the cluster size, the second number is how many clusters are of the specified size.
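The histogram files share the same two-number layout, so one reader can serve all of them. The sketch below assumes each line holds two whitespace-separated integers (the value, then how often it occurs) and skips anything else; the function names and the example data are hypothetical.

```python
def parse_histogram(text):
    """Parse lines of 'value count' into a dict mapping value -> count."""
    hist = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        try:
            value, count = int(fields[0]), int(fields[1])
        except ValueError:
            continue  # skip a possible header line
        hist[value] = count
    return hist


def total_members(hist):
    """Sum of size * count: how many members all clusters hold together."""
    return sum(size * count for size, count in hist.items())


# Made-up clusters.dat content: 5 clusters of size 1, 3 of size 2, 1 of size 4.
hist = parse_histogram("1 5\n2 3\n4 1\n")
print(sum(hist.values()))   # number of clusters: 9
print(total_members(hist))  # members across all clusters: 15
```

Summing the counts gives the number of clusters, while weighting each count by its cluster size gives the number of members spread over those clusters.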

counts.dat

This file contains a histogram of the number of exact duplicates in the usable input reads. The first number is the number of exact duplicates and the second number is how many distinct reads have this number of exact duplicates.