Variant filtering

For our QC pipeline, we first read in the .vcf file, split multiallelic sites, and realign indels. A series of careful initial QC steps is applied before a filtered .vcf and matrix table are passed to this QC pipeline.

| Filter | Variants | AFR | AMR | EAS | EUR | SAS |
| --- | --- | --- | --- | --- | --- | --- |
| Raw count | 62,834,605 | NA | NA | NA | NA | NA |
| Following MAD | 32,537,048 | NA | NA | NA | NA | NA |
| Invariant sites after sample filters | NA | 16,364,165 | 18,336,374 | 18,001,198 | 3,216,967 | 16,671,965 |
| Overall variant call rate | NA | 281,328 | 78,151 | 107,126 | 1,485,220 | 232,199 |
| Variants after filters | NA | 2,345,760 | 576,716 | 883,080 | 14,284,267 | 2,087,425 |
| Variants after initial filter | NA | 18,992,636 | 18,992,636 | 18,992,636 | 18,992,636 | 18,992,636 |
| Variants failing HWE filter | NA | 6,614 | 1,100 | 2,391 | 23,574 | 4,939 |
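
The table reports variants failing an HWE filter without stating the test or the threshold. As a rough illustration only, and not the pipeline's actual method, a Hardy-Weinberg check can be sketched as a 1-degree-of-freedom chi-square test on genotype counts. The function names, the example counts, and the `1e-6` cutoff below are all assumptions made for this sketch.

```python
import math

# Hypothetical cutoff; the source does not state the threshold it uses.
HWE_P_THRESHOLD = 1e-6


def hwe_chi2_pvalue(n_hom_ref, n_het, n_hom_alt):
    """Chi-square (1 df) p-value for departure from Hardy-Weinberg equilibrium."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        return 1.0  # no called genotypes, nothing to test
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic site, HWE is trivially satisfied
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def fails_hwe(n_hom_ref, n_het, n_hom_alt, threshold=HWE_P_THRESHOLD):
    """True if the site would be removed under the assumed threshold."""
    return hwe_chi2_pvalue(n_hom_ref, n_het, n_hom_alt) < threshold


if __name__ == "__main__":
    print(fails_hwe(25, 50, 25))  # genotype counts match HWE expectations
    print(fails_hwe(50, 0, 50))   # no heterozygotes at all, strong departure
```

Because allele frequencies differ between populations, a test like this is normally run within each population separately, which is consistent with the per-population counts in the table. An exact test is often preferred over the chi-square approximation for rare variants.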


Sample filtering


| Filter | Samples | AFR | AMR | EAS | EUR | SAS | % of initial samples |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Initial samples in raw UKBB vcf | 500,000 | NA | NA | NA | NA | NA | 100.0 |
| Samples after initial filter | 418,045 | NA | NA | NA | NA | NA | 83.6 |
| Sample call rate | 5,537 | NA | NA | NA | NA | NA | 1.1 |
| Mean DP | 0 | NA | NA | NA | NA | NA | 0.0 |
| Mean GQ | 0 | NA | NA | NA | NA | NA | 0.0 |
| Samples with sex swap | 269 | 12 | 7 | 1 | 219 | 30 | 0.1 |
| Samples after population filters | 411,824 | 6,601 | 489 | 1,650 | 396,682 | 6,402 | 82.4 |
| Within batch Ti/Tv ratio outside 5.9304 median absolute deviations | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 |
| Within batch Het/HomVar ratio outside 5.9304 median absolute deviations | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 |
| Within batch Insertion/Deletion ratio outside 5.9304 median absolute deviations | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 |
| n singletons > 20 median absolute deviations | 277 | 0 | 0 | 0 | 277 | 0 | 0.1 |
| Samples after final sample filters | 411,547 | 6,601 | 489 | 1,650 | 396,405 | 6,402 | 82.3 |
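
The within-batch filters above flag samples whose metric lies more than a given number of median absolute deviations (MADs) from the batch median. The following is a minimal sketch of that idea, assuming an unscaled MAD and a simple list of `(sample_id, batch, metric)` tuples; the function names and the example values are hypothetical and are not taken from the pipeline. Note that 5.9304 equals 4 × 1.4826, the constant commonly used to scale a MAD to a standard deviation under normality, so the cutoff may correspond to roughly 4 standard deviations.

```python
from collections import defaultdict
from statistics import median


def mad_outlier_flags(values, n_mads):
    """Flag values lying more than n_mads unscaled MADs from the median."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    # If mad is 0, any value that differs from the median is flagged.
    return [abs(v - med) > n_mads * mad for v in values]


def flag_within_batch(records, n_mads):
    """records: iterable of (sample_id, batch, metric). Returns flagged sample ids."""
    by_batch = defaultdict(list)
    for sample_id, batch, metric in records:
        by_batch[batch].append((sample_id, metric))

    flagged = set()
    for members in by_batch.values():
        # The median and MAD are computed separately within each batch.
        flags = mad_outlier_flags([metric for _, metric in members], n_mads)
        flagged.update(sid for (sid, _), is_out in zip(members, flags) if is_out)
    return flagged


if __name__ == "__main__":
    # Made-up Ti/Tv-like values for two hypothetical batches.
    example = [
        ("s1", "b1", 2.0), ("s2", "b1", 2.1), ("s3", "b1", 1.9),
        ("s4", "b1", 2.05), ("s5", "b1", 9.0),
        ("s6", "b2", 3.0), ("s7", "b2", 3.1), ("s8", "b2", 2.9),
    ]
    print(flag_within_batch(example, n_mads=5.9304))
```

The same helper would cover the singleton filter by passing `n_mads=20`, although whether that filter is applied within batch, within population, or across all samples is not stated in the table.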