Variant filtering
For our QC pipeline, we first read in the .vcf file, split
multiallelic sites, and realign indels. A series of initial QC steps
is applied before the filtered .vcf and matrix table are passed to this
QC pipeline.
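To illustrate what splitting multiallelics means, here is a minimal sketch in plain Python. It is an assumption-laden toy, not the pipeline's own code: the names `Site`, `split_multiallelic` and `parse_and_split` are hypothetical, and it handles only the site-level columns (CHROM, POS, REF, ALT). Production tools also recode genotypes and INFO fields, and the indel realignment step is not shown.

```python
from typing import List, NamedTuple


class Site(NamedTuple):
    """Site-level description of a single biallelic variant (hypothetical type)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def split_multiallelic(chrom: str, pos: int, ref: str, alts: List[str]) -> List[Site]:
    """Return one biallelic Site per ALT allele.

    Site-level only: genotypes, INFO and allele trimming are ignored here.
    """
    return [Site(chrom, pos, ref, alt) for alt in alts]


def parse_and_split(vcf_line: str) -> List[Site]:
    """Parse the first five tab-separated columns of a VCF data line and split on ALT."""
    chrom, pos, _id, ref, alt = vcf_line.rstrip("\n").split("\t")[:5]
    return split_multiallelic(chrom, int(pos), ref, alt.split(","))


if __name__ == "__main__":
    # A made-up record with two ALT alleles becomes two biallelic sites.
    for site in parse_and_split("1\t1000\t.\tA\tC,T\t.\tPASS\t."):
        print(site)
```

Splitting in this way means each row of the downstream matrix table describes exactly one alternate allele, so per-variant filters such as call rate and HWE are evaluated per allele.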
| Filter | Variants | AFR | AMR | EAS | EUR | SAS |
|---|---|---|---|---|---|---|
| Raw count | 62,834,605 | NA | NA | NA | NA | NA |
| Following MAD | 32,537,048 | NA | NA | NA | NA | NA |
| Invariant sites after sample filters | NA | 16,364,165 | 18,336,374 | 18,001,198 | 3,216,967 | 16,671,965 |
| Overall variant call rate | NA | 281,328 | 78,151 | 107,126 | 1,485,220 | 232,199 |
| Variants after filters | NA | 2,345,760 | 576,716 | 883,080 | 14,284,267 | 2,087,425 |
| Variants after initial filter | NA | 18,992,636 | 18,992,636 | 18,992,636 | 18,992,636 | 18,992,636 |
| Variants failing HWE filter | NA | 6,614 | 1,100 | 2,391 | 23,574 | 4,939 |
Sample filtering
| Filter | Samples | AFR | AMR | EAS | EUR | SAS | % |
|---|---|---|---|---|---|---|---|
| Initial samples in raw UKBB vcf | 500,000 | NA | NA | NA | NA | NA | 100.0 |
| Samples after initial filter | 418,045 | NA | NA | NA | NA | NA | 83.6 |
| Sample call rate | 5,537 | NA | NA | NA | NA | NA | 1.1 |
| Mean DP | 0 | NA | NA | NA | NA | NA | 0.0 |
| Mean GQ | 0 | NA | NA | NA | NA | NA | 0.0 |
| Samples with sex swap | 269 | 12 | 7 | 1 | 219 | 30 | 0.1 |
| Samples after population filters | 411,824 | 6,601 | 489 | 1,650 | 396,682 | 6,402 | 82.4 |
| Within batch Ti/Tv ratio outside 5.9304 median absolute deviations | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 |
| Within batch Het/HomVar ratio outside 5.9304 median absolute deviations | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 |
| Within batch Insertion/Deletion ratio outside 5.9304 median absolute deviations | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 |
| n singletons > 20 median absolute deviations | 277 | 0 | 0 | 0 | 277 | 0 | 0.1 |
| Samples after final sample filters | 411,547 | 6,601 | 489 | 1,650 | 396,405 | 6,402 | 82.3 |
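The within-batch filters in the table flag samples whose metric lies more than a fixed number of median absolute deviations (MADs) from the batch median. The sketch below shows one way such a filter could be written in plain Python. It is not the pipeline's own implementation: the function name `mad_outliers` and the `(sample_id, batch, value)` record layout are assumptions made for illustration. Note that 5.9304 = 4 × 1.4826, where 1.4826 is the usual constant that scales the MAD to a standard deviation under normality, so the threshold plausibly corresponds to 4 robust standard deviations.

```python
from collections import defaultdict
from statistics import median
from typing import Iterable, Set, Tuple


def mad_outliers(
    records: Iterable[Tuple[str, str, float]],
    n_mads: float = 5.9304,
) -> Set[str]:
    """Return IDs of samples outside median +/- n_mads * MAD within their batch.

    `records` holds (sample_id, batch, metric_value) tuples; this layout is
    a hypothetical choice for the sketch.
    """
    # Group samples by batch so each batch gets its own median and MAD.
    by_batch = defaultdict(list)
    for sample_id, batch, value in records:
        by_batch[batch].append((sample_id, value))

    flagged: Set[str] = set()
    for members in by_batch.values():
        values = [value for _, value in members]
        centre = median(values)
        # MAD: the median of absolute deviations from the batch median.
        mad = median(abs(value - centre) for value in values)
        lower = centre - n_mads * mad
        upper = centre + n_mads * mad
        flagged.update(s for s, value in members if value < lower or value > upper)
    return flagged


if __name__ == "__main__":
    # Made-up Ti/Tv-like values: 5.0 is extreme in batch b1 but typical in b2.
    example = [
        ("s1", "b1", 2.0), ("s2", "b1", 2.1), ("s3", "b1", 2.2),
        ("s4", "b1", 2.1), ("s5", "b1", 2.0), ("s6", "b1", 5.0),
        ("t1", "b2", 5.0), ("t2", "b2", 5.1), ("t3", "b2", 4.9), ("t4", "b2", 5.0),
    ]
    print(mad_outliers(example))
```

One caveat of this simple form: if more than half of a batch shares the same value, the MAD is zero and every sample that differs from the median is flagged, so a real pipeline may need to handle that case explicitly.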