Efficiency of the KIR Capture Method
To estimate the efficiency of the capture process, we
mapped all sequence reads generated for each individual
back to the human genome and counted those that
fell within the target coordinates. According to this
measure, the mean enrichment efficiency of the optimized
2 × 300 bp sequencing runs (sample set 3) was 87.01%
(SD = 5.01). Because the target region represents <0.01%
of the human genome, this corresponds to a substantial
(10,000×) reduction, relative to whole-genome sequencing,
in the sequencing capacity required for analyzing the target.
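
The efficiency measure described in this section (the share of mapped reads that fall within the target coordinates) can be illustrated with a minimal sketch. The function names, the toy reads, and the chr19 interval used here are assumptions chosen for illustration; they are not the study's pipeline or its actual target coordinates.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def on_target_fraction(read_positions, targets):
    """Fraction of mapped reads whose position lies inside a target interval.

    read_positions: iterable of (chromosome, position) tuples.
    targets: dict mapping chromosome -> list of half-open (start, end) intervals.
    """
    merged = {chrom: merge_intervals(ivs) for chrom, ivs in targets.items()}
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in merged.items()}
    total = hits = 0
    for chrom, pos in read_positions:
        total += 1
        if chrom in merged:
            # Find the last interval that starts at or before this position.
            i = bisect_right(starts[chrom], pos) - 1
            if i >= 0 and pos < merged[chrom][i][1]:
                hits += 1
    return hits / total if total else 0.0


def capacity_reduction(target_fraction_of_genome):
    """Fold reduction in sequencing capacity if only the target is sequenced."""
    return 1.0 / target_fraction_of_genome


if __name__ == "__main__":
    # Hypothetical target interval and toy reads, for illustration only.
    toy_targets = {"chr19": [(55_200_000, 55_400_000)]}
    toy_reads = [("chr19", 55_250_000), ("chr19", 55_300_000),
                 ("chr1", 100), ("chr19", 10)]
    print(on_target_fraction(toy_reads, toy_targets))
    print(capacity_reduction(0.0001))  # target = 0.01% of the genome
```

Merging the intervals first lets each read be classified with a single binary search, which matters when millions of reads are counted per individual.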
Specificity of Sequence-Read Harvesting
To begin the bioinformatics analysis of KIR, we used a
panel of reference haplotype sequences as filters to har-
vest any sequence reads that could map to the KIR region
from the main pool of sequenced fragments (described in
the Material and Methods). Because both the capture
probes and the reference sequences for harvesting reads
were designed with complete KIR haplotypes that did
not have repetitive elements masked, we performed a
further test for specificity on the harvested KIR reads.
By analyzing 70,000 of these read pairs per individual,
we showed that a mean of three read pairs (modal
value = 0) could map outside the target region of human
genome build hg19. Thus, the combination of our cap-
ture/NGS method and KIR sequence-read harvesting is
highly specific. We also note that generating 2 × 100 bp
sequence reads (instead of 2 × 300 bp) revealed that up to
12% of the harvested reads potentially originate from
repetitive elements outside the KIR region. However, 100%
of these reads map to a 1.8 kb LINE insertion that is
located in intron 6 of KIR3DL2¹³ (Figure S4) and does
not overlap with any known control elements. Thus, these
reads do not affect the subsequent analysis.

[Figure 3 graphic. Panel A: gene maps of a KIR A and a KIR B
haplotype; axis: Chromosome 19 coordinates (Mbp). Panel B:
read depth; axis: Coordinates of the reference haplotype
(kbp). Panel C: KIR region sequencing gaps, tabulated below.]

KIR region sequencing gaps
start    size (bp)   feature   location
1590     8           (T)n      2 kbp 5' of 3DL3
55202    10          (TA)n     2DL3 intron 4

Figure 3. The KIR Region Is >99.99% Covered by Sequence Data
(A) Target KIR region on chromosome 19: the gene locations
are shown in orange, and pseudogenes are shown in gray. The
KIR region varies in gene content, and shown are examples of
two frequent A and B haplotypes. The KIR prefix is omitted
from the gene names for clarity (see Appendix A). The human
reference build hg19 (UCSC Genome Browser) has a KIR A
haplotype. Underneath is a KIR B haplotype shown to scale.
(B) Read depth after stringent alignment of sequence reads
(no base pairs mismatched, and duplicates were removed) from
the PGF cell line to the PGF reference KIR haplotypes 1
(light purple) and 2 (dark purple).
(C) Coordinates and features of two short gaps in PGF KIR
haplotype 2. The location of the gaps is shown on the right.
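
The specificity check reported in this section (how many of the 70,000 harvested read pairs per individual can map outside the target region) reduces to a per-individual summary of off-target counts. The sketch below assumes the off-target counts have already been obtained by remapping; the function name and the toy counts are hypothetical and are not data from the study.

```python
from statistics import mean, mode


def off_target_summary(off_target_counts, pairs_tested=70_000):
    """Summarize off-target read pairs across individuals.

    off_target_counts: one integer per individual, the number of harvested
    read pairs that could map outside the target region.
    pairs_tested: read pairs analyzed per individual (70,000 in the text).
    """
    counts = list(off_target_counts)
    if not counts:
        raise ValueError("at least one individual is required")
    avg = mean(counts)
    return {
        "mean": avg,
        "mode": mode(counts),
        "mean_fraction": avg / pairs_tested,
    }


if __name__ == "__main__":
    # Toy counts for five hypothetical individuals, for illustration only.
    print(off_target_summary([0, 0, 0, 2, 13]))
```

Reporting the mode alongside the mean shows whether a few individuals with many off-target pairs are driving the average while the typical individual has none.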
Measurement of KIR Gene Copy Number: PING_gc
The PING_gc component of PING specifically determines
gene copy number. KIR3DL3, a single-copy gene common
to all KIR haplotypes,⁶,¹⁴,¹⁶ is used as the standard to
which other KIR genes are compared (Material and Methods).
To assess the correlation between read ratio and KIR gene
content, we applied PING_gc to the sequence data generated
from the 97 IHWG samples. The first PING_gc module
(called KFFgc) produced 13 distinct KIR gene presence or
absence genotypes (Figure 4A), identical to those obtained
by established methods.²⁶,⁵⁹,⁶⁰
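
The idea of comparing each KIR gene to KIR3DL3 by read ratio can be sketched as follows. This is a simplified illustration, not the PING_gc implementation: the threshold values are hypothetical placeholders, and the read counts are assumed to be already comparable between genes (for example, normalized for gene length). Because KIR3DL3 is single copy on every haplotype, a diploid individual carries two copies, so a ratio near 1.0 is taken here to indicate two copies of the compared gene.

```python
from bisect import bisect_right

REFERENCE_GENE = "KIR3DL3"

# Hypothetical ratio boundaries separating 0, 1, 2, 3, and 4 copies.
# Real thresholds would be set from the clustering observed in the data.
HYPOTHETICAL_THRESHOLDS = [0.25, 0.75, 1.25, 1.75]


def read_ratios(read_counts):
    """Ratio of each gene's read count to that of the reference gene."""
    ref = read_counts[REFERENCE_GENE]
    if ref <= 0:
        raise ValueError("reference gene must have a positive read count")
    return {gene: n / ref
            for gene, n in read_counts.items() if gene != REFERENCE_GENE}


def copy_numbers(read_counts, thresholds=HYPOTHETICAL_THRESHOLDS):
    """Assign a copy number to each gene by binning its read ratio."""
    return {gene: bisect_right(thresholds, ratio)
            for gene, ratio in read_ratios(read_counts).items()}


if __name__ == "__main__":
    # Toy, already-normalized read counts for one hypothetical individual.
    toy_counts = {"KIR3DL3": 1000, "KIR2DL1": 980, "KIR2DS2": 510,
                  "KIR3DS1": 0, "KIR2DL4": 1490}
    print(copy_numbers(toy_counts))
```

Binning against explicit thresholds, rather than rounding, keeps the boundaries adjustable per gene once empirical clusters are available.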
We then applied the second PING module, MIRAgc, and used
the observed clustering (Figure 2) to set threshold values
for determining KIR gene copy numbers (Figure S3). To
validate these results, we studied DNA samples from 85 of
the same cell lines by using an established real-time PCR
method for quantifying KIR genes.¹² We observed 99.4%
concordance between the results obtained by PING_gc and
those obtained from real-time PCR (Figure 4B). Of ten
discordant results from 1,700 determinations, four involved
rare alleles that were not detected by the primers of the
real-time PCR assay (KIR2DL2*009 and KIR3DL2*076, the
latter of which was discovered during this study). Of the
other six discordant results, two were due to false
positives of the real-time PCR (as shown independently by a
standard PCR method),²⁷ two were just below the threshold
values of the real-time PCR (but were clearly positive with
PING_gc and standard PCR), and two remain unexplained
(Figure 4B). Thus, the discrepancies were likely due to
The American Journal of Human Genetics 99, 375–391, August 4, 2016