low-frequency errors in sensitivity or specificity of the real-time PCR method. We conclude that analysis of high-throughput sequencing data with the PING_gc module provides precise measurement of KIR gene content and copy number and gives almost 100% accuracy. Using PING_gc, we identified 26 distinct KIR gene copy-number
[Figure 4 image not reproduced; the values below are recovered from the panel labels.]
Gene columns: 3DL3, 2DS2, 2DL2, 2DL3, 2DL5, 2DS3, 2DS5, 2DP1, 2DL1, 3DP1, 2DL4, 3DL1, 3DS1, 2DS1, 2DS4, 3DL2 (copy-number panel; 2DL2/3 and 3DL1/S1 are combined in the gene-content panel). Key: 0–4 copies of gene.
CN types 1–26, number observed (N): 35, 13, 12, 5, 4, 4, 3, 2, 2 for types 1–9, then 1 each for types 10–26.
GC types 1–13, number observed (N): 36, 13, 12, 11, 8, 7, 3, 2 for types 1–8; the values for types 9–13 are not recoverable from the extracted text.
Validation panel: 85 cells, 1,700 tests, 99.41% concordance; real-time PCR errors (N): allele 4, pos 2, neg 2; unexplained (N): 2.
Figure 4. KIR Gene-Content and Copy-Number Genotypes
(A) Gene-content genotypes derived from all 97 cell lines by the PING pipeline. A black box indicates that a gene is present, and a clear box indicates the absence of a gene. One example from each observed gene-content genotype (GC type) is shown, and the number observed is shown on the right.
(B) Independent validation of the KIR copy-number genotypes by real-time PCR¹² on 85 samples. There were four discrepancies due to alleles undetected by real-time PCR (allele). There were two false positives (pos) and two false negatives (neg) by real-time PCR. Two discrepancies remain unexplained.
(C) Gene copy-number genotypes derived from all 97 cell lines by PING_gc. A colored rectangle indicates the presence of a gene, and the shades represent the copy number as indicated in the key. One example from each observed gene copy-number genotype (CN type) is shown, and the number observed is shown on the right.
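The concordance figure in panel B can be reproduced from the panel's own counts. The short Python sketch below assumes that concordance is simply the fraction of tests on which PING_gc and real-time PCR agree, and that all ten discrepancies listed in panel B count against it. The variable names are illustrative, and the number of tests per cell is inferred from the totals rather than stated in the source.

```python
# Counts as reported in Figure 4B.
cells = 85
tests = 1700
discrepancies = {"allele": 4, "pos": 2, "neg": 2, "unexplained": 2}

# Assumption: concordance = concordant tests / total tests.
n_discordant = sum(discrepancies.values())
concordance_pct = 100 * (tests - n_discordant) / tests

# Inferred, not stated in the source: tests per cell from the two totals.
tests_per_cell = tests // cells

print(f"{tests_per_cell} tests per cell")
print(f"{n_discordant} discordant tests, {concordance_pct:.2f}% concordance")
# prints "20 tests per cell" and "10 discordant tests, 99.41% concordance"
```

Under these assumptions the ten discordant tests account exactly for the reported 99.41%.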
The American Journal of Human Genetics 99, 375–391, August 4, 2016
genotypes in the 97 cell lines analyzed (Figure 4C). Two cells have duplicated KIR3DP1-KIR2DL4-KIR3DL1/S1 segments (copy-number genotypes 10 and 18; Figure 4C), and three cells have haplotypes lacking KIR2DL4 (genotypes 12, 13, and 15; Figure 4C). In summary, determination of copy number alone increases the resolution of KIR genotyping and is an important step toward understanding the role of KIR polymorphism in disease.⁶¹ We next sought to include the allele-calling components in validation of the PING pipeline in order to achieve full resolution of KIR genotypes.
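To illustrate why copy number adds resolution beyond gene content, the following sketch collapses copy-number genotypes to presence/absence genotypes and counts the distinct types at each level. The three samples and their copy numbers are hypothetical and cover only a few genes; they are not taken from the 97 cell lines, and the function names are illustrative rather than part of PING_gc.

```python
from collections import Counter

# Hypothetical copy-number calls (gene -> copies) for three samples.
samples = {
    "s1": {"KIR3DL3": 2, "KIR2DL4": 2, "KIR3DL1": 2, "KIR3DS1": 0, "KIR3DL2": 2},
    "s2": {"KIR3DL3": 2, "KIR2DL4": 2, "KIR3DL1": 1, "KIR3DS1": 1, "KIR3DL2": 2},
    "s3": {"KIR3DL3": 2, "KIR2DL4": 3, "KIR3DL1": 2, "KIR3DS1": 1, "KIR3DL2": 2},
}

def cn_genotype(calls):
    """Copy-number genotype: sorted (gene, copies) pairs."""
    return tuple(sorted(calls.items()))

def gc_genotype(calls):
    """Gene-content genotype: sorted (gene, present?) pairs."""
    return tuple(sorted((gene, n > 0) for gene, n in calls.items()))

# Count how many samples share each genotype at each level of resolution.
cn_types = Counter(cn_genotype(c) for c in samples.values())
gc_types = Counter(gc_genotype(c) for c in samples.values())

print(len(cn_types), "copy-number genotypes;", len(gc_types), "gene-content genotypes")
# prints "3 copy-number genotypes; 2 gene-content genotypes"
```

Samples s2 and s3 carry the same genes but differ in copy number, so they share one gene-content genotype while remaining distinct copy-number genotypes.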
High-Resolution Genotypes of KIR Alleles: PING_allele
PING_allele determines KIR allele genotypes according to all known KIR coding-sequence alleles (Material and Methods). PING_allele was first validated with whole-exome data from a sample of 15 KhoeSan individuals.⁴⁷ For these individuals, the KIR copy-number and allele data produced by PING matched the data obtained previously by the established methods of Sanger sequencing and pyrosequencing-based genotyping of KIR genes.¹⁶,³¹ Because KIR3DL1, KIR3DS1, and KIR3DL2 of lineage II KIR (see Appendix A) exhibit high polymorphism and structural variation,¹³,²³,⁶² they were chosen as a further test of PING_allele. Using the capture/NGS method, we applied the pipeline to data obtained from 30 family trios from Mali in West Africa (sample set 2, Material and Methods). In this highly heterozygous population, we identified 18 KIR3DL1/S1, 15 KIR3DL2, and 3 KIR3DL1/2v alleles (Table S1A). These alleles were authenticated by established pyrosequencing and Sanger sequencing methods (Material and Methods), as well as by their segregation in the trios. KIR3DL1/2v is a KIR3DL1-KIR3DL2 fusion gene that segregates with KIR3DL1/S1 and encodes a functional protein.¹³
Importantly, we correctly identified individuals with distinct combinations of KIR3DL1/S1 alleles, KIR3DL1/2v fusion genes, and KIR3DL1/S1-deleted haplotypes (Figure 5). To expand the analysis, we next analyzed 2,112 individuals from the 1000 Genomes dataset.⁴⁸ From their exome sequences, we identified 50 KIR3DL1/S1, 46 KIR3DL2, and 5 KIR3DL1/2v alleles (Table S1A), as well as 14 KIR3DL1/S1 duplication and 13 KIR3DL1/S1 deletion haplotypes (Table S1B). Such duplicated and deleted KIR haplotypes were detected in all 26 populations represented in the 1000 Genomes dataset (Table S1B). These results demonstrate that the capture/NGS method coupled with the copy-number and allele components of the PING pipeline can correctly identify the extensive and complex variation of lineage II KIR molecules. The KIR3DL1/S1 and KIR3DL2 genotypes obtained for all individuals analyzed from the 1000 Genomes Project are shown in Table S1C.
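Authentication by segregation in trios amounts to a Mendelian-consistency check. The sketch below is one minimal way to express it, assuming each genotype at a locus is given as two haplotype entries, with None standing for a haplotype that lacks the gene (as on KIR3DL1/S1-deleted haplotypes). The function name and the genotypes in the usage lines are hypothetical, and the check ignores duplication haplotypes and de novo events.

```python
from itertools import product

def segregates(child, father, mother):
    """Return True if the child's two haplotype entries can be explained by
    one entry inherited from the father and one from the mother.

    Each genotype is a pair of allele names; None marks a haplotype
    on which the gene is absent.
    """
    # Sort by string form so that None and allele names compare consistently.
    child_sorted = sorted(child, key=str)
    return any(
        sorted((pat, mat), key=str) == child_sorted
        for pat, mat in product(father, mother)
    )

# Hypothetical KIR3DL1/S1 genotypes for one trio.
father = ("*00101", "*01502")
mother = ("*00401", None)  # one KIR3DL1/S1-deleted haplotype

print(segregates(("*01502", None), father, mother))      # True
print(segregates(("*00101", "*01502"), father, mother))  # False
```

The second call fails because both of the child's alleles could only have come from the father, which no single paternal and maternal contribution can explain.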
Identification of Novel Alleles by PING
Hereafter, we use “novel” to describe KIR variants that were previously undiscovered but identified and characterized by the methods described here. The PING pipeline identifies such novel alleles by the presence of either one or more novel SNPs or a novel combination of known SNPs (Material and Methods). To test this “new allele discovery” component of PING, we again used the lineage II KIR genes. In the course of analyzing the 1000 Genomes data, we identified 100 novel alleles: 33 KIR3DL1/S1, 65 KIR3DL2, and 2 KIR3DL1/2v alleles. These alleles are defined by 88 novel SNPs (39 in KIR3DL1/S1 and 49 in KIR3DL2; Tables S2A and S2B) and 17 novel combinations of known SNPs (Table S2C). Sequences of all the novel alleles were validated by standard methods: PCR amplification from genomic DNA of source material and subsequent cloning and/or Sanger sequencing (Material and Methods). Of the 2,112 individuals studied, 229 (10.8%) have at least one novel lineage II KIR allele (Table S1C). A total of 333 different KIR3DL1/S1-KIR3DL2 haplotypes were identified in this analysis (Table S1C).
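The two routes to a novel allele described above (a novel SNP, or a novel combination of known SNPs) can be sketched as a small classifier. This is an illustration only, not PING's implementation: it assumes each allele is represented as a set of (position, base) differences from a reference, that `known_alleles` maps allele names to such sets, and that the known SNPs are the union of variants over all known alleles. The function name, data structures, and toy variants are all hypothetical.

```python
def classify_allele(variants, known_alleles):
    """Classify an observed allele as known, novel by SNP, or novel by combination.

    variants: frozenset of (position, base) differences from the reference.
    known_alleles: dict of allele name -> frozenset of (position, base).
    """
    # An exact match to a known allele is not novel.
    for name, known in known_alleles.items():
        if variants == known:
            return ("known", name)
    # Any variant absent from every known allele is a novel SNP.
    known_snps = frozenset().union(*known_alleles.values())
    novel_snps = variants - known_snps
    if novel_snps:
        return ("novel_snp", sorted(novel_snps))
    # Otherwise all SNPs are known, but this combination is not.
    return ("novel_combination", sorted(variants))

# Toy data with made-up allele names and positions.
known_alleles = {
    "A*001": frozenset({(100, "T")}),
    "A*002": frozenset({(250, "G")}),
}

print(classify_allele(frozenset({(100, "T")}), known_alleles))
# ('known', 'A*001')
print(classify_allele(frozenset({(100, "T"), (250, "G")}), known_alleles))
# ('novel_combination', [(100, 'T'), (250, 'G')])
print(classify_allele(frozenset({(100, "T"), (300, "C")}), known_alleles))
# ('novel_snp', [(300, 'C')])
```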
High-Resolution KIR Allele and Copy-Number Genotypes
We applied PING_allele to the KIR sequence data obtained from the 97 IHWG cells. This analysis of 13 KIR genes identified 144 different KIR sequences: 128 corresponding to established alleles and 16 representing novel alleles. The latter were all shown to be authentic by the standard methods described above for KIR3DL1/S1 and KIR3DL2 (Table S2D). By considering all 144 KIR variants, we identified a minimum of 104 centromeric and 42 telomeric KIR
[Figure 5 image not reproduced: allele-level genotypes of KIR3DL1/S1 and KIR3DL2 for individuals C, F, and M of families 1–4. Allele labels appearing in the panel: *001, *00301, *00401, *006, *007, *008, *010, *013, *01501, *01502, *01701, *024N, *03101, *059, *068; their assignment to individuals and genes is not recoverable from the extracted text.]
Figure 5. High-Resolution Allele-Level Genotyping of KIR Genes
Four examples of high-resolution allele and copy-number genotypes of lineage II KIR genes and their segregation in family trios: C, child; F, father; and M, mother. Colored boxes show the segregating alleles. All members of family 1 have two alleles each of KIR3DL1/S1 and KIR3DL2. Family 2 shows segregation of the KIR3DL1/2v fusion gene (the allele named KIR3DL1*059), which consists of exons 1–6 from KIR3DL1 and 7–9 from KIR3DL2.¹³,⁶³ For clarity, KIR3DL1/2v is shown as an allele of KIR3DL1, so there is no allele of KIR3DL2 on this haplotype. Family 3 shows segregation of a haplotype that lacks KIR3DL1/S1 and is marked by the presence of KIR3DL2*006. The gene copy numbers were determined by PING_gc, which indicated that one copy of KIR3DL1 is present in each of individuals 3C and 3M and two copies are present in 3F. Family 4 shows segregation of both the KIR3DL1/2v and the KIR3DL1-negative haplotypes to the child.