HLA class I gene sequence in the modern human popula-
tion.
64
Further demonstrating our method’s robust capac-
ity to target divergent sequences, we successfully captured
and sequenced all alleles of Patr-A, Patr-B, and Patr-C
(Table S4E), the chimpanzee orthologs of HLA-A, HLA-B,
and HLA-C, respectively, from the chimpanzee that was
the subject of the chimpanzee genome project.
44
In sum-
mation, the breadth and depth of our results give confi-
dence that our method will be able to capture the full range
of HLA class I alleles.
Discussion
We developed an integrated capture/NGS method to
characterize completely the structure and sequence of
the highly polymorphic KIR and HLA class I genes. The
approach enables a focused and extensive definition of
this physiologically important variation, which is not
possible with any other single method. Our method is
also well suited for genotyping the large cohorts required
for insightful study of population genetics and disease as-
sociation, as well as donor selection for clinical transplan-
tation. All components of the method were validated with
panels of DNA samples that represent the observed range
of human variation for these complex genomic regions.
Both the method and the information we have obtained
during its development should prove valuable resources
for future studies.
We used DNA from well-characterized immortalized hu-
man B cell lines as reference materials during the design
and optimization of the laboratory and bioinformatics
methods. These panels of IHWG and 1000 Genomes cell
lines are generally available for other researchers (see Web
Resources). For validation, we focused on lineage II
KIR genes because they exhibit some of the most extreme
and complex genomic variation within the human KIR lo-
cus. We identified and distinguished deletions and duplica-
tions of KIR3DL1/S1 and the presence of the KIR3DL1/2v
fusion gene and also defined the alleles of these genes.
This was achieved by a combination of quantitative assess-
ment of read depth, virtual sequence probing, and refer-
ence alignment. Such independent verification is critical
for characterizing structural KIR variants, which are not de-
tected by methods that depend only on the alignment of
sequence reads to reference haplotypes.
65,66
In summary,
the validation experiments demonstrate our method to
be robust and capable of detecting the full range of KIR
genomic variation.
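The quantitative read-depth step can be pictured with a minimal sketch. It assumes, purely for illustration, that mean depth over a target gene is compared with mean depth over a reference gene taken to be present in two copies per individual; the function name, the choice of reference, and the depth values are hypothetical and do not reproduce the published pipeline.

```python
def estimate_copy_number(target_depth, reference_depth, reference_copies=2):
    """Estimate the copy number of a target gene from mean read depths.

    Assumption (hypothetical): the reference gene is present in
    `reference_copies` copies and depth scales linearly with copy number.
    """
    if reference_depth <= 0:
        raise ValueError("reference depth must be positive")
    ratio = target_depth / reference_depth
    return round(ratio * reference_copies)


# Illustrative depths only, not measured values.
print(estimate_copy_number(148.0, 300.0))  # consistent with a deletion on one haplotype
print(estimate_copy_number(455.0, 300.0))  # consistent with a duplication
```

A depth ratio alone cannot distinguish a duplication from a fusion gene, which is why read depth is combined with sequence probing and reference alignment in the text above.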
[Figure 7, panel A: read depth plotted against reference sequence coordinates (bp) for A*24:02:01:01, B*13:02:01, B*48:01:01, C*06:02:01:01, and C*08:01:01.]
[Figure 7, panel B: major HLA class I allele types, listed by gene.]
HLA-A: A*01, A*02, A*03, A*11, A*23, A*24, A*25, A*26, A*29, A*30, A*31, A*32, A*33, A*34, A*36, A*66, A*68, A*74
HLA-B: B*07, B*08, B*13, B*14, B*15, B*18, B*27, B*35, B*37, B*38, B*39, B*40, B*41, B*42, B*44, B*45, B*46, B*47, B*48, B*49, B*50, B*51, B*52, B*53, B*54, B*55, B*56, B*57, B*58, B*78
HLA-C: C*01, C*02, C*03, C*04, C*05, C*06, C*07, C*08, C*12, C*14, C*15, C*16, C*17, C*18
Figure 7. Capture of HLA Class I Genes for High-Resolution Allele Genotyping
(A) Shown is the read depth across each of the HLA class I genes
from a representative sample (chosen by virtue of having a read
number closest to the mean number of HLA-specific reads). Green
lines indicate the coordinates of the exons that were covered. To
generate this figure, we obtained full gene sequences (~3 kb
each) from IPD to represent all five HLA class I alleles known
to be present in this sample (the sample is homozygous for a
common allele of HLA-A). Sequence reads were filtered to be spe-
cific to HLA-A, HLA-B, and HLA-C and then aligned to these refer-
ences with high stringency. The read depth was measured with
SAMtools/BCFtools.
(B) The major HLA class I allele types detected in this study.
The American Journal of Human Genetics 99, 375–391, August 4, 2016
385
With few exceptions, studies of KIR in human popula-
tions and disease cohorts have analyzed KIR gene content,
but not allelic diversity.
67,68
Such studies were seminal for
showing how KIR genomic diversity can shape the im-
mune response and provide resistance to disease.
69
Studies
of gene content also uncovered the influence of KIR diver-
sity on the success of reproduction and bone marrow
transplantation.
9,11
The few studies that have focused on
specific KIR genes and their allelic diversity and copy
number have refined these disease associations and impli-
cated specific alleles.
61,70
In the course of validating our
method, we identified and characterized 116 novel KIR
alleles. This knowledge of KIR polymorphism makes sub-
stantial contributions to the KIR database.
15
For example,
the number of KIR3DL2 alleles was doubled and is now
in excess of 100. We also show that 476 (22.5%) of the
1000 Genomes individuals have at least one example of a
structural variant or novel allele of KIR3DL1/S1 or KIR3DL2
(Table S1C). All of these genomic variations have potential
to influence NK cell function, but they are not visible to
typing at the level of KIR gene content. A strong case can
therefore be made that high-resolution knowledge of KIR
diversity, in all its forms, will identify additional disease as-
sociations and improve the understanding of those already
known.
The study of human populations and their evolutionary
dynamics, ancestry, and disease has benefited from GWAS
methods, which genotype numerous SNP markers in large
cohorts of individuals. Such analysis of the KIR region has
been impractical because its extraordinary structural diver-
sity leaves few locations suitable for designing binary
SNP markers, and many of the KIR genotyping results fail
routine quality-control filters. Thus, the “immunochip,”
which focuses on immune-system genes
71
and has refined
our understanding of HLA-associated diseases, includes relatively few
informative SNPs in the KIR region. These SNPs are located
in KIR3DL3, KIR2DL4, KIR3DL1/S1, and KIR3DL2, which
were previously assumed to be present in one copy on
every haplotype.
14
Our study demonstrates that this is
not the reality. In more than 10% of 1000 Genomes indi-
viduals, one of these four KIR genes is deleted, duplicated,
or part of a fusion gene. We conclude that genotyping SNPs
within the KIR locus by using standard binary measure-
ment is of little practical value.
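To see why a binary readout is uninformative when copy number varies, consider a deliberately simplified sketch. It assumes a hypothetical genotyping model that reports only which alleles are present (AA, AB, or BB); the function name and example individuals are illustrative, and the model does not represent the clustering algorithm of any particular array.

```python
def binary_snp_call(copies):
    """Collapse the alleles carried by each gene copy into a binary SNP call.

    `copies` lists the allele ("A" or "B") at one SNP for every copy of the
    gene in an individual. The simplified model reports only AA, AB, or BB.
    """
    observed = set(copies)
    if not observed:
        return "no call"  # gene absent from both haplotypes
    if observed == {"A"}:
        return "AA"
    if observed == {"B"}:
        return "BB"
    return "AB"


# Hypothetical individuals carrying one, two, or three copies of the gene.
for copies in (["A"], ["A", "A"], ["A", "A", "B"], ["A", "B"]):
    print(len(copies), "copies ->", binary_snp_call(copies))
```

Under this model, an individual with a single copy and an individual with two identical copies receive the same call, as do individuals with two and three copies that carry both alleles, so deletions and duplications remain hidden.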
To compensate for the absence of suitable SNPs within
the KIR genes, a recently described imputation method
should accurately re-assess the diversity of KIR gene con-
tent for many of the reported GWASs.
28
Imputation of
HLA class I alleles from GWAS data has been informative
for studies of immune-mediated diseases.
72
Imputation of
HLA alleles varies in its efficiency, particularly in non-European
individuals, partly because >10,000 alleles are
described but also because imputation relies on linkage
disequilibrium, which can extend for shorter genomic
tracts in non-Europeans than in Europeans.
15,72
Many
of the KIR variants and polymorphisms identified by
our method are not evenly distributed across human
populations. For example, the KIR3DL1/2v fusion gene
is restricted to Africans, who exhibit the lowest linkage
disequilibrium worldwide. Thus, it is unlikely that imputa-
tion will be able to resolve all of the structural and allelic
diversity of KIR.
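The dependence of imputation on linkage disequilibrium can be made concrete with the standard r² statistic, computed here from phased haplotypes. This is a minimal sketch: the function name and the toy haplotypes are hypothetical, and no real KIR or HLA data are used.

```python
def ld_r_squared(haplotypes):
    """Compute r^2 between two biallelic sites from phased haplotypes.

    `haplotypes` is a list of (site1, site2) pairs, each coded 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denominator


# Toy haplotypes: the first pair of sites is perfectly correlated,
# the second pair is uncorrelated.
print(ld_r_squared([(1, 1), (1, 1), (0, 0), (0, 0)]))
print(ld_r_squared([(1, 1), (1, 0), (0, 1), (0, 0)]))
```

A genotyped marker predicts an untyped variant well only when r² between them is high, so variants found in populations with short tracts of linkage disequilibrium are the hardest to impute.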
We employed short-read technology because of its high
fidelity. Supporting this point, all of the novel SNPs identified
by our method were confirmed by independent and well-
established sequencing methods. The capture method we
used will probably soon be adapted to obtain longer frag-
ment sizes and read lengths, which should become increas-
ingly valuable as the sequencing error rates decrease.
73
Because we are able to capture and sequence the chim-
panzee KIR region, our method most likely captures the
extent of human KIR diversity. Thus, there is limited
allelic dropout. Alternative methods that do not suffer
allelic dropout are whole-genome approaches. Because
our method targets large numbers of individuals and has
a low impact on sequencing instruments and reagent re-
sources, our assay provides an economic and practically
viable alternative to whole-genome experiments. We also
note that our bioinformatics pipeline can obtain accurate
KIR genotypes from any whole-genome sequencing exper-
iments of sufficient mean depth. The pipeline is also de-
signed for application to any highly polymorphic gene sys-
tem. Our approach is designed to genotype very large
numbers of individuals while having a low impact on com-
puter resources. In these properties, it differs from the pop-
ulation-graph method of allele designation that has been
applied to HLA.
74
However, the population-graph method could be a valuable
complement to our method if it is also applied to KIR.
The first KIR cDNA sequences were reported in the
mid-1990s.
75,76
This led to research that revealed the unantici-
pated scope of genetic complexity and diversity of the
human KIR gene family.
27
The method we describe here
will facilitate a complete description of
KIR variation in the human population, its interaction
and co-evolution with HLA class I, and its influence on
physiology, disease, and immunotherapy.
Appendix A: KIR Nomenclature
Throughout this paper, any unique DNA sequence that
spans a coding region (otherwise known as a CDS) is
considered a distinct allele. Those alleles that encode a
unique protein sequence define an allotype. KIR genes
and alleles are named by the KIR Nomenclature Commit-
tee, formed from members of the WHO Nomenclature
Committee for Factors of the HLA System and members
of the HUGO Gene Nomenclature Committee.
15
The
IPD-KIR database is part of the IPD, which is listed in the
Web Resources section.
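The distinction between alleles and allotypes can be illustrated with a short sketch that counts distinct coding sequences and the distinct proteins they encode. The function names and toy sequences are hypothetical (they are not real KIR alleles), and the sketch assumes the standard genetic code and coding sequences whose length is a multiple of three.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG codon ordering.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a CDS (length divisible by 3) into a protein string."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def count_alleles_and_allotypes(cds_sequences):
    """Return (number of distinct alleles, number of distinct allotypes)."""
    alleles = set(cds_sequences)
    allotypes = {translate(cds) for cds in alleles}
    return len(alleles), len(allotypes)


# Toy sequences: the first two differ only by a synonymous substitution,
# and the fourth duplicates the first.
toy = ["ATGGCTTAA", "ATGGCCTAA", "ATGGTTTAA", "ATGGCTTAA"]
print(count_alleles_and_allotypes(toy))  # (3, 2)
```

Here three distinct coding sequences are counted as three alleles, but because two of them encode the same protein they define only two allotypes.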
In the nomenclature for alleles, the digit (2 or 3) and let-
ter (D) following the KIR prefix indicate whether two (2D)
or three (3D) immunoglobulin (Ig)-like domains are pre-
sent in the encoded protein. After the letter D is another