an immune response. On infected or transformed cell surfaces,
pathogen-specific or tumor-specific peptides are bound to HLA
class I, and gross changes in the surface level of HLA class I
can be induced. All such differences activate lymphocytes and
the immune response.4,19 For the interactions between KIRs and
HLA class I molecules to be effective, they have to respond to a
wide diversity of tumors and pathogens, many of which are rapidly
evolving.20 This has been achieved with a diversity of
interactions within each individual and differences in those
interactions from one individual to another. The latter provides
barriers that can impede the spread of infection within families,
communities, and populations.
Crucial features that distinguish KIR and HLA alleles from those
of most other genes are the depth, breadth, and functional
importance of their sequence divergence. Thus, alleles can differ
by multiple nucleotide substitutions, and three or four
alternative nucleotides are present at functionally critical
positions. KIR and HLA alleles segregate as constituents of
distinct lineages, which are further diversified by intra-genic
and inter-genic recombination.13,21 In turn, these lineages are
maintained in all human populations, and both genomic regions
exhibit clear evidence of the impact of balancing selection.22,23
Moreover, the strong, highly reproducible signals of natural
selection observed for the HLA class I and KIR regions suggest
that their genomic variation is critical for human survival.24,25
The development of methods for assessing the nature and extent of
KIR genomic diversity has been limited by the complexity of the
region. The widely used methods that exist for typing KIRs focus
principally on gene content.12,26–28 In contrast, the methods
being used for determining allelic variation are costly, time
consuming,6,16,29 and unsuitable for high-throughput studies. The
results of the few allele-level population studies of
KIRs,16,29–32 however, show that such investigation is likely to
be informative. For example, some KIRs are restricted to
population groups of specific geographic ancestry.30,31 Other
KIRs have lost expression but appear common and widely
distributed.29,32 To extend such studies to other populations, as
well as disease cohorts, we have developed a sequencing and
bioinformatics method that determines complete KIR and HLA
class I genomic diversity.
Material and Methods
Overview
To target KIR and HLA class I genes for next-generation nucleotide
sequencing (NGS), we designed sets of specific oligonucleotide
probes to capture the KIR region (140–240 kb) and HLA-A,
HLA-B, and HLA-C (each ~3 kb) from libraries prepared from
sheared genomic DNA. We then developed a bioinformatics pipeline
(PING [Pushing Immunogenetics to the Next Generation])
specifically to convert sequence data obtained from the highly
polymorphic KIR genes into high-resolution genotypes. A summary
of the pipeline is shown in Figure 1A. PING first sorts the
sequence reads to isolate those that represent fragments from
the KIR genomic region from those that do not (a process termed
filtering). PING then obtains the final KIR genotypes from these
filtered reads by using a composite of two core modules that
describe the gene and allele content for each individual and also
return information on newly identified SNPs and recombinant
alleles. The first module (PING_gc), which determines the KIR gene
copy number, is used to inform the second module (PING_allele),
which generates allele data (Figure 1A and Figure S1). Each module
is split into two sub-modules. KIR Filter Fish (KFF), which is used
in both main modules, probes the KIR sequence data with specific
sequence search strings and determines which genes (KFFgc) or
alleles (KFFallele) are present. The function served by KFF is
equivalent to genotyping with sequence-specific oligonucleotide
probes (SSOPs).33 To complement KFF, MIRAgc (based around
the program MIRA)34 and Son of SAMtools (SOS; based around
SAMtools)35 create alignments to reference sequences in order to
determine the gene and allele content, respectively. The output
is designed to comply with the genotype list (GL string) format
that is used for reporting HLA and KIR data by clinical
transplantation laboratories.36 We validated the typing obtained
from the complete capture, NGS, and bioinformatics method (hereafter
referred to as the capture/NGS method) by using standard molecular
techniques, and we further tested the bioinformatics component
by using existing datasets from whole-genome sequencing
experiments. A summary of the data generated or otherwise
obtained is shown in Figure 1B. KIR and HLA class I allele
sequences used for probe design and as reference data were
obtained from the Immuno Polymorphism Database (IPD; see
Web Resources).15 Throughout this paper, any unique DNA
sequence that spans a coding region (coding DNA sequence
[CDS]) is considered a distinct allele. An explanation of KIR and
HLA nomenclature is given in Appendix A.
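The probe-style search attributed to KFF, and the GL string output, can be illustrated with a minimal sketch. This is not PING's code: the probe sequences, the gene labels GENE_A and GENE_B, the function names, and the read-count threshold are all hypothetical and chosen only for illustration. The sketch assumes exact string matching on either strand, and it assumes the usual GL string delimiters, with "+" joining the gene copies at one locus and "^" joining loci.

```python
from collections import Counter

# Hypothetical search strings, invented for this example.
# They are NOT real KIR probe sequences.
PROBES = {
    "GENE_A": "GATTACAGGT",
    "GENE_B": "CCGTAAGCTT",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_probe_hits(reads, probes):
    """Count the reads that contain each probe on either strand."""
    hits = Counter({name: 0 for name in probes})
    for read in reads:
        for name, probe in probes.items():
            if probe in read or reverse_complement(probe) in read:
                hits[name] += 1
    return hits


def call_present(hits, min_reads=2):
    """Call a target present when enough reads match (assumed threshold)."""
    return sorted(name for name, n in hits.items() if n >= min_reads)


def to_gl_string(genotype):
    """Join allele copies with '+' within a locus and loci with '^'."""
    return "^".join(
        "+".join(sorted(alleles)) for _, alleles in sorted(genotype.items())
    )


# Toy reads: two carry the GENE_A probe (one on the reverse strand),
# one carries the GENE_B probe, and one carries neither.
reads = ["TTGATTACAGGTAA", "CCACCTGTAATCGG", "AACCGTAAGCTTGG", "TTTTTTTTTT"]
hits = count_probe_hits(reads, PROBES)
present = call_present(hits)

# Allele names are example values only, not results from this study.
gl = to_gl_string({
    "KIR3DL1": ["KIR3DL1*005", "KIR3DL1*001"],
    "KIR2DL1": ["KIR2DL1*003", "KIR2DL1*003"],
})
print(dict(hits), present, gl)
```

With these toy reads only GENE_A reaches the two-read threshold. A real pipeline would also have to tolerate sequencing errors and distinguish closely related genes, which exact matching does not do.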
Human Subjects and Data
Ethical approval for this study was obtained from the Stanford
University Administrative Panels on Laboratory Care and Human
Subjects in Medical Research and the Committee on Human
Research at the University of California, San Francisco. Written
informed consent was obtained from all individuals.
To develop and validate the capture/NGS method, we generated
data from three sources of human genomic DNA:
1. A Panel of IHWG Lymphoblastoid B Cell Lines. Genomic DNA
was extracted from 97 International Histocompatibility
Working Group (IHWG) cell lines. These cells have been
used extensively in developing methods for genotyping
polymorphic loci, including KIR and HLA.37–41 Most of
the cell lines (93%) are homozygous for HLA-A, HLA-B,
and HLA-C.37–41 A substantial majority of the IHWG cells
(80%) are derived from donors of European origin and
represent many of the common HLA alleles.42,43 Also studied was
genomic DNA from a chimpanzee B cell line, derived from
Clint44 (Yerkes pedigree number C0471), a chimpanzee of
the Pan troglodytes verus (western chimpanzee) subspecies
and subject of the chimpanzee genome project.44
2. West African Trios. Genomic DNA samples from 30 family
trios (both of the parents and one child) from Mali in
West Africa were analyzed.45
3. European Control Samples. De-identified DNA samples from
188 unrelated healthy individuals of European origin,
The American Journal of Human Genetics 99, 375–391, August 4, 2016