with no history of chronic disease, were studied. These samples were selected at random from a larger dataset (n = 500) developed as controls for genome-wide association studies (GWASs) of multiple sclerosis (MIM: 126200).⁴⁶ These samples were used because their high-resolution HLA-A, HLA-B, and HLA-C genotypes had been determined by Sanger sequencing.⁴⁶
Figure 1. Pipeline for Analyzing Sequence Data from Highly Polymorphic and Structurally Divergent KIR Haplotypes
(A) The PING (Pushing Immunogenetics to the Next Generation) pipeline has two broad arms and two modules. The first module (PING_gc) determines KIR gene copy numbers, and the second module (PING_allele) determines their alleles. Within each module are two arms. The first arm (KFFgc and KFFallele) is an analysis independent of any alignment or assembly and uses virtual probes to mine the raw data. The second arm (MIRAgc and SOS) performs filtering and alignment of reads to reference sequences. Thus, copy-number and allele genotypes are each derived by two independent methods. The techniques are described fully in the Material and Methods.
Panel A labels. Input: Whole-Genome Sequencing (all sequence reads, unmapped sequence reads, mapped KIR sequence reads) or Target Enrichment + NGS; KIR reads. PING_gc: KFFgc (gene-specific probes, per-sample probe hits, +/- genotype) and MIRAgc (align vs. reference, count reads per gene, KIR gene +/-, copy number). PING_allele: filter; KFFallele (select probes, generate hit pattern, decode pattern, call alleles) and SOS (sort reads per KIR, make sam/bam, final align - vcf, call alleles; new SNPs and new combinations, manual check). Output: KIR copy number and allele genotype.
(B) Data generated by the method described herein (1–3) and those obtained from other sources (4). The table shows the number of individuals or cell lines used for validation, the genotyping results to be validated, and the independent laboratory method used for this purpose.

Individuals            N       Data generated (used)   Genotype to validate   Validation method
1. IHWG cell lines     97      Enrichment/NGS          KIR +/-                PCR
                                                       KIR copy number        Real-time PCR
                                                       KIR alleles            Pyrosequencing
                                                       KIR novel alleles      Cloning/Sanger sequencing
                                                       HLA alleles            PCR-SSOP, and IHWG database
2. West African Trios  90      Enrichment/NGS          KIR alleles            Pyrosequencing
                                                       KIR copy number        Segregation
                                                       HLA alleles            PCR-SSOP
3. Europeans           188     Enrichment/NGS          HLA alleles            PCR-SSOP, and Sanger sequencing
4a. KhoeSan            15      (Exome)                 KIR alleles            Pyrosequencing
4b. 1000 Genomes       2,112   (Exome)                 KIR novel alleles      Cloning/Sanger sequencing

To validate the PING pipeline, we used existing sequence-read data from an additional two sources, described below. To extract KIR-specific sequences from these datasets, we used SAMtools 0.1.18³⁵ to identify read pairs that mapped within the KIR region (hg19 coordinates: 19:55,228,188–55,383,188) or an unallocated chromosome 19 region (GenBank: GL000209.1) corresponding to an alternative KIR haplotype.
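The region-based extraction described above could be scripted roughly as follows. This is a minimal sketch, not the authors' actual command: it only assembles a `samtools view` call for the two regions named in the text, assumes an indexed BAM file whose contig names are "19" and "GL000209.1", and uses hypothetical file names. A plain region query returns reads overlapping the region, so recovering mates that map elsewhere would need an additional step that is not shown.

```python
from typing import List, Sequence


def format_region(contig: str, start: int, end: int) -> str:
    """Format a 1-based inclusive interval in samtools region syntax."""
    return f"{contig}:{start}-{end}"


# Regions taken from the text. The contig names are assumptions and must
# match the names used in the header of the BAM file being queried.
KIR_REGIONS = [
    format_region("19", 55228188, 55383188),  # KIR region, hg19
    "GL000209.1",                             # alternative KIR haplotype contig
]


def samtools_view_command(
    bam_path: str, out_path: str, regions: Sequence[str] = KIR_REGIONS
) -> List[str]:
    """Build (but do not run) a samtools call that writes reads overlapping
    the given regions to a new BAM file."""
    return ["samtools", "view", "-b", "-o", out_path, bam_path, *regions]


# Hypothetical file names, for illustration only.
command = samtools_view_command("sample.bam", "sample.kir.bam")
print(" ".join(command))
```

The command list can be passed to `subprocess.run(command, check=True)` on a system where SAMtools is installed.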
4. 15 KhoeSan individuals.⁴⁷ These individuals had also been genotyped for KIR genes via standard lower-throughput methods of pyrosequencing and Sanger sequencing.¹⁶
5. The 1000 Genomes Project data.⁴⁸ All 2,532 of the whole-exome-sequenced individuals described in the May 2013 release were targeted.⁴⁸ To ensure sufficient quantity of sequence reads for the analysis, we excluded samples if fewer than 25 reads mapped to exon 3 of KIR3DL2 or KIR3DL1/2v (see Appendix A). 420 of the 1000 Genomes samples were excluded on this basis, and the remaining 2,112 were genotyped. When previously uncharacterized KIR alleles were identified in the 1000 Genomes dataset, genomic DNA from the source samples was purchased from the Coriell Biorepository for confirmation of their sequence by standard molecular methods.
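The read-depth exclusion rule for the 1000 Genomes samples can be illustrated with a short sketch. It assumes a single per-sample count of reads mapped to exon 3 of KIR3DL2 or KIR3DL1/2v has already been obtained; the sample names and counts below are hypothetical, and only the threshold of 25 reads and the sample totals come from the text.

```python
from typing import Dict, List, Tuple

MIN_EXON3_READS = 25  # threshold stated in the text


def split_by_read_depth(
    exon3_read_counts: Dict[str, int], minimum: int = MIN_EXON3_READS
) -> Tuple[List[str], List[str]]:
    """Return (kept, excluded) sample names; samples with fewer than
    `minimum` reads are excluded."""
    kept = sorted(s for s, n in exon3_read_counts.items() if n >= minimum)
    excluded = sorted(s for s, n in exon3_read_counts.items() if n < minimum)
    return kept, excluded


# Hypothetical counts, for illustration only.
counts = {"sample_A": 140, "sample_B": 24, "sample_C": 25}
kept, excluded = split_by_read_depth(counts)
print(kept, excluded)

# Bookkeeping reported in the text: 2,532 targeted, 420 excluded.
remaining = 2532 - 420
print(remaining)  # 2112, the number of samples genotyped
```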
The American Journal of Human Genetics 99, 375–391, August 4, 2016
Laboratory Methods
Design of Capture Oligonucleotide Probes
To account for variation in the gene content of the KIR region, we targeted a panel of independently generated reference KIR haplotypes⁶,¹⁴ that together represent all of the 13 recognized KIR genes. First, we designed probes against the two complete KIR haplotypes (GenBank: FP089703 and FP089704) that were generated from the PGF cell line, which was the source of the human reference sequence for the KIR region.⁶ We used end-to-end tiling with strand swapping to design non-overlapping 80-mer probes to match these reference sequences. We then designed a similar set of probes by using a further 27 complete KIR haplotype sequences⁶,¹⁴ and all KIR sequences included in the January 2013 release of the IPD-KIR database.¹⁵ In this second stage, probes that differed by more than three nucleotides from the corresponding segment of the initial reference haplotypes were selected for use. We did not mask any repetitive elements in the target haplotypes. The KIR genomic region targeted by the probes is equivalent to that covered by chr19: 55,228,188–55,383,188 (UCSC Genome Browser hg19) and an unmapped chromosome 19 region (GenBank: GL000209.1), which are the two KIR haplotypes present in the hg19 reference genome. In a similar manner to the KIR probes, we designed probes against the alleles of the classic HLA class I genes present in the PGF cell line, which was also the source of the human reference sequence for the HLA region.¹,¹⁴ These probes were supplemented with probes designed against the 6,795 HLA class I sequences reported in the January 2013 release of the IPD-HLA database.¹⁵ A total of 10,456 capture probes were used.
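The two-stage probe design can be sketched as follows. This is an illustration under stated assumptions, not the authors' design software: "strand swapping" is interpreted here as alternating probes between the forward and reverse-complement strands, a trailing segment shorter than 80 nt is simply dropped, and the second stage assumes the additional sequence is already aligned to the reference without indels. All function names are hypothetical.

```python
from typing import List, Tuple

PROBE_LENGTH = 80   # 80-mer probes, as stated in the text
MAX_MISMATCHES = 3  # second-stage probes must differ by more than this

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def tile_probes(reference: str, length: int = PROBE_LENGTH) -> List[Tuple[int, str]]:
    """Non-overlapping end-to-end tiling; every second probe is taken from
    the opposite strand (assumed meaning of 'strand swapping')."""
    probes = []
    for index, start in enumerate(range(0, len(reference) - length + 1, length)):
        segment = reference[start:start + length]
        if index % 2 == 1:
            segment = reverse_complement(segment)
        probes.append((start, segment))
    return probes


def mismatches(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def second_stage_probes(
    reference: str, alternative: str,
    length: int = PROBE_LENGTH, threshold: int = MAX_MISMATCHES,
) -> List[Tuple[int, str]]:
    """Keep segments of `alternative` that differ from the corresponding
    reference segment by more than `threshold` nucleotides."""
    selected = []
    usable = min(len(reference), len(alternative))
    for start in range(0, usable - length + 1, length):
        ref_seg = reference[start:start + length]
        alt_seg = alternative[start:start + length]
        if mismatches(ref_seg, alt_seg) > threshold:
            selected.append((start, alt_seg))
    return selected


# Toy sequences, for illustration only.
reference = "A" * 80 + "C" * 80
alternative = "TTTT" + "A" * 76 + "GGG" + "C" * 77  # 4 and 3 mismatches per window
first_stage = tile_probes(reference)
second_stage = second_stage_probes(reference, alternative)
```

In the toy example only the first window of the alternative sequence yields a second-stage probe, because it differs from the reference at four positions while the second window differs at three.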
Preparation of Biotinylated Capture Probes
The set of capture oligonucleotides, each one comprising a unique sequence flanked by the common sequences 5′-GGTGATTGCGTATCT-3′ (PTL3) and 5′-CATGTCGTGGGAATT-3′ (PTR3), was synthesized by CustomArray. This set of oligonucleotides was pooled and amplified in a single PCR using primers with sequences corresponding to PTL3 and PTR3. The PCR comprised 1× Titanium Taq buffer (Clontech), 1 μM each of biotin-PTL3 and -PTR3 primers (Integrated DNA Technologies), 0.2 mM dNTPs with 12.5% dUTP (Roche), 1 μL (1 unit) Uracil-DNA Glycosylase (UDG; New England Biolabs), 1 M betaine, 3 μL (15 units) AmpliTherm Polymerase (Epicenter), 0.2 ng of the pool of capture oligonucleotides, and H₂O added to create a final volume of 100 μL. PCR cycling conditions were as follows: 37°C for 10 min, 95°C for 3 min, 95°C for 30 s, 55°C for 30 s, 72°C for 30 s (×28), 72°C for 10 min, and a hold at 10°C.
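As a quick check on the cycling program, the programmed incubation time can be totalled. The sketch assumes that the three 30 s steps (95°C, 55°C, 72°C) form the repeated cycle and that the cycle count is 28; ramp times between temperatures and the final 10°C hold are ignored.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Step:
    temp_c: int
    seconds: int


# Steps as listed in the text; the grouping into pre-cycle, cycle, and
# post-cycle stages is an assumption.
PRE_CYCLE = [Step(37, 10 * 60), Step(95, 3 * 60)]
CYCLE = [Step(95, 30), Step(55, 30), Step(72, 30)]
N_CYCLES = 28
POST_CYCLE = [Step(72, 10 * 60)]


def programmed_seconds(
    pre: Sequence[Step], cycle: Sequence[Step], n_cycles: int, post: Sequence[Step]
) -> int:
    """Sum of programmed step durations, excluding ramping and the final hold."""
    return (
        sum(s.seconds for s in pre)
        + n_cycles * sum(s.seconds for s in cycle)
        + sum(s.seconds for s in post)
    )


total_seconds = programmed_seconds(PRE_CYCLE, CYCLE, N_CYCLES, POST_CYCLE)
print(f"{total_seconds} s = {total_seconds / 60:.0f} min")  # 3900 s = 65 min
```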
The biotinylated PCR product (100 μL aliquot) was then bound to streptavidin-coated magnetic beads (Illumina) that had been pre-washed with 100 μL 6× hybridization buffer (HB: 1 M NaCl, 0.5 M phosphate buffer, and 0.05% Tween-20) and suspended in 100 μL 12× HB. The incubation was carried out for 30 min at room temperature in HB and with agitation. The beads, now coated with biotinylated oligonucleotides, were then washed: once with 100 μL 6× HB, twice with 100 μL 0.2× HB, once with 100 μL 0.1 nM NaOH and 100 μL 10 mM EDTA, and lastly, once with 100 μL 0.2× HB. Biotinylated oligonucleotides were eluted from the beads with 0.1 mM EDTA and then concentrated via speed vacuum to a final concentration of 2.5 nM for each capture probe.
Library Preparation, Enrichment, and Sequencing
The protocol was based on the TruSeq Nano method for library preparation (Illumina). The DNA samples we used are described below. For each sample, 300 ng genomic DNA (as determined by Qubit instrument, Thermo Fisher Scientific) was sheared into 800 bp fragments with a Covaris S220 instrument (Covaris). The library preparation was then performed according to the manufacturer’s instructions, whereby 96 unique “dual index” combinations were used individually to label the library obtained from each DNA sample, with the following modifications: (1) for cleaning and size selecting the samples after end repair, 70.2 μL sample purification beads plus 89.8 μL H₂O were added to a 100 μL sample, and (2) in the final PCR, the 72°C extension time was changed from 30 to 90 s to account for the 800 bp fragment length.
Enrichment of HLA and KIR sequences was performed according to a modified version of the Nextera Rapid Capture Exome enrichment protocol (Illumina), a solution-based target-capture assay. The libraries of genomic DNA, indexed uniquely for each sample as described above, were pooled prior to their hybridization with the capture probes. Thus, each hybridization mix (100 μL) contained 96 uniquely indexed sequence libraries (62.5 ng for each library and 6,000 ng in total), 50 pM of each biotinylated capture probe, and HB (CT3 and all subsequent buffers from Illumina). The hybridization mix was incubated at 95°C for 10 min, gradually cooled by 2°C/min to 58°C, and then maintained at 58°C for 90 min. In this reaction, fragments of genomic DNA that contained targeted KIR and HLA sequences became specifically hybridized to biotinylated capture probes.
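The quantities in the hybridization mix can be cross-checked with a few lines of arithmetic. The sketch below uses only values stated in the text (96 libraries at 62.5 ng each, 50 pM of each probe in 100 μL, cooling from 95°C to 58°C at 2°C/min); the helper names are hypothetical, and the cooling time assumes a perfectly linear ramp.

```python
N_LIBRARIES = 96
NG_PER_LIBRARY = 62.5
PROBE_CONCENTRATION_PM = 50.0
MIX_VOLUME_UL = 100.0
START_TEMP_C, END_TEMP_C = 95.0, 58.0
COOLING_RATE_C_PER_MIN = 2.0


def total_library_mass_ng(n_libraries: int, ng_each: float) -> float:
    return n_libraries * ng_each


def amount_fmol(concentration_pm: float, volume_ul: float) -> float:
    """pM x uL gives attomoles (1e-12 mol/L x 1e-6 L); divide by 1000 for fmol."""
    return concentration_pm * volume_ul / 1000.0


def cooling_minutes(start_c: float, end_c: float, rate_c_per_min: float) -> float:
    return (start_c - end_c) / rate_c_per_min


total_ng = total_library_mass_ng(N_LIBRARIES, NG_PER_LIBRARY)
per_probe_fmol = amount_fmol(PROBE_CONCENTRATION_PM, MIX_VOLUME_UL)
ramp_min = cooling_minutes(START_TEMP_C, END_TEMP_C, COOLING_RATE_C_PER_MIN)

print(total_ng)        # 6000.0 ng, matching the total given in the text
print(per_probe_fmol)  # 5.0 fmol of each probe per 100 uL mix
print(ramp_min)        # 18.5 min to cool from 95 to 58 degrees C
```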
In the next reaction, 100 μL of streptavidin-coated magnetic beads were used to separate the specific hybridized genomic DNA away from the non-specific un-hybridized genomic DNA. The biotin present in hybrid DNA molecules was bound to streptavidin on the beads, leaving the non-specific DNA in solution. The DNA preparation enriched with the targeted KIR and HLA genes was then eluted from the beads. Binding of the hybridization product to the beads was achieved by 30 min incubation with agitation at 1,000 rpm on a plate shaker at room temperature. To clean the product, we removed the streptavidin beads from solution by using a magnetic separator, mixed them with 200 μL Enrichment Wash Solution (Illumina), and incubated them at 50°C for 30 min. This wash step was repeated. To elute the enriched DNA from the beads, we added 23 μL of elution mix (made from 1.5 μL 2 M NaOH plus 28 μL Elution Buffer 1, Illumina), incubated for 5 min at room temperature and neutralized with 4 μL Elute Target Buffer 2 (Illumina). The eluted material was then subjected to a second round of enrichment from the hybridization step onward. After the gradual cooling step, the hybridization mix was maintained at 58°C for 14–18 hr.
An aliquot of 10 μL of the enriched DNA preparation was subjected to PCR amplification in a 50 μL reaction mix containing 5 μL of a PCR primer cocktail, 15 μL of resuspension buffer, and 20 μL of Nextera Enrichment Amplification Mix. PCR cycling was performed as follows: 98°C for 30 s; 17 cycles of 98°C for 10 s, 60°C for 30 s, and 72°C for 30 s; and a final elongation step at 72°C for 5 min. Amplified material was purified with 40 μL of sample purification beads and eluted in 30 μL resuspension buffer (Illumina).
NGS Strategies
Set 1: IHWG Cell Lines. The enriched libraries were sequenced with a HiSeq 2000 instrument and sequencing chemistry (Illumina). Samples were clustered and paired-end sequencing was performed with the TruSeq SBSv3-HS Kit (Illumina). The sequencing read length was 2 × 101 bp.
Set 2: Trios and Chimpanzee. The enriched libraries obtained from these samples were sequenced with a HiSeq 2500 instrument and sequencing chemistry (Illumina). The sequencing read length was 2 × 250 bp. These samples were also genotyped for HLA-A, HLA-B, and HLA-C with SSOPs and for KIR3DL1 and KIR3DL2 by pyrosequencing²³ (see Appendix A).
Set 3: European Control Samples. These samples were analyzed with a MiSeq instrument (Illumina) with V3 chemistry, and the sequencing read length was 2 × 300 bp.
Enrichment Efficiency
We estimated enrichment efficiency by mapping unprocessed
sequence reads to the human reference sequence (hg19) with