Defining KIR and HLA Class I Genotypes at Highest Resolution via High-Throughput Sequencing


with no history of chronic disease, were studied. These samples were selected at random from a larger dataset (n = 500) developed as controls for genome-wide association studies (GWASs) of multiple sclerosis (MIM: 126200). They were chosen because their high-resolution HLA-A, HLA-B, and HLA-C genotypes had been determined by Sanger sequencing.

Figure. Pipeline for Analyzing Sequence Data from Highly Polymorphic and Structurally Divergent KIR Haplotypes

[Panel A, flowchart: KIR sequence reads (all reads, unmapped reads, or mapped KIR reads from whole-genome sequencing or target enrichment + NGS) enter two modules. PING_gc outputs KIR gene presence/absence (+/-) and copy number through KFFgc (gene-specific probes, per-sample probe hits, +/- genotype) and MIRAgc (align vs. reference, count reads per gene, KIR gene +/-, copy number). PING_allele outputs KIR alleles through KFFallele (select probes, generate hit pattern, decode pattern) and SOS (filter, sort reads per KIR, make sam/bam, final alignment to VCF, call alleles); new SNPs and new combinations are checked manually.]

Panel B (data; individuals used; data generated; genotype to validate: validation method):
1. IHWG cell lines (97; enrichment/NGS)
   KIR +/-: PCR
   KIR copy number: real-time PCR
   KIR alleles: pyrosequencing
   KIR novel alleles: cloning/Sanger sequencing
   HLA alleles: PCR-SSOP and IHWG database
2. West African Trios (90; enrichment/NGS)
   KIR alleles: pyrosequencing
   KIR copy number: segregation
   HLA alleles: PCR-SSOP
3. Europeans (188; enrichment/NGS)
   HLA alleles: PCR-SSOP and Sanger sequencing
4a. KhoeSan (exome)
   KIR alleles: pyrosequencing
4b. 1000 Genomes (2,112; exome)
   KIR novel alleles: cloning/Sanger sequencing

(A) The PING (Pushing Immunogenetics to the Next Generation) pipeline comprises two modules, each with two arms. The first module (PING_gc) determines KIR gene copy numbers, and the second module (PING_allele) determines their alleles. Within each module, the first arm (KFFgc and KFFallele) is independent of any alignment or assembly and uses virtual probes to mine the raw data. The second arm (MIRAgc and SOS) filters reads and aligns them to reference sequences. Thus, copy-number and allele genotypes are each derived by two independent methods. The techniques are described fully in the Material and Methods.
(B) Data generated by the method described herein (1–3) and those obtained from other sources (4). The table shows the number of individuals or cell lines used for validation, the genotyping results to be validated, and the independent laboratory method used for each.

To validate the PING pipeline, we used existing sequence-read data from two additional sources, described below. To extract KIR-specific sequences from these datasets, we used SAMtools 0.1.18 to identify read pairs that mapped within the KIR region (hg19 coordinates: 19:55,228,188–55,383,188) or to an unallocated chromosome 19 region (GenBank: GL000209.1) corresponding to an alternative KIR haplotype.
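The read-extraction step above amounts to a coordinate filter on aligned read pairs. The Python sketch below is illustrative only and does not reproduce how SAMtools works internally. It assumes that each mate is summarized by a contig name, a 1-based leftmost position, and the number of reference bases it covers; that a pair is kept when either mate overlaps a KIR target; and that contigs may carry either GRCh37-style (`19`, `GL000209.1`) or UCSC-style (`chr19`, `chr19_gl000209_random`) names. `Alignment`, `overlaps_kir`, and `keep_pair` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# KIR interval on hg19 chromosome 19 (1-based, inclusive), as given in the text.
KIR_START, KIR_END = 55_228_188, 55_383_188
# Assumption: chromosome 19 may be named in GRCh37 or UCSC style.
CHR19_NAMES = {"19", "chr19"}
# Assumption: GRCh37 and UCSC names for the alternative KIR haplotype contig.
ALT_KIR_CONTIGS = {"GL000209.1", "chr19_gl000209_random"}


@dataclass(frozen=True)
class Alignment:
    """Hypothetical summary of one mate's alignment."""
    contig: Optional[str]  # None when the mate is unmapped
    pos: int               # 1-based leftmost reference position
    ref_len: int           # number of reference bases covered


def overlaps_kir(aln: Alignment) -> bool:
    """True if this mate overlaps the KIR interval or lies on the alternative KIR contig."""
    if aln.contig is None:
        return False
    if aln.contig in ALT_KIR_CONTIGS:
        return True
    if aln.contig in CHR19_NAMES:
        aln_end = aln.pos + aln.ref_len - 1
        return aln.pos <= KIR_END and aln_end >= KIR_START
    return False


def keep_pair(mate1: Alignment, mate2: Alignment) -> bool:
    """Assumption: a read pair is retained if either mate overlaps a KIR target."""
    return overlaps_kir(mate1) or overlaps_kir(mate2)


if __name__ == "__main__":
    pairs = [
        (Alignment("19", 55_300_000, 101), Alignment(None, 0, 0)),            # inside the interval
        (Alignment("6", 29_910_000, 101), Alignment("6", 29_910_300, 101)),   # chromosome 6, not KIR
        (Alignment("GL000209.1", 1_000, 101), Alignment("GL000209.1", 1_300, 101)),
    ]
    for m1, m2 in pairs:
        print(m1.contig, keep_pair(m1, m2))
```

Whether the original extraction required one or both mates to map to the KIR targets is not stated here; requiring only one mate would also retain pairs whose partner is unmapped.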

4. 15 KhoeSan individuals. These individuals had also been genotyped for KIR genes via standard lower-throughput methods: pyrosequencing and Sanger sequencing.

5. The 1000 Genomes Project data. All 2,532 whole-exome-sequenced individuals described in the May 2013 release were targeted.[48] To ensure a sufficient quantity of sequence reads for the analysis, we excluded samples if fewer than 25 reads mapped to exon 3 of KIR3DL2 or KIR3DL1/2v (see Appendix A). A total of 420 of the 1000 Genomes samples were excluded on this basis, and the remaining 2,112 were genotyped. When previously uncharacterized KIR alleles were identified in the 1000 Genomes dataset, genomic DNA from the source samples was purchased from the Coriell Biorepository so that their sequences could be confirmed by standard molecular methods.
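The depth filter applied to the 1000 Genomes exomes can be expressed as a simple threshold. This minimal sketch assumes that a per-sample count of reads mapping to the relevant exon 3 sequence has already been computed (the counting procedure is described in Appendix A and is not reproduced) and treats that count as a single number per sample; the function name and sample IDs are hypothetical. It also checks the reported totals: 2,532 targeted minus 420 excluded leaves 2,112 genotyped.

```python
from __future__ import annotations

# Threshold stated in the text: samples with fewer than 25 reads are excluded.
MIN_EXON3_READS = 25


def partition_samples(exon3_reads: dict[str, int],
                      min_reads: int = MIN_EXON3_READS) -> tuple[list[str], list[str]]:
    """Split samples into (genotyped, excluded) by their exon 3 read count."""
    genotyped = sorted(s for s, n in exon3_reads.items() if n >= min_reads)
    excluded = sorted(s for s, n in exon3_reads.items() if n < min_reads)
    return genotyped, excluded


if __name__ == "__main__":
    # Hypothetical sample IDs and read counts, for illustration only.
    counts = {"SAMPLE_A": 112, "SAMPLE_B": 24, "SAMPLE_C": 25, "SAMPLE_D": 3}
    kept, dropped = partition_samples(counts)
    print(kept, dropped)  # ['SAMPLE_A', 'SAMPLE_C'] ['SAMPLE_B', 'SAMPLE_D']

    # Consistency check on the counts reported in the text.
    total_targeted, excluded_n = 2_532, 420
    assert total_targeted - excluded_n == 2_112
```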

The American Journal of Human Genetics 99, 375–391, August 4, 2016

Laboratory Methods

Design of Capture Oligonucleotide Probes

To account for variation in the gene content of the KIR region, we targeted a panel of independently generated reference KIR haplotypes[6,14] that together represent all 13 recognized KIR genes. First, we designed probes against the two complete KIR haplotypes (GenBank: FP089703 and FP089704) that were generated from the PGF cell line, which was the source of the human reference sequence for the KIR region. We used end-to-end tiling with strand swapping to design non-overlapping 80-mer probes matching these reference sequences. We then designed a similar set of probes by using a further 27 complete KIR haplotype sequences[6,14] and all KIR sequences included in the January 2013 release of the IPD-KIR database. In this second stage, probes that differed by more than three nucleotides from the corresponding segment of the initial reference haplotypes were selected for use. We did not mask any repetitive elements in the target haplotypes. The KIR genomic region targeted by the probes is equivalent to that covered by chr19:55,228,188–55,383,188 (UCSC Genome Browser hg19) and an unmapped chromosome 19 region (GenBank: GL000209.1), which are the two KIR haplotypes present in the hg19 reference genome. In a similar manner to the KIR probes, we designed probes against the alleles of the classic HLA class I genes present in the PGF cell line, which was also the source of the human reference sequence for the HLA region.[1,14] These probes were supplemented with probes designed against the 6,795 HLA class I sequences reported in the January 2013 release of the IPD-HLA database. A total of 10,456 capture probes were used.
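The two-stage probe design can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' design software: "strand swapping" is modeled as alternating consecutive 80-mer tiles between the forward strand and its reverse complement, a trailing fragment shorter than 80 nt is left untiled, and each second-stage candidate is assumed to be already paired with its aligned, equal-length, same-strand segment from the initial PGF reference haplotypes. All function names are hypothetical.

```python
from __future__ import annotations

PROBE_LEN = 80  # probe length stated in the text
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def tile_probes(reference: str, k: int = PROBE_LEN) -> list[str]:
    """Non-overlapping, end-to-end k-mer tiles of a reference sequence.

    Assumption: 'strand swapping' is modeled as alternating tiles between the
    forward strand and its reverse complement. A trailing fragment shorter
    than k is not tiled in this sketch.
    """
    probes = []
    for i, start in enumerate(range(0, len(reference) - k + 1, k)):
        tile = reference[start:start + k]
        probes.append(tile if i % 2 == 0 else reverse_complement(tile))
    return probes


def mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def select_second_stage(pairs: list[tuple[str, str]], max_diff: int = 3) -> list[str]:
    """Keep candidates that differ by more than max_diff nucleotides from
    their paired segment of the initial reference haplotypes."""
    return [cand for cand, ref in pairs if mismatches(cand, ref) > max_diff]


if __name__ == "__main__":
    ref = "A" * 80 + "C" * 80 + "G" * 40
    print(len(tile_probes(ref)))  # 2 (the 40 nt tail is not tiled)
```

In this sketch the second stage only adds probes for sequence that the first-stage tiles would capture poorly, which mirrors the stated selection rule of more than three differences.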

Preparation of Biotinylated Capture Probes

The set of capture oligonucleotides, each comprising a unique sequence flanked by the common sequences 5′-GGTGATTGCGTATCT-3′ (PTL3) and 5′-CATGTCGTGGGAATT-3′ (PTR3), was synthesized by CustomArray. This set of oligonucleotides was pooled and amplified in a single PCR using primers with sequences corresponding to PTL3 and PTR3. The PCR comprised 1× Titanium Taq buffer (Clontech), 1 mM each of biotin-PTL3 and -PTR3 primers (Integrated DNA Technologies), 0.2 mM dNTPs with 12.5% dUTP (Roche), 1 µL (1 unit) Uracil-DNA Glycosylase (UDG; New England Biolabs), 1 M betaine, 3 µL (15 units) AmpliTherm Polymerase (Epicenter), 0.2 ng of the pool of capture oligonucleotides, and H2O to a final volume of 100 µL. PCR cycling conditions were as follows: 37°C for 10 min; 95°C for 3 min; 28 cycles of 95°C for 30 s, 55°C for 30 s, and 72°C for 30 s; 72°C for 10 min; and a hold at 10°C.

The biotinylated PCR product (100 µL aliquot) was then bound to streptavidin-coated magnetic beads (Illumina) that had been pre-washed with 100 µL 6× hybridization buffer (HB: 1 M NaCl, 0.5 M phosphate buffer, and 0.05% Tween-20) and suspended in 100 µL 12× HB. Binding was carried out in HB for 30 min at room temperature with agitation. The beads, now coated with biotinylated oligonucleotides, were then washed once with 100 µL 6× HB, twice with 100 µL 0.2× HB, once with 100 µL 0.1 nM NaOH and 100 µL 10 mM EDTA, and lastly, once with 100 µL 0.2× HB. Biotinylated oligonucleotides were eluted from the beads with 0.1 mM EDTA and then concentrated via speed vacuum to a final concentration of 2.5 nM for each capture probe.
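The layout of each synthesized capture oligonucleotide, and of the primers used to amplify the pool, can be sketched as follows. The flank sequences are taken from the text; treating the reverse primer as the reverse complement of PTR3 is an assumption about what "corresponding to PTR3" means (a reverse primer must be complementary to the 3′ flank of the top strand to extend across the insert), and the function names are hypothetical.

```python
from __future__ import annotations

PTL3 = "GGTGATTGCGTATCT"  # common 5' flank (from the text)
PTR3 = "CATGTCGTGGGAATT"  # common 3' flank (from the text)
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def synthesized_oligo(unique_seq: str) -> str:
    """5'-PTL3 + unique probe sequence + PTR3-3', as described in the text."""
    return PTL3 + unique_seq + PTR3


def pcr_primers() -> tuple[str, str]:
    """Assumption: the forward primer is PTL3 itself and the reverse primer is
    the reverse complement of PTR3, so it anneals to the 3' flank of the top strand."""
    return PTL3, reverse_complement(PTR3)


if __name__ == "__main__":
    oligo = synthesized_oligo("ACGT" * 20)  # hypothetical 80-mer insert
    print(len(oligo))       # 110
    print(pcr_primers()[1]) # AATTCCCACGACATG
```

With an 80-mer unique sequence, each synthesized oligonucleotide is 15 + 80 + 15 = 110 nt.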

Library Preparation, Enrichment, and Sequencing

The protocol was based on the TruSeq Nano method for library preparation (Illumina). The DNA samples we used are described below. For each sample, 300 ng genomic DNA (quantified with a Qubit instrument, Thermo Fisher Scientific) was sheared into 800 bp fragments with a Covaris S220 instrument (Covaris). Library preparation was then performed according to the manufacturer's instructions, with 96 unique "dual index" combinations used to label the library obtained from each DNA sample individually, and with the following modifications: (1) to clean and size-select the samples after end repair, 70.2 µL sample purification beads plus 89.8 µL H2O were added to a 100 µL sample, and (2) in the final PCR, the 72°C extension time was increased from 30 to 90 s to account for the 800 bp fragment length.

Enrichment of HLA and KIR sequences was performed according to a modified version of the Nextera Rapid Capture Exome enrichment protocol (Illumina), a solution-based target-capture assay. The libraries of genomic DNA, indexed uniquely for each sample as described above, were pooled prior to their hybridization with the capture probes. Thus, each hybridization mix (100 µL) contained 96 uniquely indexed sequence libraries (62.5 ng for each library and 6,000 ng in total), 50 pM of each biotinylated capture probe, and HB (CT3 and all subsequent buffers from Illumina). The hybridization mix was incubated at 95°C for 10 min, gradually cooled by 2°C/min to 58°C, and then maintained at 58°C for 90 min. In this reaction, fragments of genomic DNA that contained targeted KIR and HLA sequences hybridized specifically to the biotinylated capture probes.
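The quantities in the hybridization setup follow from simple arithmetic: 96 libraries at 62.5 ng each give the stated 6,000 ng, and cooling from 95°C to 58°C at 2°C/min takes 18.5 min. The sketch below makes that explicit; it assumes a constant ramp rate and ignores instrument overheads, and the function names are hypothetical.

```python
N_LIBRARIES = 96        # uniquely indexed libraries per hybridization (from the text)
NG_PER_LIBRARY = 62.5   # ng of each library (from the text)


def pooled_mass_ng(n_libraries: int = N_LIBRARIES, ng_each: float = NG_PER_LIBRARY) -> float:
    """Total DNA mass in one hybridization mix."""
    return n_libraries * ng_each


def hybridization_minutes(denature_min: float = 10, start_c: float = 95, end_c: float = 58,
                          ramp_c_per_min: float = 2, hold_min: float = 90) -> float:
    """Denaturation + linear cooling ramp + hold at the annealing temperature.
    Assumes the 2 °C/min ramp is constant; instrument ramp overheads are ignored."""
    ramp_min = (start_c - end_c) / ramp_c_per_min
    return denature_min + ramp_min + hold_min


if __name__ == "__main__":
    print(pooled_mass_ng())         # 6000.0
    print(hybridization_minutes())  # 118.5
```

For the longer overnight hold used in the second enrichment round, `hybridization_minutes(hold_min=14 * 60)` gives 868.5 min for the shortest stated hold.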

In the next reaction, 100 µL of streptavidin-coated magnetic beads were used to separate the specifically hybridized genomic DNA from the non-specific, un-hybridized genomic DNA. The biotin present in hybrid DNA molecules was bound to streptavidin on the beads, leaving the non-specific DNA in solution. The DNA preparation enriched with the targeted KIR and HLA genes was then eluted from the beads. Binding of the hybridization product to the beads was achieved by 30 min incubation with agitation at 1,000 rpm on a plate shaker at room temperature. To clean the product, we removed the streptavidin beads from solution by using a magnetic separator, mixed them with 200 µL Enrichment Wash Solution (Illumina), and incubated them at […]°C for 30 min. This wash step was repeated. To elute the enriched DNA from the beads, we added 23 µL of elution mix (made from 1.5 µL 2 M NaOH plus 28 µL Elution Buffer 1, Illumina), incubated for 5 min at room temperature, and neutralized with 4 µL Elute Target Buffer 2 (Illumina). The eluted material was then subjected to a second round of enrichment from the hybridization step onward. In this second round, after the gradual cooling step, the hybridization mix was maintained at 58°C for 14–18 hr.
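The elution mix works out to roughly 0.1 M NaOH. The C1V1 = C2V2 calculation below assumes that Elution Buffer 1 itself contributes no NaOH (its composition is not given in the text); the function name is hypothetical.

```python
def diluted_molarity(stock_molar: float, stock_ul: float, diluent_ul: float) -> float:
    """Concentration after mixing stock_ul of a stock solution with diluent_ul
    of diluent (C1V1 = C2V2, assuming the diluent contains none of the solute)."""
    return stock_molar * stock_ul / (stock_ul + diluent_ul)


if __name__ == "__main__":
    # Elution mix from the text: 1.5 µL of 2 M NaOH plus 28 µL Elution Buffer 1.
    naoh_m = diluted_molarity(2.0, 1.5, 28.0)
    print(round(naoh_m, 3))  # 0.102
```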

An aliquot of 10 µL of the enriched DNA preparation was subjected to PCR amplification in a 50 µL reaction mix containing 5 µL of a PCR primer cocktail, 15 µL of resuspension buffer, and 20 µL of Nextera Enrichment Amplification Mix. PCR cycling was performed as follows: 98°C for 30 s; 17 cycles of 98°C for 10 s, 60°C for 30 s, and 72°C for 30 s; and a final elongation step at 72°C for 5 min. Amplified material was purified with 40 µL of sample purification beads and eluted in 30 µL resuspension buffer (Illumina).
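The two thermocycling programs in this section (capture-probe amplification and post-capture amplification) can be written down as data, which makes the cycle structure and the total programmed hold time easy to check. The `Step`/`Cycle` representation below is a hypothetical illustration, not any instrument's program format; ramp times and the final indefinite 10°C hold are excluded. The block also confirms that the post-capture reaction components sum to the stated 50 µL.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Step:
    temp_c: float
    seconds: int


@dataclass(frozen=True)
class Cycle:
    steps: tuple[Step, ...]
    repeats: int


Block = Union[Step, Cycle]


def hold_time_s(program: list[Block]) -> int:
    """Total programmed hold time in seconds (ramps and indefinite holds excluded)."""
    total = 0
    for block in program:
        if isinstance(block, Cycle):
            total += block.repeats * sum(step.seconds for step in block.steps)
        else:
            total += block.seconds
    return total


# Capture-probe amplification, as listed in the text (final 10 °C hold omitted).
PROBE_PCR = [
    Step(37, 600), Step(95, 180),
    Cycle((Step(95, 30), Step(55, 30), Step(72, 30)), repeats=28),
    Step(72, 600),
]

# Post-capture amplification of the enriched libraries.
POST_CAPTURE_PCR = [
    Step(98, 30),
    Cycle((Step(98, 10), Step(60, 30), Step(72, 30)), repeats=17),
    Step(72, 300),
]

# Post-capture reaction components in µL (from the text).
POST_CAPTURE_MIX_UL = {"enriched DNA": 10, "primer cocktail": 5,
                       "resuspension buffer": 15, "amplification mix": 20}

if __name__ == "__main__":
    print(hold_time_s(PROBE_PCR))             # 3900
    print(hold_time_s(POST_CAPTURE_PCR))      # 1520
    print(sum(POST_CAPTURE_MIX_UL.values()))  # 50
```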

NGS Strategies

Set 1: IHWG Cell Lines. The enriched libraries were sequenced with a HiSeq 2000 instrument and sequencing chemistry (Illumina). Samples were clustered, and paired-end sequencing was performed with the TruSeq SBSv3-HS Kit (Illumina). The sequencing read length was 2 × 101 bp.

Set 2: Trios and Chimpanzee. The enriched libraries obtained from these samples were sequenced with a HiSeq 2500 instrument and sequencing chemistry (Illumina). The sequencing read length was 2 × 250 bp. These samples were also genotyped for HLA-A, HLA-B, and HLA-C with SSOPs and for KIR3DL1 and KIR3DL2 by pyrosequencing (see Appendix A).

Set 3: European Control Samples. These samples were analyzed with a MiSeq instrument (Illumina) with V3 chemistry, and the sequencing read length was 2 × 300 bp.

Enrichment Efﬁciency

We estimated enrichment efﬁciency by mapping unprocessed

sequence reads to the human reference sequence (hg19) with
