Defining KIR and HLA Class I Genotypes at Highest Resolution via High-Throughput Sequencing


Efficiency of the KIR Capture Method

To estimate the efficiency of the capture process, we mapped all sequence reads generated for each individual back to the human genome and counted those that fell within the target coordinates. According to this measure, the mean enrichment efficiency of the optimized 2 × 300 bp sequencing runs (sample set 3) was 87.01% (SD = 5.01). Because the target region represents <0.01% of the human genome, compared to whole-genome sequencing, this represents a significant (10,000×) reduction in the sequencing capacity required for analyzing the target.
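The on-target measure described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function names, the tuple layout for reads, the choice to count only reads lying entirely inside the target, and the toy coordinates (loosely based on the chromosome 19 axis of Figure 3) are all assumptions made for the example.

```python
from typing import Iterable, Tuple

# Hypothetical read representation: (chromosome, start, end), half-open coordinates.
Read = Tuple[str, int, int]


def on_target_fraction(reads: Iterable[Read], chrom: str,
                       target_start: int, target_end: int) -> float:
    """Fraction of mapped reads lying entirely inside the target interval.

    Assumption: "fell within the target coordinates" is read here as full
    containment; an overlap-based definition would be equally defensible.
    """
    total = 0
    on_target = 0
    for read_chrom, start, end in reads:
        total += 1
        if read_chrom == chrom and start >= target_start and end <= target_end:
            on_target += 1
    return on_target / total if total else 0.0


def max_fold_reduction(target_fraction_of_genome: float) -> float:
    """Upper bound on the sequencing saved relative to whole-genome sequencing.

    If the target is a fraction f of the genome, perfect capture needs about
    1/f times less sequencing for the same depth over the target.
    """
    if target_fraction_of_genome <= 0:
        raise ValueError("target fraction must be positive")
    return 1.0 / target_fraction_of_genome


# Toy data: three reads inside an illustrative chr19 target, one elsewhere.
toy_reads = [
    ("chr19", 55_250_000, 55_250_300),
    ("chr19", 55_300_100, 55_300_400),
    ("chr19", 55_370_000, 55_370_300),
    ("chr1", 1_000_000, 1_000_300),
]
print(on_target_fraction(toy_reads, "chr19", 55_220_000, 55_380_000))  # 0.75
```

With a target of 0.01% of the genome, `max_fold_reduction(0.0001)` gives the order of magnitude quoted in the text; the realized saving also depends on the measured on-target fraction.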

Specificity of Sequence-Read Harvesting

To begin the bioinformatics analysis of KIR, we used a panel of reference haplotype sequences as filters to harvest any sequence reads that could map to the KIR region from the main pool of sequenced fragments (described in the Material and Methods). Because both the capture probes and the reference sequences for harvesting reads were designed with complete KIR haplotypes that did not have repetitive elements masked, we performed a further test for specificity on the harvested KIR reads. By analyzing 70,000 of these read pairs per individual, we showed that a mean of three read pairs (modal value = 0) could map outside the target region of human genome build hg19. Thus, the combination of our capture/NGS method and KIR sequence-read harvesting is highly specific. We also note that generating 2 × 100 bp sequence reads (instead of 2 × 300 bp) revealed that up to 12% of the harvested reads potentially originate from repetitive elements outside the KIR region. However, 100% of these reads map to a 1.8 kb LINE insertion that is located in intron 6 of KIR3DL2 (Figure S4) and does not overlap with any known control elements. Thus, these reads do not affect the subsequent analysis.

[Figure 3 image not reproduced. Panel A: gene maps of example KIR A and B haplotypes against chromosome 19 coordinates (Mbp). Panel B: read depth against coordinates of the reference haplotype (kbp). Panel C: table of KIR region sequencing gaps (columns: start, size (bp), feature, location).]

Figure 3. The KIR Region Is >99.99% Covered by Sequence Data
(A) Target KIR region on chromosome 19: the gene locations are shown in orange, and pseudogenes are shown in gray. The KIR region varies in gene content, and shown are examples of two frequent A and B haplotypes. The KIR prefix is omitted from the gene names for clarity (see Appendix A). The human reference build hg19 (UCSC Genome Browser) has a KIR A haplotype. Underneath is a KIR B haplotype shown to scale.
(B) Read depth after stringent alignment of sequence reads (no base pairs mismatched, and duplicates were removed) from the PGF cell line to the PGF reference KIR haplotypes 1 (light purple) and 2 (dark purple).
(C) KIR region sequencing gaps in PGF KIR haplotype 2. The location of the gaps is shown on the right.

Measurement of KIR Gene Copy Number: PING_gc

The PING_gc component of PING specifically determines gene copy number. KIR3DL3, a single-copy gene common to all KIR haplotypes [6,14,16], is used as the standard to which other KIR genes are compared (Material and Methods). To assess the correlation between read ratio and KIR gene content, we applied PING_gc to the sequence data generated from the 97 IHWG samples. The first PING_gc module (called KFFgc) produced 13 distinct KIR gene presence or absence genotypes (Figure 4A), identical to those obtained by established methods [26,59,60]. We then applied the second PING module, MIRAgc, and used the observed clustering (Figure 2) to set threshold values for determining KIR gene copy numbers (Figure S3). To validate these results, we studied DNA samples from 85 of the same cell lines by using an established real-time PCR method for quantifying KIR genes.
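The read-ratio idea behind copy-number calling can be illustrated with a short sketch. It is not the PING_gc implementation: the function names, the read counts, and the threshold values are hypothetical. In the study, thresholds were set from the observed clustering of real data, and any normalization for gene length is ignored here. The sketch also assumes a diploid sample carrying two copies of KIR3DL3 (one per haplotype), so a ratio near 0.5 corresponds to one copy and a ratio near 1.0 to two.

```python
from bisect import bisect_right
from typing import Dict, Sequence

# Hypothetical thresholds separating ratio clusters for 0, 1, 2, 3 and 4 copies.
# Assumption: KIR3DL3 is present at two copies, so ~0.5 per gene copy.
EXAMPLE_THRESHOLDS = (0.25, 0.75, 1.25, 1.75)


def read_ratios(read_counts: Dict[str, int],
                reference_gene: str = "KIR3DL3") -> Dict[str, float]:
    """Ratio of reads for each gene to reads for the reference gene."""
    reference_reads = read_counts[reference_gene]
    if reference_reads <= 0:
        raise ValueError("reference gene must have a positive read count")
    return {
        gene: count / reference_reads
        for gene, count in read_counts.items()
        if gene != reference_gene
    }


def call_copy_number(ratio: float,
                     thresholds: Sequence[float] = EXAMPLE_THRESHOLDS) -> int:
    """Copy number = number of (ascending) thresholds the ratio meets or exceeds."""
    return bisect_right(thresholds, ratio)


# Invented read counts for one sample, for illustration only.
example_counts = {
    "KIR3DL3": 1000,
    "KIR2DL1": 980,
    "KIR2DS2": 510,
    "KIR3DS1": 0,
    "KIR2DL4": 1490,
}
example_calls = {
    gene: call_copy_number(ratio)
    for gene, ratio in read_ratios(example_counts).items()
}
print(example_calls)
```

Fixed cut points are used only to keep the example short; per-gene thresholds derived from the observed clusters, as the text describes, would take the place of `EXAMPLE_THRESHOLDS`.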

We observed 99.4% concordance between the results obtained by PING_gc and those obtained from real-time PCR (Figure 4B). Of ten discordant results from 1,700 determinations, four involved rare alleles that were not detected by the primers of the real-time PCR assay (KIR2DL2*009 and KIR3DL2*076, the latter of which was discovered during this study). Of the other six discordant results, two were due to false positives of the real-time PCR (as shown independently by a standard PCR method), two were just below the threshold values of the real-time PCR (but were clearly positive with PING_gc and standard PCR), and two remain unexplained (Figure 4B). Thus, the discrepancies were likely due to

The American Journal of Human Genetics 99, 375–391, August 4, 2016
