Defining kir and hla class I genotypes at Highest Resolution via High-Throughput Sequencing

Yüklə 448,46 Kb.

Pdf görüntüsü

səhifə	4/12
tarix	15.03.2018
ölçüsü	448,46 Kb.
	#31858

1 2 3 4 5 6 7 8 9 ... 12

the Burrows-Wheeler Aligner (BWA-MEM

; k

¼ 16) and counting

the number of reads that mapped within the target coordinates

of hg19.

Validating the Results from Capture/NGS via Established Methods

Previously described protocols were used for pyrosequencing

and real-time PCR.

Standard Sanger DNA sequencing reactions

were performed in forward and reverse directions with the BigDye

Terminator v.3.1 and were analyzed with a 3730 DNA Analyzer

(Applied Biosystems). To verify the sequences of previously

uncharacterized alleles, we cloned PCR products by using the

pCR2.1-TOPO vector (Invitrogen) and sequenced them with

M13 and internal primers. For each individual in whom an allele

was identiﬁed, ﬁve or more clones corresponding to the allele

were sequenced. HLA class I genotyping was performed with

Luminex-bead-based SSOP hybridization, as described previ-

ously

except where indicated otherwise.

Bioinformatics Methods: PING Pipeline

Harvesting KIR-Speciﬁc Reads

For sequence data obtained by the capture/NGS method, sequence

reads speciﬁc to the KIR region were identiﬁed and harvested with

Bowtie 2.

We concatenated the 29 sequenced KIR haplotypes

and all KIR alleles from the IPD-KIR database

6,14,15

to create a sin-

gle reference ﬁle for this purpose. The equivalent of 70,000 reads

(at 2

3 300 bp) that passed this ﬁlter stage were taken per sample

for KIR genotyping.

KIR-Gene-Content Module: PING_gc

This module is divided into two complementary components: the

ﬁrst (KFFgc) is based on string searches of raw data, and the second

(MIRAgc) is based on read depth following alignment to reference

sequences.

Determining KIR Gene Content with Virtual Probes: KFFgc. The

sequence-read ﬁles speciﬁc to the KIR region were ﬁrst processed

with the ‘‘FASTQ Quality Trimmer’’ function of the FASTX-Toolkit

v.0.0.13 (see

Web Resources

). From an alignment of all human KIR

sequences (IPD-KIR release 2.6.1, February 17, 2015),

all possible

32 bp sequences speciﬁc to just one of the KIR genes were gener-

ated, and those sequences covering polymorphic positions were

removed. For each KIR gene, this process produced a set of gene-

speciﬁc but not allele-speciﬁc probes. For each gene, ten such

probes were selected at random and used for determining KIR

gene content. We then counted the total number of exact matches

to each probe sequence and its reverse complement in the forward

and reverse (read1.fastq and read2.fastq, respectively) sequence-

read ﬁles for each sample. Because every KIR haplotype has one

copy of KIR3DL3,

6,14,16

the presence or absence of other KIR genes

was determined from the mean number of probe hits per target

KIR divided by the mean number of KIR3DL3 probe hits on

KIR3DL3. We used a ratio of 0.2 as the threshold, and values above

this were considered positive.

Determining KIR Gene Copy Number by Analyzing Read Depth:

MIRAgc. As shown previously, the depth of reads aligned to a refer-

ence can be used for estimating structural variation on a genomic

scale.

51,52

We incorporated this concept into the KIR genotyping

pipeline. Using MIRA 4.0.2,

we aligned the KIR-region-speciﬁc

sequence-read

pairs

simultaneously

against

one

reference

sequence for each KIR gene. Using values extracted from the ‘‘con-

tigstats’’ results table from the MIRA output, we divided the num-

ber of reads that aligned to each KIR gene by the number of reads

aligning to KIR3DL3. This comparison produced a clustering of

read ratios correlating with differences in KIR gene content

(

Figure 2

and

Figure S2

). For example, for KIR2DL4, which is ab-

sent from some haplotypes and duplicated on others,

13,53

we

observed three clusters of read ratios, corresponding to one, two,

or three copies of KIR2DL4 (

Figure 2

). For KIR2DL5, which can

be present at centromeric and telomeric locations in the KIR

haplotype,

6,54

we observed ﬁve clusters of read ratios, correspond-

ing to individuals with zero, one, two, three, or four copies of

KIR2DL5 (

Figure 2

). Representative examples of clustering results

derived for all the KIR are given in

Figure S2

In very rare cases, there can be copy-number variation of

KIR3DL3. For example, an individual with duplicated KIR3DL3

on one haplotype was observed in a study of

>3,500 subjects.

Such individuals would then show an unusual copy number for

all KIR genes and hence would be identiﬁed for manual inspec-

tion. The individual identiﬁed with a duplicated KIR3DL3 had a

normal genotype of two KIR3DL2 copies,

so in this case KIR3DL2

could be used as a substitute standard.

KIR-Allele-Genotyping Module: PING_allele

This module is also divided into two complementary components,

the ﬁrst (KFFallele) is based on string searches of raw data, and the

second (SOS) is based on read depth following alignment to refer-

ence sequences. A summary of the KFFallele workﬂow is shown in

Figure S1

A, and SOS is shown in

Figures S1

B and S1C.

High-Resolution KIR Genotyping with Virtual Probes: KFFallele. For

each KIR gene, an alignment of coding-region alleles was gener-

ated. From these, every possible unique 32-mer sequence that

did not overlap an exon boundary was generated, and the result-

ing pool was screened for the removal of all 32-mer sequences pre-

sent in another KIR gene. A ‘‘hit table’’ was generated with the

original allele alignment and bespoke R scripting (allgenos_hit_

table_2.R). The hit table was passed on to a random-forest algo-

rithm in the ‘‘RandomForest’’ package of the R statistical soft-

ware.

This algorithm ranks the probes according to the amount

of information they contribute to the ﬁnal answer. The purpose of

the random-forest step is to obtain the most efﬁcient subset of

probes, thereby increasing the computational speed of genotype

assignment. The output consists of a series of probe subsets that

have an increasing proportion of the total set (5%, 10%, 15%,

etc.). Empirical testing found that the most efﬁcient subset is usu-

ally the 40% most informative probes. We applied the ﬁnal probe

IHWG cell number

2

3

Gene

copy

number

Relative

read

count

0.1

0.5

0.9

1.3

100

KIR2DL5

0.1

0.5

0.9

1.3

100

KIR2DL4

IHWG cell number

0

1

Figure 2.

KIR Gene Copy-Number Geno-

type Determined by Read Depth

The ratio of reads mapping to a speciﬁc KIR

gene to those mapping to KIR3DL3 can be

used for calculating KIR copy number. The

results from 97 samples are shown and

sorted by ratio. KIR2DL4 (left) was present

in one, two, or three copies per individual

in the sample set, and KIR2DL5 (right)

was present in zero, one, two, three, or

four copies.

The American Journal of Human Genetics 99, 375–391, August 4, 2016

379

Yüklə 448,46 Kb.

Dostları ilə paylaş:

1 2 3 4 5 6 7 8 9 ... 12