Defining kir and hla class I genotypes at Highest Resolution via High-Throughput Sequencing

Yüklə 448,46 Kb.

Pdf görüntüsü

səhifə	5/12
tarix	15.03.2018
ölçüsü	448,46 Kb.
	#31858

1 2 3 4 5 6 7 8 9 ... 12

subset to the original allele alignment to produce a hit table of the

expected probe pattern for all possible genotypes (again with allge-

nos_hit_table_2.R). The number of exact matches to each probe

sequence or its reverse complement in the forward and reverse ﬁles

(read1.fastq and read2.fastq, respectively) for each sample was

then counted. Aggregate counts above a threshold of 10 were

considered positive, and the results were compared with the geno-

type hit table with a bespoke R script (KFFsums.R) for assigning the

KIR allele genotype.

KIR Sequence-Alignment Genotype by SOS. Data used for gener-

ating ﬁlters and reference sequences were obtained from the

IPD-KIR database (release 2.6.1, 17 February 2015) and the set of

29 complete KIR haplotype sequences.

6,14,15

For each given KIR

gene, all available allele sequences were selected for use as a posi-

tive ﬁlter, and all allele sequences of all other KIR genes were used

as a negative ﬁlter. We selected sequence reads speciﬁc to the given

KIR by mapping them to the positive ﬁlter with Bowtie 2.

performed this mapping non-stringently (

R97% nucleotides

matched) to allow for the detection of unknown SNPs. All reads

that aligned to the positive ﬁlter were retained, and those that

did not were excluded. The retained reads were mapped strin-

gently (

R99% match) to the negative ﬁlter, and in this case, those

that aligned were excluded. Finally, the selected reads were

aligned to a single reference sequence that was chosen for each

KIR gene (

Figure S3

), and the SNP variants were ascertained with

SAMtools/BCFtools version 1.2.

35,50

We analyzed the resulting variant call ﬁles (VCFs) with a custom

R script (jSOS) to generate genotypes based on the known combi-

nations of SNPs (i.e., the unique KIR alleles available from IPD). A

post-ﬁltered and aligned read depth of 20 was used as the mini-

mum for calling the genotype at any given nucleotide position.

The jSOS algorithm determines all of the possible allele combina-

tions on the basis of the genotypes of those SNPs that achieve

the threshold read depth. Thus, if a speciﬁc SNP fails to reach

the threshold, the ambiguity of the ﬁnal allele call will increase,

but an incorrect allele-level genotype will not be returned.

When the combination of SNPs does not correspond to a known

pair of alleles, jSOS identiﬁes that a novel combination of known

SNPs is present. In these situations, the known allele most likely

present is identiﬁed in addition to the new combination (we

view this as the most parsimonious genotype, and the least parsi-

monious could be two novel combinations present). The jSOS al-

gorithm also identiﬁes novel SNPs. When manual conﬁrmation

was sought, such as these cases of novel polymorphism, align-

ments for visual inspection were created with MIRA 4.0.2 and

examined with Gap4 of the Staden Package.

HLA Class I Genotypes

The HLA class I allele compositions were determined with

NGSengine 1.7.0 (GenDX software), kindly provided by Wietse

Mulder and Erik Rozemuller, with the ‘‘IMGT 3.18.0 combined’’

reference set. There was no pre-ﬁltering for HLA genes, and the

data were analyzed directly with the software. The one exception

was the removal of reads mapping to HLA-Y, an HLA class I pseu-

dogene present in a subset of HLA haplotypes.

The presence or

absence of HLA-Y was determined via sequence-speciﬁc string

searches of the FASTQ data. For the IHWG cells, these assignments

agreed with those previously determined by PCR.

We further

validated the assignment of HLA class I alleles by analyzing the

sequence data with Assign MPS 1.0 (Conexio Genomics), kindly

provided by Damian Goodridge. We resolved any discrepancies

between the two methods, or with previously obtained results,

by designing virtual probes that distinguish the two possibilities

in question and searching the unprocessed sequence-read data

(as described for KFF). Then, sequence reads speciﬁc to the locus

under question were extracted and aligned to reference sequences

(as described for SOS) and inspected manually. Here, the reference

sequences used were chosen on the basis of the HLA class I geno-

type of the respective sample. The majority of HLA class I alleles

are not fully characterized through all exons and introns

(

>95%).

Thus, for some of the validations, the second or third

ﬁelds of resolution were compared.

Haplotype Assignments

Haplotypes derived from homozygous cell lines were unambigu-

ous. Haplotypes were assigned for the family trios by segregation.

For all other individuals, centromeric and telomeric allele-level

KIR haplotypes were assigned for each individual according to

the expectation-maximization algorithm of haplo.stats imple-

mented in the R programming language (see

Web Resources

Results

Validation of the KIR Capture Method

We applied our capture/NGS method to DNA extracted

from 97 publically available cell lines (sample set 1,

Mate-

rial and Methods

), originally collected by the IHWG to

facilitate study of HLA genes.

One of these cell lines

is PGF, whose KIR haplotypes have been characterized

previously by standard methods.

6,14

We therefore used

the data we obtained from PGF cells to assess the quality

of the KIR sequences produced. The PGF KIR sequence

reads were mapped back onto the two conventionally

determined haplotype sequences. For PGF KIR haplotype

1 (137,813 bp; GenBank: FP089703), our method gave

100% coverage, a mean read depth of 49.4

3, and a read

depth of

>103 for 99.5% of the haplotype (

Figures 3

and 3B). For PGF KIR haplotype 2 (142,732 bp; GenBank:

FP089704), we obtained 99.99% coverage, a mean read

depth of 49.8

3, and a read depth >103 for 99.5% of the

haplotype. Our haplotype 2 sequence contains two short

gaps, comprising 18 nucleotides in total (

Figure 3

C).

The missing sequences are not predicted to be of func-

tional importance (

Figure 3

C). Thus, our method gives

full coverage of the KIR region when it targets haplotypes

that were included in the probe design. However, for pop-

ulation studies, it is important that our probe sets can cope

with a wider (and unknown) range of diversity without

any ‘‘allelic dropout.’’ As a test, we applied our method to

genomic DNA from the subject of the chimpanzee genome

project.

We mapped the obtained reads to the two

full KIR haplotype sequences (GenBank: BX842589 and

AC155174) that were previously characterized from this

individual.

25,58

These sequences represent both haplo-

types of the KIR region and include 10 of the 14 chim-

panzee KIR genes. We obtained 98.8% coverage for both

of the haplotypes and mean read depths of 130

3 for haplo-

type 1 and 110

3 for haplotype 2. This success in capturing

the chimpanzee KIR region strongly indicates that our

method captures the full range of human KIR haplotypes.

380

The American Journal of Human Genetics 99, 375–391, August 4, 2016

Yüklə 448,46 Kb.

Dostları ilə paylaş:

1 2 3 4 5 6 7 8 9 ... 12