Defining KIR and HLA Class I Genotypes at Highest Resolution via High-Throughput Sequencing


HLA class I gene sequence in the modern human population.

Further demonstrating our method's robust capacity to target divergent sequences, we successfully captured and sequenced all alleles of Patr-A, Patr-B, and Patr-C (Table S4E), the chimpanzee orthologs of HLA-A, HLA-B, and HLA-C, respectively, from the chimpanzee that was the subject of the chimpanzee genome project. In summation, the breadth and depth of our results give confidence that our method will be able to capture the full range of HLA class I alleles.

Discussion

We developed an integrated capture/NGS method to completely characterize the structure and sequence of the highly polymorphic KIR and HLA class I genes. The approach enables a focused and extensive definition of this physiologically important variation, which is not possible with any other single method. Our method is also well suited for genotyping the large cohorts required for insightful studies of population genetics and disease association, as well as for donor selection in clinical transplantation. All components of the method were validated with panels of DNA samples that represent the observed range of human variation for these complex genomic regions. Both the method and the information we obtained during its development should prove to be valuable resources for future studies.

We used DNA from well-characterized immortalized human B cell lines as reference materials during the design and optimization of the laboratory and bioinformatics methods. These panels of IHWG and 1000 Genomes cell lines are generally available to other researchers (see Web Resources). For validation, we focused on lineage II KIR genes because they exhibit some of the most extreme and complex genomic variation within the human KIR locus. We identified and distinguished deletions and duplications of KIR3DL1/S1, detected the presence of the KIR3DL1/2v fusion gene, and defined the alleles of these genes. This was achieved by combining quantitative assessment of read depth, virtual sequence probing, and reference alignment. Such independent verification is critical for characterizing structural KIR variants, which are not detected by methods that depend only on the alignment of sequence reads to reference haplotypes [65,66]. In summary, the validation experiments demonstrate that our method is robust and capable of detecting the full range of KIR genomic variation.
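The combination of read-depth assessment and virtual sequence probing described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes that each gene of interest has a short, gene-specific probe sequence (the probe names and sequences below are hypothetical), counts the reads that carry each probe in either orientation, and converts the ratio of hits into an integer copy number by comparing against a reference locus that is assumed to be present in two copies.

```python
from collections import Counter

# Complement table for A/C/G/T; other characters (e.g. N) are left unchanged.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_probe_hits(reads, probes):
    """Count the reads that contain each probe in either orientation.

    reads:  iterable of read sequences (uppercase A/C/G/T/N)
    probes: dict mapping a probe name to its (hypothetical) gene-specific sequence
    Each read is counted at most once per probe.
    """
    counts = Counter({name: 0 for name in probes})
    for read in reads:
        rc = reverse_complement(read)
        for name, probe in probes.items():
            if probe in read or probe in rc:
                counts[name] += 1
    return counts


def estimate_copy_number(target_hits, reference_hits, reference_copies=2):
    """Scale the target/reference hit ratio by the assumed reference copy number.

    Assumes both probes are captured and sequenced with equal efficiency,
    so that hit counts are proportional to the number of gene copies.
    """
    if reference_hits == 0:
        raise ValueError("no reads matched the reference probe")
    return round(reference_copies * target_hits / reference_hits)


if __name__ == "__main__":
    probes = {"target_gene": "AAAACCCC", "reference_gene": "GAGAGAGA"}
    reads = ["TTAAAACCCCTT", "TTAAAACCCCTT", "AAGGGGTTTTAA",  # target, incl. one reverse-strand read
             "CCGAGAGAGACC", "CCGAGAGAGACC"]                    # reference
    hits = count_probe_hits(reads, probes)
    print(dict(hits))  # {'target_gene': 3, 'reference_gene': 2}
    print(estimate_copy_number(hits["target_gene"], hits["reference_gene"]))  # 3
```

Rounding the scaled ratio is the simplest possible copy-number call; a practical implementation would also have to model probe-specific capture efficiency and sampling noise before distinguishing, for example, two copies from three.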

Figure 7. Capture of HLA Class I Genes for High-Resolution Allele Genotyping

(A) Shown is the read depth across each of the HLA class I genes from a representative sample (chosen by virtue of having a read number closest to the mean number of HLA-specific reads). Green lines indicate the coordinates of the exons that were covered. To generate this figure, we obtained full gene sequences (~3 kb each) from IPD to represent all five HLA class I alleles known to be present in this sample (the sample is homozygous for a common allele of HLA-A). Sequence reads were filtered to be specific to HLA-A, HLA-B, and HLA-C and then aligned to these references with high stringency. The read depth was measured with SAMtools/BCFtools.

(B) The major HLA class I allele types detected in this study.

[Figure not reproduced. Panel A plots read depth (y axis) against reference sequence coordinates (bp, x axis) for A*24:02:01:01, B*13:02:01, B*48:01:01, C*06:02:01:01, and C*08:01:01. Panel B lists HLA-A: A*01, A*02, A*03, A*11, A*23, A*24, A*25, A*26, A*29, A*30, A*31, A*32, A*33, A*34, A*36, A*66, A*68, A*74; HLA-B: B*07, B*08, B*13, B*14, B*15, B*18, B*27, B*35, B*37, B*38, B*39, B*40, B*41, B*42, B*44, B*45, B*46, B*47, B*48, B*49, B*50, B*51, B*52, B*53, B*54, B*55, B*56, B*57, B*58, B*78; HLA-C: C*01, C*02, C*03, C*04, C*05, C*06, C*07, C*08, C*12, C*14, C*15, C*16, C*17, C*18.]

The American Journal of Human Genetics 99, 375–391, August 4, 2016


With few exceptions, studies of KIR in human populations and disease cohorts have analyzed KIR gene content, but not allelic diversity [67,68]. Such studies were seminal in showing how KIR genomic diversity can shape the immune response and provide resistance to disease. Studies of gene content also uncovered the influence of KIR diversity on the success of reproduction and bone marrow transplantation [9,11]. The few studies that have focused on specific KIR genes and their allelic diversity and copy number have refined these disease associations and implicated specific alleles [61,70]. In the course of validating our method, we identified and characterized 116 novel KIR alleles, a substantial contribution to the KIR database. For example, the number of known KIR3DL2 alleles doubled and now exceeds 100. We also show that 476 (22.5%) of the 1000 Genomes individuals have at least one structural variant or novel allele of KIR3DL1/S1 or KIR3DL2 (Table S1C). All of these genomic variants have the potential to influence NK cell function, but they are invisible to typing at the level of KIR gene content. A strong case can therefore be made that high-resolution knowledge of KIR diversity, in all its forms, will identify additional disease associations and improve the understanding of those already known.

The study of human populations and their evolutionary dynamics, ancestry, and disease has benefited from GWAS methods, which genotype numerous SNP markers in large cohorts of individuals. Such analysis of the KIR region has been impractical because its extraordinary structural diversity leaves few locations suitable for designing binary SNP markers, and many KIR genotyping results fail routine quality-control filters. Thus, the "immunochip," which focuses on immune-system genes and has refined our understanding of HLA-associated diseases, includes relatively few informative SNPs in the KIR region. These SNPs are located in KIR3DL3, KIR2DL4, KIR3DL1/S1, and KIR3DL2, which were previously assumed to be present in one copy on every haplotype. Our study demonstrates that this assumption does not hold: in more than 10% of 1000 Genomes individuals, one of these four KIR genes is deleted, duplicated, or part of a fusion gene. We conclude that genotyping SNPs within the KIR locus by standard binary measurement is of little practical value.
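Why a binary SNP assay breaks down in a copy-number-variable region can be seen with a deliberately simplified model. The Python sketch below assumes that an assay reports the fraction of signal coming from the alternative allele, and that the caller, like standard genotyping, only expects the diploid fractions 0, 0.5, and 1. The function names are hypothetical, and real array calling works from intensity clusters rather than exact fractions.

```python
# Expected alternative-allele fraction for each diploid genotype call.
DIPLOID_FRACTIONS = {"0/0": 0.0, "0/1": 0.5, "1/1": 1.0}


def expected_alt_fraction(alt_copies: int, total_copies: int) -> float:
    """Fraction of signal from the alternative allele, given the gene copy number."""
    if total_copies < 1:
        # Gene absent from both haplotypes: the assay has nothing to measure.
        raise ValueError("gene absent; no signal to genotype")
    if not 0 <= alt_copies <= total_copies:
        raise ValueError("alt_copies must lie between 0 and total_copies")
    return alt_copies / total_copies


def diploid_call(alt_fraction: float) -> str:
    """Return the diploid genotype whose expected fraction is closest to the observation."""
    return min(DIPLOID_FRACTIONS, key=lambda g: abs(DIPLOID_FRACTIONS[g] - alt_fraction))


if __name__ == "__main__":
    # (alternative copies, total copies) for normal, duplicated, and deleted states
    for alt, total in [(1, 2), (1, 3), (2, 3), (1, 1), (2, 2)]:
        print(f"{alt} of {total} copies -> {diploid_call(expected_alt_fraction(alt, total))}")
```

In this model, one alternative copy out of two, one out of three, and two out of three all yield the same heterozygous call, while a hemizygous gene carrying the alternative allele is indistinguishable from a homozygote, so the underlying duplication or deletion is never reported.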

To compensate for the absence of suitable SNPs within the KIR genes, a recently described imputation method should allow the diversity of KIR gene content to be accurately reassessed for many of the reported GWASs. Imputation of HLA class I alleles from GWAS data has been informative for studies of immune-mediated diseases. However, the efficiency of HLA imputation varies, particularly in non-European individuals, partly because more than 10,000 HLA alleles have been described and partly because imputation relies on linkage disequilibrium, which can extend over shorter genomic tracts in non-Europeans than in Europeans [15,72]. Many of the KIR variants and polymorphisms identified by our method are not evenly distributed across human populations. For example, the KIR3DL1/2v fusion gene is restricted to Africans, who exhibit the lowest linkage disequilibrium worldwide. Thus, it is unlikely that imputation will be able to resolve all of the structural and allelic diversity of KIR.
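Imputation predicts untyped variants from typed markers through linkage disequilibrium, which is most often summarized as r². The following Python sketch computes r² between two biallelic loci from phased haplotypes coded 0/1. It illustrates the quantity that imputation depends on and is not the imputation method referred to above.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic loci from phased haplotypes.

    haplotypes: list of (allele_at_locus_1, allele_at_locus_2) pairs coded 0/1.
    D = p_AB - p_A * p_B and r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)),
    where p_A and p_B are the frequencies of allele 1 at each locus and
    p_AB is the frequency of haplotypes carrying allele 1 at both loci.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    d = p_ab - p_a * p_b
    return d * d / denominator


if __name__ == "__main__":
    tight = [(0, 0), (1, 1), (0, 0), (1, 1)]  # alleles always travel together
    loose = [(0, 0), (0, 1), (1, 0), (1, 1)]  # all four haplotypes equally common
    print(ld_r2(tight), ld_r2(loose))
```

A typed marker predicts a nearby variant well only when r² between them is high; a variant that sits on haplotypes with weak linkage disequilibrium, or that is confined to one population, is correspondingly hard to impute.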

We employed short-read technology because of its high fidelity. Underscoring this point, all of the novel SNPs identified by our method were confirmed by independent, well-established sequencing methods. The capture method we used will probably soon be adapted to obtain longer fragments and read lengths, which should become increasingly valuable as sequencing error rates decrease.

Because our method can capture and sequence the chimpanzee KIR region, it most likely captures the full extent of human KIR diversity, with limited allelic dropout. The alternatives that do not suffer from allelic dropout are whole-genome approaches. Because our method is designed for large numbers of individuals and makes low demands on sequencing instruments and reagents, it provides an economical and practically viable alternative to whole-genome experiments. We also note that our bioinformatics pipeline can obtain accurate KIR genotypes from any whole-genome sequencing experiment of sufficient mean depth. The pipeline is also designed for application to any highly polymorphic gene system and for genotyping very large numbers of individuals with a low demand on computing resources. In these properties, it differs from the population-graph method of allele designation that has been applied to HLA. However, that method could be a valuable complement to ours if it were also applied to KIR.
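The "sufficient mean depth" of a whole-genome experiment is conventionally estimated as total sequenced bases divided by genome size. The Python sketch below applies that standard calculation; the read count, read length, genome size, and region size in the example are illustrative assumptions, and the text does not state what depth the pipeline requires.

```python
def mean_depth(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average sequencing depth: total sequenced bases divided by genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return n_reads * read_length_bp / genome_size_bp


def approx_reads_in_region(depth: float, region_bp: int, read_length_bp: int) -> float:
    """Approximate number of reads starting within a region under uniform coverage.

    Ignores capture bias, mapping ambiguity between paralogous genes, and edge effects.
    """
    return depth * region_bp / read_length_bp


if __name__ == "__main__":
    # Illustrative values only: 600 million 150-bp reads over a ~3.1 Gb genome.
    depth = mean_depth(600_000_000, 150, 3_100_000_000)
    print(f"{depth:.1f}x")                            # 29.0x
    print(approx_reads_in_region(30.0, 15_000, 150))  # 3000.0
```

For a gene family as repetitive as KIR, the reads that can be assigned unambiguously to one gene are typically fewer than this uniform-coverage estimate, which is why mean depth is only a starting point.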

The first KIR cDNA sequences were reported in the mid-1990s [75,76].

This led to research that revealed the unanticipated genetic complexity and diversity of the human KIR gene family. The method we describe here will facilitate a complete description of KIR variation in the human population, its interaction and co-evolution with HLA class I, and its influence on physiology, disease, and immunotherapy.

Appendix A: KIR Nomenclature

Throughout this paper, any unique DNA sequence that spans a coding region (otherwise known as a CDS) is considered a distinct allele. Each unique protein sequence encoded by one or more alleles defines an allotype. KIR genes and alleles are named by the KIR Nomenclature Committee, formed from members of the WHO Nomenclature Committee for Factors of the HLA System and members of the HUGO Gene Nomenclature Committee. The IPD-KIR database is part of the IPD, which is listed in the Web Resources section.
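The distinction between alleles (unique CDS sequences) and allotypes (unique encoded proteins) can be made concrete by translating each CDS and grouping the alleles that yield identical proteins. The minimal Python sketch below uses the standard genetic code; the allele names and sequences in the example are hypothetical toy data, not KIR sequences.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order (TTT, TTC, TTA, ...).
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate a complete coding sequence; '*' marks a stop codon."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def group_alleles_by_allotype(alleles: dict) -> dict:
    """Map each distinct protein (allotype) to the sorted names of the alleles encoding it.

    alleles: dict of allele name -> CDS; each distinct CDS counts as one allele.
    """
    groups = {}
    for name, cds in alleles.items():
        groups.setdefault(translate(cds), []).append(name)
    return {protein: sorted(names) for protein, names in groups.items()}


if __name__ == "__main__":
    toy_alleles = {                # hypothetical toy sequences, not real KIR alleles
        "allele_1": "ATGGCTTAA",   # Met-Ala-stop
        "allele_2": "ATGGCCTAA",   # synonymous change: same protein as allele_1
        "allele_3": "ATGGATTAA",   # non-synonymous change: Met-Asp-stop
    }
    for protein, names in group_alleles_by_allotype(toy_alleles).items():
        print(protein, names)
```

Here three alleles collapse into two allotypes, because the difference between allele_1 and allele_2 is synonymous.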

In the nomenclature for alleles, the digit (2 or 3) and letter (D) following the KIR prefix indicate whether two (2D) or three (3D) immunoglobulin (Ig)-like domains are present in the encoded protein. After the letter D is another
