TBC-1: Recently exonized Alu
elements in Macaca fascicularis
Young-Hyun
Kim1,2, Jae-Won Huh1,2 and Kyu-Tae Chang1,2
1National
Primate Research Center, Korea Research Institute of Bioscience and
Biotechnology, Ochang 363-883, Republic of Korea
2University
of Science & Technology, National Primate Research Center, KRIBB,
Ochang 363-883, Republic of Korea
Crab-eating
monkey (Macaca fascicularis) and
rhesus monkey (Macaca mulatta) are
frequently used and valuable primate model species. Although they are the most common
primate model organisms for biomedical research, genetic information is
not yet available for them, except for the rhesus monkey. In this study, we analyzed
the genomic diversity of these two closely related Macaca species using recently
integrated Alu elements. First, Macaca
fascicularis mRNA sequences (10,221 mRNAs) were collected from the GenBank
database, and 'young' Alu-exonized
mRNA sequences were sorted with the RepeatMasker program (216 mRNAs). Second, to
avoid false positives (cDNA sequences contaminated with genomic DNA),
manual correction was conducted. Third, ten genes were chosen, and
eight genes containing a young Alu
element were identified. Finally, to verify the exonized young Alu elements, PCR amplification and
sequencing were conducted using various human and primate DNA
samples. Intriguingly, two genes (C9orf6 and NOLC1) harbor an insertionally
polymorphic Alu element in their transcripts. Although we did not use the whole-genome
information of Macaca fascicularis,
a genome-wide survey could be a useful tool for understanding this valuable primate
model organism.
TBC-2: Genome diversification mechanism between human and chimpanzee
Jae-Won
Huh1,2 and Kyu-Tae Chang1,2
1National
Primate Research Center, Korea Research Institute of Bioscience and
Biotechnology, Ochang 363-883, Republic of Korea
2University of Science &
Technology, National Primate Research Center, KRIBB, Ochang 363-883, Republic
of Korea
The chimpanzee
is the closest living relative of humans. The human and chimpanzee
genome projects show that the genomes of the two species differ by only about 1%.
Thus, comparing the gene sequences of the two species could reveal
the genetic components related to lineage-specific events. We
compared and investigated gene regions between human and chimpanzee using
bioinformatic and experimental tools. An in
silico comparison was performed between the human and chimpanzee genomes. Among the
65,248 insertion-deletion (INDEL) loci, 285 gene regions were identified, and
130 gene regions were experimentally validated. Although 48 gene loci did not
show any genetic differences, 32 gene loci showed lineage-specific INDEL
events (insertion in human: 12 genes; deletion in chimpanzee: 20 genes).
These INDEL events were categorized into five evolutionary mechanisms:
retroelement-related (12 genes), homologous recombination and
excision (12 genes), tandem repeat variation (5 genes), gene conversion (2
genes), and processed pseudogene formation (1 gene). These results
suggest that not only simple integration events but also deletions
mediated by recombination drive
the lineage-specific genetic differences between human and chimpanzee.
TBC-3: Transcriptome sequencing and gene analyses in the crab-eating macaque
Kyu-Tae
Chang1,2
1National
Primate Research Center, Korea Research Institute of Bioscience and
Biotechnology, Ochang 363-883, Republic of Korea
2University
of Science & Technology, National Primate Research Center, KRIBB, Ochang
363-883, Republic of Korea
As a
human mimic, the crab-eating macaque (Macaca
fascicularis) is an invaluable non-human primate model for biomedical
research, but the lack of genetic information on this primate has represented a
significant obstacle for its broader use. Here, we sequenced the transcriptome
of 16 tissues and identified genes to resolve the main obstacles for
understanding the biological response of the crab-eating macaque. From 4
million reads with 1.4 billion base sequences, 31,786 isotigs containing genes
similar to those of humans, 12,672 novel isotigs, and 348,160 singletons were
identified using the GS FLX sequencing method. Approximately 86% of human genes
were represented among the genes sequenced in this study. Additionally, 175
tissue-specific genes were identified, 81 of which were experimentally
validated. In total, 4,314 alternative splicing (AS) events were identified and
analyzed. Intriguingly, 10.4% of AS events were associated with transposable
element (TE) insertions. Finally, TE exonization events were investigated and analyzed
from an evolutionary perspective, revealing a human-specific trend of
amplified TE exonization. This report represents the first
large-scale transcriptome sequencing and genetic analyses of M. fascicularis and could contribute to
its utility for biomedical research and basic biology.
TBC-4: Recent Positive Selection in Human Genes That are Enriched for Disease
Mutations, but Limited for Polymorphism
Yoon-Ho
Hong1, Malcolm Campbell2, Kyungjoon Lee2,
In-Hee Lee2 and Sek-Won Kong2
1Department
of Neurology, Seoul National University Boramae Municipal Hospital, Korea
2Children's
Hospital Informatics Program, Boston Children's Hospital, USA
Examining
the near-full spectrum of genetic variation across the whole human genome is now
possible with the advances of high-throughput sequencing technology. This
enables population scale analysis of sequence variations, which provides an
opportunity to explore characteristics of human disease genes and mutations in
the context of molecular evolution. Here, using the whole genome sequence data
of 37 putatively healthy unrelated individuals, we investigated the effects of
natural selection in shaping the frequency spectrum of genetic polymorphism and
disease mutations. We found that a quantitative estimate of evolutionary
constraints is significantly higher in genes with lower frequency of
polymorphic coding variants. The correlation between polymorphism and natural
selection is also supported by 1) population and comparative analyses at the gene
level, which revealed a significantly greater spectrum of single nucleotide
polymorphisms (SNPs) in genes under positive selection, and 2) analysis in the
context of human disease and gene essentiality, which confirmed the limited
spectrum of polymorphism in disease genes with greater essentiality.
Interestingly, the signature of recent or ongoing positive selection was
consistently found in a subset of disease genes that are limited for
polymorphism but enriched for disease-linked mutations. This suggests that
recent adaptive selection might have acted on evolutionarily conserved genes,
increasing the spectrum of disease-linked mutations.
TBC-5: Gene expression changes as resistant markers to cisplatin in a panel of
bladder cancer cell lines
Sung
Han Kim1 and Seok Soo Byun2
1Seoul
National University Hospital, Republic of Korea
2Seoul
National University Bundang Hospital, Republic of Korea
BACKGROUND:
Resistance to cisplatin, one of the most effective anticancer drugs for bladder cancer,
develops during treatment through a cellular self-defense system that
activates or silences a variety of genes, resulting in genetic and
epigenetic alterations. As a result, the mechanism of cisplatin resistance is
one of the most investigated subjects in clinical fields. To
understand the resistance mechanism and to establish possible candidate genes,
a panel of cisplatin-resistant and non-resistant bladder cancer cell lines was used
in a combination of microarray and real-time PCR profiling to investigate
gene expression associated with cisplatin resistance.
METHOD:
The human bladder cancer cell line T24, obtained from the American Type
Culture Collection (ATCC), and a previously established cisplatin-resistant bladder cancer cell line
(T24R2, resistant to 2.0 μg/ml of cisplatin) were used for microarray analysis to
identify genes differentially expressed in cisplatin resistance.
The significantly up-regulated genes were compared with a tissue assay of bladder
cancer resistant to cisplatin chemotherapy by real-time PCR. A fold
change ≥ 2 with a p-value < 0.05 was considered significant.
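The significance rule stated above (fold change ≥ 2 with p < 0.05) can be sketched as a simple filter. This is only an illustration, not the authors' pipeline: the function name `select_upregulated`, the tuple layout, and the example values are hypothetical, and the fold change is assumed to be the ratio of resistant (T24R2) to non-resistant (T24) expression.

```python
def select_upregulated(results, min_fold=2.0, max_p=0.05):
    """Return gene names whose fold change and p-value pass both thresholds.

    `results` is an iterable of (gene, fold_change, p_value) tuples, where
    fold_change is assumed to be resistant / non-resistant expression.
    """
    return [gene for gene, fold, p in results if fold >= min_fold and p < max_p]


# Hypothetical values for illustration only; not data from the study.
example = [
    ("GENE_A", 4.2, 0.001),  # passes both thresholds
    ("GENE_B", 1.5, 0.010),  # fold change too small
    ("GENE_C", 3.0, 0.200),  # p-value too large
]
selected = select_upregulated(example)
```

Only `GENE_A` satisfies both conditions in this made-up input.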
RESULTS:
Among 488 up-regulated genes and 69 pathways from the microarray
analysis, a panel of 23 genes was selected for real-time PCR validation from
four selected cancer-related pathways (p53, apoptosis, cell cycle, and pathways
in cancer). All 23 genes were significantly
up-regulated in both the microarray and the real-time PCR, with fold changes > 2.0.
They are PRKAR2A and PRKAR2B, CYCS, Bcl-2, BIRC3, DFFB, CASP6, CDK6, CCNE1, CUL2,
FN1, STEAP3, MCM7, ORC2 and ORC5, LEF1, ANAPC1 and ANAPC7, CDC7 and CDC27, SKP1, and WNT5A and
WNT5B. In particular, the fold changes of CUL2, MCM7, WNT5A, WNT5B, LEF1,
Bcl-2, CYCS, and PRKAR2B were greater than 4.0, suggesting a high correlation
with cisplatin resistance.
CONCLUSIONS:
A panel of 23 up-regulated genes, including the 5 genes with greater fold
changes, was found to differ significantly between cisplatin-resistant
bladder cancer and bladder cancer cell lines. We propose that their
expression profiles may play a key role in the mechanism of resistance
to cisplatin in patients with bladder cancer.
TBC-6:
GlaI-qPCR assay — a new instrument for
quantitative DNA methylation analysis and its application for tumor suppressor
genes study
Vitaliy
Kuznetsov1, Elena Zemlyanskaya1 and Sergey Degtyarev1
1SibEnzyme
Ltd., Novosibirsk, Russia, 630117
De novo
DNA methylation in mammals is performed by the Dnmt3a and Dnmt3b DNA
methyltransferases, which recognize the tetranucleotide 5'-RCGY-3' and modify the
inner CG dinucleotide to form 5'-R(5mC)GY-3'/3'-YG(5mC)R-5' [1].
GlaI is
a novel methyl-directed site-specific DNA endonuclease that recognizes the DNA
sequence 5'-R(5mC)↓GY-3' and cleaves it as indicated by the arrow [2]. Thus, the
recognition sequence of GlaI corresponds exactly to the product of DNA
methylation by Dnmt3a and Dnmt3b. GlaI cleaves DNA completely and requires no
additional cofactors [3]. Recently we developed the GlaI-PCR assay, which
allows determination of 5'-R(5mC)GY-3' sites in a DNA region under study [4]. The
method includes DNA hydrolysis with GlaI followed by PCR with primers designed
for the DNA region of interest. Earlier we used the GlaI-PCR assay to
determine the DNA methylation status of regulatory regions of tumor suppressor
genes (TSGs) [5]. In this work we performed a real-time GlaI-PCR assay (GlaI-qPCR) for
quantitative determination of 5'-R(5mC)GY-3' sites in the DNA
regions under study. This assay was applied to study DNA methylation in the regulatory
regions of the RARB, NOTCH1, DAPK1, SEPT9b, IGFBP3, CEBPD, MGMT and RASSF1A TSGs in the
malignant cell lines HeLa, Raji, U-937 and Jurkat and in the control fibroblast
cell line L-68. We obtained methylation profiles of these genes for each cell
line. In agreement with previous data, the regulatory regions of TSGs are
methylated in malignant cell lines. However, the methylation profiles
differ between cell lines, which allows different
types of cancer cells to be distinguished. The results show that the GlaI-qPCR assay may be
used for quantitative determination of de
novo DNA methylation.
References
1. Handa V, Jeltsch A. J. Mol. Biol. 2005; 348, 1103-1112.
2. Tarasova GV et al. BMC Mol. Biol. 2008; 9, 7.
3. Abdurashitov MA et al. BMC Genomics 2009; 10, 322.
4. SE Scientific Library [http://science.sibenzyme.com/article12_article_53_1.phtml]
5. SE Scientific Library [http://science.sibenzyme.com/article8_article_58_1.phtml]
TBC-7:
A Filtering Algorithm for Gene-Gene Interaction using Case-Only Data
Pin-Cian
Wang1, Liang-Chuan Lai2, Mong-Hsun Tsai3, Eric
Y. Chuang4, Cheng-Yan Kao1 and Pei-Chun Chen5
1Graduate
Institute of Biomedical Electronics and Bioinformatics, National Taiwan
University, Taiwan
2Graduate
Institute of Physiology, National Taiwan University, Taiwan
3Graduate
Institute of Biotechnology, National Taiwan University, Taiwan
4Bioinformatics
and Biostatistics Core, Research Center for Medical Excellence, National Taiwan
University, Taiwan
5Department
of Statistics and Informatics Science, Providence University, Taiwan
Genome-wide
association studies (GWAS) are a typical study design in genetic epidemiology
that uses whole-genome SNP data. Most GWAS use single-locus tests. However,
some researchers have pointed out the problems of the single-locus
strategy, and gene-gene interaction has become an increasingly important issue. Exhaustive search
methods such as multifactor dimensionality reduction (MDR) are powerful tools
for detecting gene-gene interactions. However, the main limitation of MDR is its
heavy computation. Therefore, the aim of our research was to design a filtering
algorithm, called the deviance of independence (DOI), that selects a candidate
SNP set for further analysis, saving
computation time while yielding the same predictions.
DOI
describes the level of dependence between two SNPs. In the first step of the DOI
calculation, the SNP data from control samples were removed because
the allele and genotype frequencies were hypothesized to be stable in the normal
population. Next, the expected and observed frequencies of each two-SNP
combination were calculated. The expected frequency
was derived from the frequencies of the two individual SNPs according to the
principle of independence. Finally, the DOI value was calculated as the sum
of the absolute differences between the expected and observed frequencies of the two-SNP
combinations. SNP combinations with a high DOI are expected to be more
likely to interact.
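The DOI calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: genotypes are assumed to be coded 0/1/2, the "two-SNP combinations" are assumed to be the nine joint genotype classes, and the function name `doi` is hypothetical. Only case samples are passed in, since controls are removed in the first step.

```python
from collections import Counter
from itertools import product


def doi(snp_a, snp_b, genotypes=(0, 1, 2)):
    """Sum of |observed - expected| joint genotype frequencies for two SNPs.

    snp_a and snp_b are equal-length sequences of genotype codes measured
    in the same case samples. The expected joint frequency is the product
    of the two marginal frequencies (independence assumption).
    """
    if len(snp_a) != len(snp_b) or len(snp_a) == 0:
        raise ValueError("snp_a and snp_b must be non-empty and equal in length")
    n = len(snp_a)
    freq_a = Counter(snp_a)            # marginal genotype counts, SNP A
    freq_b = Counter(snp_b)            # marginal genotype counts, SNP B
    joint = Counter(zip(snp_a, snp_b)) # observed joint genotype counts
    total = 0.0
    for ga, gb in product(genotypes, repeat=2):
        expected = (freq_a[ga] / n) * (freq_b[gb] / n)
        observed = joint[(ga, gb)] / n
        total += abs(observed - expected)
    return total


# Toy inputs for illustration only.
independent = doi([0, 0, 1, 1], [0, 1, 0, 1])  # joint equals product of marginals
dependent = doi([0, 0, 1, 1], [0, 0, 1, 1])    # genotypes perfectly coupled
```

Ranking all SNP pairs by this value and keeping the highest-scoring ones would give the candidate set passed on to a heavier method such as MDR.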
We used
simulated and real data to examine DOI performance. The simulation results
show that DOI values may be used to predict interacting combinations. In
addition, the WTCCC rheumatoid arthritis (RA) chromosome 22 data and
Parkinson's disease (PD) chromosome 20 data were used as real-data
applications. The results demonstrate that potential interactions can be
identified after using the DOI value as a filter criterion. In sum, the DOI algorithm
is a powerful tool for filtering a candidate gene set for further interaction
analysis.
TBC-8:
20-gene-based risk score classifier predicts disease recurrence in non-muscle
invasive bladder cancer
Seon-Kyu
Kim1, Young-Kyu Park1 and Seon-Young Kim1
1Medical
Genomics Research Center, Korea Research Institute of Bioscience and Biotechnology,
Daejeon, Korea
Background
Bladder
cancer is a genetic disorder driven by the progressive accumulation of multiple
genetic changes. While several molecular markers for the recurrence of bladder
cancer have been studied, the limited value of current prognostic markers has
created the need for new molecular indicators of bladder cancer outcomes. Here,
we sought to identify a molecular signature associated with disease recurrence
in non-muscle invasive bladder cancer (NMIBC) and to assess its usefulness as a
prognostic indicator.
Methods
Microarray
gene expression profiling was performed using gene-expression data from 102
primary NMIBC specimens (Korean cohort) to identify a gene expression signature
associated with disease recurrence. The prognostic value of the gene expression
signature was validated in an independent cohort (European cohort, n=302). A
risk score based on the expression data of 20 genes was developed in the Korean
cohort and validated in the European cohort. The association between the
20-gene-based risk scoring method and prognosis of NMIBC patients was assessed
using Kaplan-Meier plots, the log-rank test, the Cox proportional hazards model,
and leave-one-out cross-validation.
Results
The
determination of gene expression patterns by microarray data analysis identified
822 genes associated with disease recurrence. Of the 822 genes, 20 genes which
are highly associated with recurrence free survival were detected by
time-dependent ROC analysis. The risk score was developed by using Cox
coefficient values of 20 genes in the Korean cohort and its robustness was
validated in the European cohort (log-rank test, P < 0.001). Multivariate Cox regression analysis revealed that
the risk score was an independent strong predictor of disease recurrence
(hazard ratio = 6.082, 95% confidence interval = 3.280 to 11.279, P < 0.001).
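A risk score built from Cox coefficients is commonly computed as a weighted sum of expression values. The sketch below assumes that form; the abstract does not give the exact formula, and the gene names, coefficients, and the function name `risk_score` are hypothetical.

```python
def risk_score(expression, coefficients):
    """Linear risk score: sum over signature genes of coefficient * expression.

    `expression` maps gene -> expression value for one patient.
    `coefficients` maps gene -> Cox regression coefficient (assumed to be
    estimated beforehand in a training cohort).
    """
    missing = set(coefficients) - set(expression)
    if missing:
        raise KeyError(f"expression missing for genes: {sorted(missing)}")
    return sum(coefficients[g] * expression[g] for g in coefficients)


# Hypothetical two-gene signature for illustration only.
coefs = {"G1": 0.5, "G2": -1.0}
patient = {"G1": 2.0, "G2": 1.0, "G3": 9.0}  # G3 is not in the signature
score = risk_score(patient, coefs)
```

Patients would then be split into high-risk and low-risk groups by a cutoff on this score; the choice of cutoff is not stated in the abstract.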
Conclusions
The
risk scoring method based on 20 genes represents a promising diagnostic tool to
identify NMIBC patients that have a high risk of recurrence.
TBC-9:
Genome-wide analysis of CNV and SNP in Koreans
Sanghoon
Moon1, Kwang Su Jung2, Young Jin Kim1, Miyeong
Hwang1, Kyungsook Han4, Bok-Ghee Han3,
Jong-Young Lee1, Kiejung Park2 and Bong-Jo Kim1
1Division
of Structural and Functional Genomics, Center for Genome Science, National Institute
of Health, Chungcheongbuk-do, 363-951, Korea
2Division of Bio-Medical informatics,
Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 363-951,
Korea
3Center for Genome Science,
National Institute of Health, Chungcheongbuk-do, 363-951, Korea
4School of Computer
Science and Engineering, Inha University, Inchon, 402-751, Korea
To
date, single-marker association analysis in genome-wide association studies
(GWAS) has identified a large number of single nucleotide polymorphisms (SNPs)
that are highly associated with complex diseases, but only a small portion of
genetic heritability is explained by these variants. A copy number variation
(CNV) is a physical change of a genomic segment ranging from a kilobase to
several megabases. CNVs may alter disease susceptibility and gene dosage for
genetic risk, and so are a useful source for finding missing heritability.
Recent
studies have reported that 60% of the detected CNVs were called with a single
copy-number class, which cannot be tested for association and that well-defined
polymorphic CNVs tagged by SNPs are more likely to affect multiple expression
traits than frequency-matched variants. CNVs encompassing single genes or a set
of genes may be more likely to be causal variants of genetic disease than SNPs alone.
Therefore, SNPs correlated with CNVs are a valuable resource for GWAS.
Most
CNV databases (except SCAN) do not consider polymorphic CNVs (multi copy-number
class). The SCAN database contains CNV data for Caucasian and Yoruba
populations but does not provide Asian CNV data. Because CNVs differ
between ethnic groups, providing polymorphic CNVs and the allele frequency
of each genotype in Asian populations will help investigate the association of CNVs
with diseases and ethnic differences.
In this
study we developed a database called Korean Genomic Variant Database (KGVDB),
which provides polymorphic CNV regions and well-tagged SNP information. The
data were obtained from 4,700 individuals using two different genotyping
platforms and publicly available CNV data. The large data set of KGVDB will provide
a rich public resource for the study of CNV and SNP.
TBC-10:
3D-QSAR Pharmacophore Modeling of Thromboxane A2 Receptor for Discovery of New
Inhibitors
Kuei-Chung
Shih1, Cheng-Yu Ma1, Hsiao-Chieh Chi1 and Chuan-Yi Tang1,2
1Department
of Computer Science, National Tsing Hua University, Hsinchu, Taiwan 30013, R.O.C.
2Department of Computer
Science and Information Engineering, Providence University, Taichung, Taiwan 43301,
R.O.C.
Thromboxane A2 (TXA2) is a hormone derived from arachidonic acid (AA) through cyclooxygenases (COX) and thromboxane synthase (TXS). It activates the thromboxane A2 receptor (TP) to induce platelet aggregation and cell proliferation. Because of its role in platelet activation, TXA2 is associated with thrombosis, acute myocardial infarction and many diverse inflammatory diseases. There are several
approaches to antiplatelet therapy through this prostanoid pathway. One strategy is to inhibit COX so that TXA2 cannot be produced from AA, as with the best-known antiplatelet drug, aspirin. Although aspirin can protect against myocardial infarction and stroke, it may lead to gastrointestinal disorders and allergy. Another strategy is to inhibit TXS so that it cannot generate TXA2, but this does not work efficiently because other endoperoxides and isoprostanes can also activate TP just like TXA2.
Accordingly, directly inhibiting TP seems attractive. However, TP antagonists, including ifetroban, sulotroban, GR32191 and other antithrombotic agents, remain in phase II or III of clinical development because of safety and efficacy concerns. Because previous studies have not reported a co-complex structure of TP with TXA2 or any of its inhibitors, it is necessary to establish a screening model for rational drug design in silico. Our research focuses on building
a TP pharmacophore hypothesis for discovering new potential TP inhibitors. The best hypothesis has one hydrogen-bond acceptor (A) and three hydrophobic aromatic groups (HYAR); its correlation coefficients for the training set and testing set were 0.933 and 0.923, respectively. According to statistical validation and chemical feature analysis, our best pharmacophore hypothesis has an excellent ability to help medicinal chemists
in their efforts to identify or design new TP inhibitors.
TBC-11:
Comparison of somatic mutation-calling methods based on DNA sequence from
matched tumor-normal pairs
Su Yeon
Kim1 and Terry Speed1,2
1University
of California at Berkeley, Berkeley 94720, USA
2Walter and Eliza Hall
Institute of Medical Research, Parkville Victoria 3052, Australia
Somatic
mutation-calling based on DNA from matched tumor-normal patient samples is one
of the key tasks carried out by many cancer genome projects. In particular, The
Cancer Genome Atlas (TCGA) is now routinely compiling catalogs of somatic
mutations for hundreds of patients for various tumor types. Nonetheless,
mutation calling is still a very challenging problem. TCGA benchmark studies
reveal that even up-to-date mutation callers from major sequencing centers show
substantial discrepancies. For most tumor types, validation data are not yet
available, and even when they become available, only a fraction of all candidate mutations
is likely to be validated. In order to compare mutation callers without
genome-wide gold standard validation data, we have developed an approach using
pseudo-positives (presumed somatic mutations) and pseudo-negatives (presumed
not somatic mutations) that are defined using another caller. These other callers
can be built using publicly available variant calling methods such as GATK
or SAMtools. This approach allows us to give a convenient visualization of the
discrepancies between the different mutation call sets, and to summarize each
mutation-caller's performance in terms of pseudo-false-positive and
pseudo-false-negative rates. Some insights were gained from observing
consistent results from two other callers that are not expected to introduce
the same biases.
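The pseudo-false-positive and pseudo-false-negative rates mentioned above could be computed along the following lines. The abstract does not give formal definitions, so this sketch assumes that the pseudo-false-negative rate is the fraction of pseudo-positives the caller misses and the pseudo-false-positive rate is the fraction of pseudo-negatives the caller reports. The function name and the site keys are hypothetical.

```python
def pseudo_error_rates(calls, pseudo_positives, pseudo_negatives):
    """Return (pseudo_fp_rate, pseudo_fn_rate) for one mutation caller.

    calls:            sites the evaluated caller reports as somatic
    pseudo_positives: sites presumed somatic according to another caller
    pseudo_negatives: sites presumed not somatic according to another caller
    Sites can be any hashable key, e.g. (chrom, pos, ref, alt).
    """
    calls = set(calls)
    positives = set(pseudo_positives)
    negatives = set(pseudo_negatives)
    if not positives or not negatives:
        raise ValueError("pseudo-positive and pseudo-negative sets must be non-empty")
    pseudo_fn_rate = len(positives - calls) / len(positives)
    pseudo_fp_rate = len(negatives & calls) / len(negatives)
    return pseudo_fp_rate, pseudo_fn_rate


# Toy site labels for illustration only.
fp_rate, fn_rate = pseudo_error_rates(
    calls={"a", "b", "x"},
    pseudo_positives={"a", "b", "c", "d"},
    pseudo_negatives={"x", "y"},
)
```

Repeating the calculation with reference sets from a second, independent caller is one way to check that the ranking of callers is not an artifact of a shared bias.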
TBC-12:
Estimation of heritability for BMI using a genotype score in Korean
cohorts
Nam Hee
Kim1, Youngdoe Kim1, Young Jin Kim1, Ji Hee Oh1,
Mee Hee Lee1 and Juyoung Lee1
Division of Structural and
Functional Genomics, Center for Genome Science, National Institute of Health,
Korea Centers for Disease Control and Prevention, Korea
The aim
of this study was to estimate the variation and heritability of BMI, including a
genotype score, and to compare BMI with that of another cohort. We constructed community-based
and twin-family-based cohorts, which are ongoing prospective studies; the
surveyed samples were drawn from the Korean Genome and Epidemiology Study and the
Korea Genome Analysis Project in Korea.
We
selected 2,473 subjects from the twin-family cohort, assessed their zygosity using
self-report questionnaires of about 2,000 items, and genotyped them using Affy 6.0.
From the community-based cohort (KARE; Korea Association REsource), we selected
8,842 subjects, surveyed them with self-report questionnaires of about 1,400 items,
and genotyped them using Affy 5.0. We estimated the heritability of BMI,
including the BMI genotype score, using SOLAR, GCTA, and GenABEL.
TBC-13:
Genotype instability during long-term subculture of lymphoblastoid cell lines
Ji Hee
Oh1, Young Jin Kim1, Sanghoon Moon1, Jong-Young
Lee1 and Yoon Shin Cho1,2
1Division
of Structural and Functional Genomics, Center for Genome Science, National Institute
of Health, Chungcheongbuk-do 363-951, Republic of Korea
2Department of Biomedical
Science, Hallym University, 1 Hallymdaehak-gil, Chuncheon, Gangwon-do 200-
702, Republic of Korea
Epstein-Barr
virus (EBV)-transformed lymphoblastoid cell lines (LCLs) promise to address the
challenge posed by the limited availability of primary cells needed as a source
of genomic DNA for genetic studies. However, the genetic stability of LCLs
following prolonged culture has never been rigorously investigated. To evaluate
genotypic errors caused by EBV integration into human chromosomes, we isolated
genomic DNA from human peripheral blood mononuclear cells and LCLs collected
from 20 individuals and genotyped the DNA samples using the Affymetrix 500K SNP
array set. Genotype concordance measurements between two sources of DNA from
the same individual indicated that genotypic discordance is negligible in
early-passage LCLs (less than 41 passages) but substantial in late-passage LCLs
(more than 40 passages). Analysis of concordance on a chromosome-by-chromosome
basis identified genomic regions with a high frequency of genotypic errors
resulting from the loss of heterozygosity observed in late-passage LCLs. Our
findings suggest that, whereas LCLs harvested during early stages of
propagation are a reliable source of genomic DNA for genetic studies,
investigations that involve genotyping of the entire genome should not use DNA
from late-passage LCLs.
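A chromosome-by-chromosome concordance measurement of the kind described above might look like the sketch below. The data layout, the `"NN"` no-call code, and the function name are assumptions made for illustration; they are not taken from the study.

```python
from collections import defaultdict


def concordance_by_chromosome(pbmc_calls, lcl_calls, snp_chromosome, no_call="NN"):
    """Fraction of SNPs with identical genotype calls, per chromosome.

    pbmc_calls / lcl_calls: dict mapping SNP id -> genotype string
    snp_chromosome:         dict mapping SNP id -> chromosome label
    SNPs missing from either sample, or with a no-call in either, are skipped.
    """
    matched = defaultdict(int)
    compared = defaultdict(int)
    for snp, pbmc_gt in pbmc_calls.items():
        lcl_gt = lcl_calls.get(snp)
        if lcl_gt is None or no_call in (pbmc_gt, lcl_gt):
            continue
        chrom = snp_chromosome[snp]
        compared[chrom] += 1
        if pbmc_gt == lcl_gt:
            matched[chrom] += 1
    return {chrom: matched[chrom] / compared[chrom] for chrom in compared}


# Toy genotypes for illustration only; rs1 mimics loss of heterozygosity (AG -> AA).
pbmc = {"rs1": "AG", "rs2": "CC", "rs3": "AT", "rs4": "NN"}
lcl = {"rs1": "AA", "rs2": "CC", "rs3": "AT", "rs4": "GG"}
chrom_of = {"rs1": "1", "rs2": "1", "rs3": "2", "rs4": "2"}
result = concordance_by_chromosome(pbmc, lcl, chrom_of)
```

Chromosomes whose concordance drops in late-passage samples would be the candidates for regions affected by loss of heterozygosity.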
TBC-14:
Multi-study integration of brain cancer transcriptomes reveals organ-level
diagnostic signatures
Jaeyun
Sung1, Pan-Jun Kim1, Leroy Hood2, Donald Geman3
and Nathan Price2
1Asia
Pacific Center for Theoretical Physics, Korea
2United
States Institute for Systems Biology, USA
3Institute
for Computational Medicine, Department of Applied Mathematics and Statistics,
Johns Hopkins University, USA
The
identification of molecular signatures from either tissues or blood to
accurately reflect the major cancers of an organ system would be a significant
advance in molecular cancer diagnostics. Towards this goal, we identified
comprehensive diagnostic signatures of major cancers of the human brain from a
multi-study, integrated transcriptomic dataset. These signatures are based on
comparing ranked expression values of gene-pair sets, which are aggregated into
a brain cancer marker-panel of 44 unique genes. Many of these genes have
established relevance to the brain cancers tested herein, with others having
known roles in cancer biology. Phenotype prediction follows a diagnostic
hierarchy, and the corresponding hierarchically-structured signatures achieved
90% classification accuracy against a multi-disease alternative hypothesis when
training and validation sets were drawn from the same population distribution
(cross validation). Despite accurately distinguishing among phenotypes in
single-population cross-validation, diagnostic signatures must remain robust
even across more heterogeneous populations to justify their broad clinical use.
To address this issue, we found that sufficient dataset integration across
multiple studies greatly enhanced reproducibility and accuracy in diagnostic
performance on truly independent validation sets, whereas signatures learned
from one dataset typically had high error on independent validation sets.
Looking forward, we discuss our approach in the context of improving blood diagnostics
for cancers of organ systems.
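Signatures based on comparing ranked expression values within gene pairs can be illustrated with a small voting classifier. The majority-vote decision rule below is an assumption borrowed from top-scoring-pair style classifiers; the abstract does not state the exact rule, and all gene names and labels are hypothetical.

```python
def pair_votes(expression, gene_pairs):
    """Count gene pairs (a, b) in which gene a is expressed above gene b."""
    return sum(1 for a, b in gene_pairs if expression[a] > expression[b])


def classify(expression, gene_pairs, label_if_majority, label_otherwise):
    """Assign a label by majority vote over within-sample gene-pair orderings.

    Only the relative order of the two genes in each pair matters, so the
    decision is unchanged by any monotone rescaling of a sample's values.
    """
    votes = pair_votes(expression, gene_pairs)
    return label_if_majority if 2 * votes > len(gene_pairs) else label_otherwise


# Toy expression profile and pairs for illustration only.
sample = {"A": 5.0, "B": 1.0, "C": 2.0, "D": 3.0, "E": 7.0, "F": 6.0}
pairs = [("A", "B"), ("C", "D"), ("E", "F")]
label = classify(sample, pairs, "phenotype_1", "phenotype_2")
```

Because the rule depends only on within-sample orderings, it is less sensitive to study-specific normalization, which is one reason such signatures are attractive for multi-study integration.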
TBC-15:
Methyl-directed Site-specific DNA Endonuclease MteI is a New Instrument for
Analysis of CpG Island Methylation
Vasilina
A. Sokolova1, Valery A. Chernukhin1, Danila A. Gonchar1,
Elena V. Kileva1, Larisa N. Golikova1, Vladimir S. Dedkov1,
Natalya A. Mikhnenkova1, Elena V. Zemlyanskaya1, Vitaliy
V. Kuznetsov1 and Sergey Kh. Degtyarev1
1SibEnzyme
Ltd., Novosibirsk, Russia 630117
Methyl-directed
(MD) DNA endonucleases specifically cleave short methylated DNA sequences and
do not cut unmethylated DNA. The biochemical properties of MD endonucleases are
similar to those of restriction enzymes: both types of enzymes require only Mg2+
ions as a cofactor. To date, more than ten MD DNA endonucleases recognizing
different sites with 5-methylcytosine have been discovered and characterized [1].
Among them, the MD DNA endonucleases BlsI, BisI, PkrI and GluI have the same
recognition site, 5'-GCNGC-3', but the activity of these enzymes depends on the
number and position of 5-methylcytosines in the recognition sequence.
A new
methyl-directed site-specific DNA endonuclease, MteI, was isolated from
Microbacterium testaceum. MteI recognizes an extended methylated DNA sequence
nine bases in length with a central pentanucleotide 5'-GCNGC-3'. MteI
activity depends on the number of 5-methylcytosines and their positions in the
recognition site. MteI cleaves the DNA sequence
5'-G(5mC)G(5mC)^NG(5mC)GC-3'/3'-CG(5mC)GN^(5mC)G(5mC)G-5' as indicated by the
arrows. The enzyme activity is significantly higher if the 5'-GC-3' dinucleotides
in this site are replaced by 5'-G(5mC)-3' dinucleotides and additional
5'-G(5mC)-3' dinucleotides are present in both DNA strands.
We have
developed the MteI-PCR assay, which allows methylated
CpG islands to be identified. The method includes DNA hydrolysis with MteI followed by PCR with
primers designed for the DNA region of interest. The MteI-PCR assay has been
applied to study the methylation of CpG islands located in regulatory regions of
tumor suppressor genes and revealed different patterns of DNA methylation.
1. http://mebase.sibenzyme.com/md-endonucleases
TBC-16:
Nonunique SNP problems in association study
Lyong
Heo1, Young Jin Kim1, Sanghoon Moon1 and
Jong-Young Lee1
1Division
of Structural and Functional Genomics, Center for Genome Science, National
Institute of Health, Osong, Korea
In
recent years, genome-wide association studies (GWAS) have successfully identified
numerous phenotype-associated SNPs. In GWAS, a SNP serves as a marker
indicating a specific genomic region. The chromosomal position of each SNP is
annotated in the NCBI dbSNP database. In dbSNP, however, annotation errors have
been reported, such as SNPs with multiple positions, position changes, and
chromosome changes. Doron and Sheweiki reported that 4.2-11.9% of HapMap SNPs
were mapped to nonunique genomic regions. Since a marker is valid only if it
maps to a unique region, SNPs mapped to nonunique regions are not adequate
for association analysis. In this study, we analyzed nonunique SNPs in two
versions of the dbSNP database, b130 (hg18) and b135 (hg19). Nonunique rsIDs
account for 3.46% and 2.26% of b130 and b135, respectively. In addition, position
changes due to dbSNP build updates affected 0.39% of b130 and 0.13% of b135. We
queried the GWAS Catalog to study the effect of nonunique SNPs. As of August
2012, the GWAS Catalog included 1,355 publications with 8,754 SNPs (7,131 unique
SNPs). Among the catalogued SNPs, we found 237 SNPs mapped to nonunique positions.
Our results indicate that SNPs should be carefully annotated and tested for their
validity as markers in association studies.
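Flagging rsIDs that map to more than one genomic position can be sketched as below. The record layout (rsID, chromosome, position) and the function name are assumptions for illustration and do not reflect the actual dbSNP file format or the authors' procedure.

```python
from collections import defaultdict


def find_nonunique(mappings):
    """Return rsIDs that map to more than one distinct (chromosome, position).

    `mappings` is an iterable of (rsid, chromosome, position) records.
    Exact duplicate records for the same location are counted once.
    """
    locations = defaultdict(set)
    for rsid, chrom, pos in mappings:
        locations[rsid].add((chrom, pos))
    return {rsid: sorted(locs) for rsid, locs in locations.items() if len(locs) > 1}


# Toy records for illustration only.
records = [
    ("rs1", "1", 100),
    ("rs1", "5", 200),  # same rsID on a second chromosome: nonunique
    ("rs2", "2", 300),
    ("rs2", "2", 300),  # exact duplicate record: still unique
    ("rs3", "X", 50),
]
nonunique = find_nonunique(records)
```

Intersecting the flagged rsIDs with a list of reported association signals would identify published markers that deserve re-examination.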
TBC-P17:
Exonic variants in Korean population
Young
Jin Kim1, Kwang Joong Kim1, Lyong Heo1, Yun
Kyoung Kim1, Sanghoon Moon1, Youngdoe Kim1, Mi
Yeong Hwang1, Bong-Jo Kim1 and Jong-Young Lee1
1Division
of Structural and Functional Genomics, Center for Genome Science, KNIH, KCDC
Recent
advances in high-throughput genotyping technologies have enabled us to carry
out genome-wide association studies (GWAS) in large cohorts. The main goal of
a GWAS is to identify loci associated with complex
phenotypes. The discovery of associated loci would lead us to understand the
underlying mechanisms of complex traits. Despite the great success of GWAS,
however, the limited number of susceptibility variants discovered in previous
GWAS accounts for only a small proportion of phenotypic variance. The missing
heritability of current genome analysis is the bottleneck preventing us
from taking a step toward personal genomes, personalized medication, and disease
prediction and prevention. In this context, next-generation sequencing (NGS)
technology has attracted much attention because it provides access to
genomic data at base-pair resolution. Here, exome
sequencing of 400 Korean samples facilitated the assessment of the full
spectrum of allele frequencies, including coding-altering variants. The analysis
of all variants within coding regions would reveal undiscovered, possibly causal
common or rare variants near previously associated loci.
TBC-18:
Development of Korea Common Data Model for Adverse Drug Signal Detection based
on multi-center EMR systems
Si Ra
Kim1, Seung Ho Park2, Bum Joon Park2, Kwang
Soo Jang2 and In Young Choi1
1Graduate
School of Healthcare Management and Policy, The Catholic University of Korea,
Seoul 137701 , Korea
2Master
course of engineering, Hanyang University of Korea, Seoul 133791, Korea
Adverse drug reaction (ADR) research based on the Clinical Data Warehouse (CDW)
is becoming more important than spontaneous ADR reporting as electronic clinical
information such as the Electronic Medical Record (EMR) becomes available. Drug
safety monitoring based on EMR can collect more objective pharmacovigilance data
and analyze ADRs earlier than spontaneous ADR reporting. We analyzed three drug
safety surveillance models: the EU-ADR data model of Europe, the Mini-Sentinel
data model of the Food and Drug Administration (FDA), and the Observational
Medical Outcomes Partnership (OMOP) data model of the National Institutes of
Health (NIH). Based on the comparison of the three data models, we developed the
Korea ADR common data model (CDM) for early detection of adverse drug reactions
in Korea. This project is called K-ADR (Korea Adverse Drug Reaction). The K-ADR
consists of eight tables: demographic, drug, visit, procedure, diagnosis, death,
laboratory, and report-machinery tables. Each table consists of 5~12 fields. In
addition, terminology standards such as ICD-10 and WHO-ART will be provided to
integrate multiple EMR systems. The K-ADR, which reflects Korean EMR structures,
will contribute to pharmacovigilance activity. Pharmacovigilance using EMR
enables accurate signal detection through per-patient diagnosis names and drug
prescription information. The K-ADR could also detect adverse drug events
(ADEs), including under-reported and deficient ADEs. Further efforts to develop
standardized guidelines for procedure codes and laboratory codes will be needed
for a multi-institutional pharmacovigilance database system. EMR-based
pharmacovigilance will be a cost-effective method to detect ADR signals.
Acknowledgement: This research was supported by a grant (12172KFDA212) from the Korea Food and Drug Administration in 2012.
TBC-19:
Various nucleosome positioning patterns in Drosophila
Doo
Yang1,2 and Ilya Ioshikhes1,2
1Ottawa
Institute of Systems Biology, Canada
2Department
of Biochemistry, Microbiology & Immunology University of Ottawa, Canada
Nucleosomes play an important role in gene regulation by affecting the
accessibility of DNA to transcription factors. DNA sequence is one of the
factors that position nucleosomes.
Finding the nucleosome positioning sequence (NPS) is challenging because
nucleosome binding is not as specific as transcription factor binding to motifs.
However, some sequence features, such as dinucleotide periodicity, can be
observed by analyzing nucleosome sequences collectively.
Drosophila genome sequences of H2A and H2A.Z nucleosomes were analyzed to find
a novel NPS and its relationship with biological functions.
The nucleosome positions and sequences were obtained from published ChIP-Seq
data (Mavrich et al., 2008, Nature for H2A.Z and Henikoff et al., 2011, Genes
& Development). In order to minimize noise in the sequence pattern, only the +1
nucleosome sequences were selected and separated into H2A and H2A.Z sequences.
Then the dinucleotide patterns were analyzed.
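As a rough illustration of what analyzing dinucleotide patterns over a set of aligned nucleosome sequences can look like, the sketch below computes the per-position frequency of WW (A/T followed by A/T) and SS (G/C followed by G/C) dinucleotides. It assumes that all sequences have equal length and are already aligned on the dyad; the function name and the toy sequences are hypothetical, and the authors' actual procedure is not described in the abstract.

```python
WEAK = set("AT")     # W = A or T
STRONG = set("GC")   # S = G or C

def dinucleotide_profile(sequences, first, second):
    """Per-position fraction of sequences whose dinucleotide starting at that
    position has its first base in `first` and its second base in `second`."""
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned and of equal length")
    counts = [0] * (length - 1)
    for seq in sequences:
        seq = seq.upper()
        for i in range(length - 1):
            if seq[i] in first and seq[i + 1] in second:
                counts[i] += 1
    return [count / len(sequences) for count in counts]

# Hypothetical toy alignment; real nucleosome sequences are far longer.
toy = ["AATGC", "TTCGG", "ATGCA"]
ww = dinucleotide_profile(toy, WEAK, WEAK)
ss = dinucleotide_profile(toy, STRONG, STRONG)
```

Periodicity, such as the 10 bp signal mentioned in the results, would then be assessed on profiles like `ww` and `ss`, for example by autocorrelation; that step is left out here.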
Two novel NPS patterns, WW/SS and RR/YY, are proposed. The WW/SS sequence pattern
is similar but not identical to the previously proposed yeast NPS. The Drosophila
WW/SS NPS has a higher content of SS at the dyad. The 10 bp periodicity is
stronger away from the dyad and disrupted near the dyad. The RR/YY NPS shows that
dinucleotides are more periodic between 25 and 45 bp from the dyad than near the
dyad or in the outer region. GO analysis of the genes having either WW/SS or
RR/YY nucleosomes showed differences in biological functions, suggesting a
possible relationship between gene functions and nucleosome sequences.
Comparison of the H2A and H2A.Z NPS showed differences in the dinucleotide
pattern. The most significant difference is that the H2A.Z NPS has stronger peaks
at ± 45 bp from the dyad instead of ± 55 bp in H2A. These positions in
DNA are close to the protein domain where the H2A.Z and H2A histones differ.
In yeast, H2A.Z positioning is dependent on SWR1 and is immobile once
positioned. H2A.Z is also well phased downstream of the TSS. Combined with
the fact that H2A.Z plays a role in proper gene activation, H2A.Z may serve as
a barrier for downstream nucleosomes to maintain the proper binding sites for
transcription factors and other proteins.
TBC-20:
Anonymized Patient Chart Review Tool in Asan Medical Center
Soo-Yong
Shin1,2, Yongdon Shin2, Yong-Man Lyu2, Hyo
Joung Choi2, Jihyun Park2 and Jaeho Lee1,2,3
1Department
of Biomedical Informatics, Asan Medical Center, Seoul 138-736, Korea
2Office of Clinical Research
Information, Asan Medical Center, Seoul 138-736, Korea
3Department of Emergency
Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul
138-736, Korea
Asan Medical Center (AMC) has been developing a biomedical research
infrastructure to improve the efficiency of clinical research while protecting
the privacy of patients. As a first step, AMC developed an anonymized patient
chart review tool that protects patients' privacy by complying with government
regulations in Korea. The primary purpose of this tool is to decide whether a
chosen patient should be included in or excluded from a proposed study by
reviewing the patient's anonymized clinical data. For this purpose, the tool
aims to provide the comprehensive clinical data in the AMC data warehouse,
including diagnoses, medications, lab results, pathology/radiology reports,
progress notes, admission notes, discharge summaries, and operative reports. It
also provides an easy-to-use interface by implementing the same interface as
other AMC medical information systems. To generate the anonymized clinical data,
the 18 identifiers defined by HIPAA were removed as follows: 1) Each patient was
assigned a new research ID that is different from the hospital patient ID. 2) All
structured identifiers stored in the EMR database were removed. 3) The remaining
identifiers in the narrative texts were masked using pre-defined regular
expressions. As future work, we plan to scramble the dates in clinical data and
to develop a one-time research ID method that generates a different ID each time,
even for the same patient, for stronger protection of patients' privacy. We are
also developing a research cohort discovery tool to estimate the approximate
number of patients satisfying the research criteria.
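The third anonymization step, masking identifiers in narrative text with pre-defined regular expressions, might look roughly like the following sketch. The three patterns, the labels, and the sample note are hypothetical and only illustrate the mechanism; the actual expressions used at AMC are not given in the abstract, and a production rule set would need to cover all identifier types.

```python
import re

# Hypothetical patterns for illustration only; not the rules used by the tool.
PATTERNS = [
    ("PHONE", re.compile(r"\b\d{2,3}-\d{3,4}-\d{4}\b")),
    ("DATE", re.compile(r"\b\d{4}-\d{2}-\d{2}\b")),
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
]

def mask_identifiers(text):
    """Replace every match of each pattern with a bracketed label."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text

# Made-up clinical note fragment.
note = "Seen on 2012-08-14; call 010-1234-5678 or mail kim@example.org."
masked = mask_identifiers(note)
print(masked)
```

Replacing matches with a typed label rather than deleting them keeps the note readable for chart review while hiding the value itself.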
TBC-21:
Integrate Genomics and Molecular Interactome Data for Brain Tumor Pathway
Discovery and Prognosis
Jongkwang
Kim1, Gao Long1 and Kai Tan1
1University
of Iowa, Dept. of Internal Medicine, Dept. of Biomedical Engineering, 65536 Iowa
city, USA
Glioblastoma (GBM; grade IV astrocytoma) is the most common and lethal form of
brain cancer. Median patient survival time is 15 months, and there are few
predictive gene markers for prognosis and treatment. This study integrates three
types of data, transcriptomic profiles, epigenomic profiles, and the
protein-protein interactome, to find pathway markers that are responsible for
long-term survival (LTS) as compared to short-term survival (STS). Thirteen
pathway markers were found from the integrated analysis. The pathway markers were
tested on 115 GBM patient samples for accuracy in classifying STS and LTS cases.
The accuracy (82.2%) is 13.6% higher than that obtained using one or two types of
data, demonstrating that integration of transcriptomic, epigenomic and
interactome data is a more powerful approach to elucidating molecular pathways
distinguishing GBM subtypes.
TBC-22:
Development of a Consumer-engaged Obesity Management Ontology based on Nursing
Process
Hyun-Young
Kim1, Hyeoun-Ae Park2, Yul Ha Min2 and Eun-Joo
Jeon2
1Eulji
University, College of Nursing, Daejeon 301-832, Korea
2Seoul National University, Seoul
110-799, Korea
The purpose of this study is to develop an ontology to represent the
consumer-engaged obesity management process based on clinical practice
guidelines. Since life style modification by consumers is the most important
aspect of obesity management, we introduced concepts of consumer engagement into
the obesity management process. We also considered data traffic when we
developed the ontology.
We
developed the ontology by defining the scope of obesity management, selecting a
foundational ontology, extracting the concepts, assigning relations among
classes, and representing classes and relations with Protégé.
We identified behavioural intervention, dietary advice, and physical activity
from the guideline as obesity management strategies. The nursing process was
selected as a foundational ontology to represent consumers' engagement in the
obesity management process, since consumers engage in their obesity management
when they identify expected outcomes. The nursing process is a patient-centered
and goal-oriented method consisting of five phases (assessment, nursing
diagnosis, outcome identification, implementation, and evaluation). These phases
are repetitive and cyclic in the obesity management process. The first cycle
represents the first encounter of obesity management, from initial assessment to
outcome identification. The second cycle represents the second encounter and
onward. The two cycles are connected in that the assessment in the second cycle
serves as the evaluation of the first cycle. With this approach we were able to
minimize data traffic in the obesity management process. We extracted 127
concepts, which included assessment data (such as sex, body mass index, and waist
circumference) and inferred data representing nursing diagnosis and evaluation
(such as degree of and reason for obesity and success or failure in life style
modification). The relations linking concepts are "part of", "instance of",
"derives from", "derives into", "has plan", "followed by", and "has intention".
The concepts and relations were formally represented using Protégé.
We were able to represent obesity management with consumer engagement using the
nursing process as a foundational ontology. The nursing process can be used as a
foundational ontology to support the development of ontologies representing
consumers' behavioural modification.
Acknowledgements: This work was supported by the National Research Foundation of
Korea (NRF) grant funded by the Korea government (MEST) (no. 2012-012257 and
no. 2012-0000998).
TBC-23:
Performance of microRNA target prediction algorithms
Jee
Yeon Heo1, Yongjin Choi1, Hae-Seok Eo1,
Youngho Kim1, Taesung Park2 and Hyung-Seok Choi1
1Bio&Health
Team, Future IT R&D Laboratory, LGE Advanced Research Institute, Seocho-gu,
Seoul 137-724, Korea
2Department
of Statistics, Seoul National University, Gwanak-gu, Seoul 151-747, Korea
MicroRNAs (miRNAs) are a class of small non-coding RNAs (~22 nt) that regulate
gene expression by binding to their target mRNAs and suppressing mRNA translation
or inducing mRNA degradation. They act in multiple biological processes such as
cell cycle control, cell growth, cell differentiation, apoptosis, and embryo
development. Many computational approaches to predicting the target mRNAs of each
miRNA have been developed, including miRanda, PITA, TargetScan, DIANA-microT,
Microcosm and miRDB. Here, we compared the performance of these six miRNA target
prediction algorithms. First, 6,901 common pairs (0.003%) were selected from the
total of 2,842,985 miRNA-target mRNA pairs predicted by the six algorithms.
Second, 3,507 validated miRNA-target mRNA pairs were collected from databases of
experimentally validated targets, including TarBase, miR2Disease, miRTarBase and
miRecords. Among them, 879 pairs (25%) were not predicted by any algorithm and
214 pairs (6%) were predicted by all six algorithms. Finally, receiver operating
characteristic (ROC) curves and area under the curve (AUC) values were calculated
to compare the performance of each algorithm. Our comparison shows that
DIANA-microT has the highest accuracy (60%) and miRanda has the lowest accuracy
(49%), and that the prediction scores of the algorithms are only weakly
correlated with each other.
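For readers unfamiliar with the AUC used in this comparison, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen validated pair receives a higher prediction score than a randomly chosen non-validated pair, with ties counted as one half. The scores and labels are made-up toy values, and the function name is hypothetical; this is not the evaluation code behind the reported figures.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: 1 for a validated miRNA-target pair, 0 otherwise.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical prediction scores from one algorithm and validation labels.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)
```

The pairwise form is quadratic in the number of pairs and is chosen here for clarity; for millions of predicted pairs a sort-based implementation would be preferable.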
TBC-24:
Graphical modeling of regulatory interactions in sporadic Inclusion Body
Myositis
Thomas
Thorne1, Pietro Fratta2, Michael Hanna3, Elizabeth
Fisher2 and Michael Stumpf1
1Centre
for Bioinformatics and Systems Biology Imperial College London, UK
2Department
of Neurodegenerative Disease, UCL Institute of Neurology, UK
3National
Hospital for Neurology & Neurosurgery, University College London, UK
Sporadic
Inclusion Body Myositis (sIBM) is a disease that causes inflammation of the
muscles and progressive weakening and wasting of the muscles, and the
mechanisms by which it acts are not currently fully understood. Here we present
an analysis of gene expression microarray data from both disease and control
cases in an attempt to identify regulatory interactions that may be involved in
the disease. To model the regulatory network structure we employ a Gaussian
Graphical Model (GGM) formalism, whereby the data are assumed to be generated
from a multivariate Normal distribution. In a GGM, a pair of genes shares an
edge only if they have a non-zero partial correlation, that is, if their
correlation cannot be explained by the expression of the other genes. Since
there are significantly more genes than data points, we apply a sparse
regression methodology to infer the partial correlations between genes. Here we
choose to
apply a sparse Bayesian regression method that has been demonstrated to
outperform methods such as the Lasso. To perform inference of the model
parameters we apply variational inference, a technique whereby the Bayesian
posterior distribution is approximated by a factorised set of exponential
family distributions.
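The notion of partial correlation that defines edges in a GGM can be illustrated with the simplest case of three variables, using the standard first-order formula r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). The sketch below is not the sparse Bayesian regression described in the abstract; it only shows, on hypothetical toy data, how two genes can be correlated and yet have zero partial correlation once a third gene is accounted for.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical expression values: x and y are each z plus independent noise,
# so they correlate with each other only through z.
z = [1, 2, 3, 4]
x = [2, 1, 2, 5]
y = [2, -1, 6, 3]

marginal = pearson(x, y)                 # clearly non-zero
conditional = partial_correlation(x, y, z)  # zero up to rounding
```

In a GGM, genes x and y in this toy example would not share an edge, even though their marginal correlation is positive.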
TBC-25:
Jiffynet: A web server generating Gene networks for newly sequenced species
Eiru
Kim1 and Insuk Lee1
1Department
of Biotechnology, College of Life Science and Biotechnology, Yonsei
University, Korea
One of the currently emerging approaches to studying biological systems is
systems biology, a field that focuses on complex interactions in biological
systems. Since the development of next generation sequencing technology, large
amounts of sequencing data from diverse species have become available. However,
because genetic analyses of these species are lacking, it is not possible to
study them with systematic approaches. For biologists who want to study novel
species systematically, we have developed a web server that provides draft
models of various networks. The draft network, which we call "JiffyNet", is made
by mapping associalogs onto well-established existing networks such as HumanNet,
WormNet, YeastNet, and RiceNet. Associalogs are derived by combining orthologs
between two species with their interactions. In this way, a JiffyNet for a
user-defined species can be made by finding associalogs and mapping them to the
base networks. We are building the web server that enables biologists to build
their own JiffyNet. A biologist uploads sequencing data, and the server sends
the JiffyNet created from the data by e-mail.
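One plausible reading of the associalog mapping is sketched below: an interaction between two genes in a base network is transferred to the new species whenever both genes have orthologs there. This interpretation, the function name, and the gene identifiers are assumptions made for illustration; the abstract does not specify how JiffyNet scores or filters transferred links.

```python
def transfer_network(base_edges, orthologs):
    """Map base-network edges onto a target species through orthologs.

    base_edges: iterable of (gene_a, gene_b) pairs in the reference species.
    orthologs:  dict mapping a reference gene to a set of target-species genes.
    Returns a set of undirected target-species edges as sorted tuples.
    """
    transferred = set()
    for gene_a, gene_b in base_edges:
        for target_a in orthologs.get(gene_a, ()):
            for target_b in orthologs.get(gene_b, ()):
                if target_a != target_b:  # skip self-links
                    transferred.add(tuple(sorted((target_a, target_b))))
    return transferred

# Hypothetical reference network (Y*) and ortholog table to a new species (N*).
base = [("Y1", "Y2"), ("Y2", "Y3"), ("Y3", "Y4")]
orth = {"Y1": {"N1"}, "Y2": {"N2", "N3"}, "Y3": {"N3"}}

draft = transfer_network(base, orth)
```

Edges involving a reference gene without any ortholog (Y4 here) are simply dropped, which is why such a draft network covers only part of the new genome.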
TBC-26:
Studying Plant Complex Traits Through Network-assisted Systems Genetics of Arabidopsis thaliana
Tak Lee1,
Jung Eun Shim1 and Insuk Lee1
1Department
of Biotechnology, College of Life Science and Biotechnology, Yonsei University,
262 Seongsanno, Seodaemun-Gu, Seoul, 120-749, Korea
As next generation sequencing (NGS) technology develops rapidly, the Genome Wide
Association Study (GWAS) is being highlighted as a way to search for genes
associated with certain traits, such as disease genes in humans and
stress-resistance genes in plants. By sequencing the genomes of organisms and
statistically associating sequence variants with certain traits, GWAS is
expected to perform well in the discovery of novel genes. However, even though
GWAS is costly and requires intensive work, it has not delivered the expected
outcomes so far. Here, we present a novel way of analyzing associations between
genetic variants and phenotypes of a plant model organism, Arabidopsis thaliana,
using a network-guided approach.
Using the Arabidopsis functional gene network (AraNet), we develop an algorithm
intended to effectively predict the significant variant-phenotype associations
in Arabidopsis GWAS. AraNet is constructed by integrating various omics data and
predicts functional relationships for 73% of the total Arabidopsis genome. An
algorithm that combines GWAS data with the integrated omics data of AraNet would
give more power to predict genes that have low significance in GWAS but are
still important for certain phenotypes.
TBC-27:
Systematic analysis of cell line data for the development of novel cancer
treatment
Nayoung
Kim1 and Sukjoon Yoon1
1Department
of Biological Sciences, Sookmyung Women's University, Seoul 140742, Korea
An integrative analysis of large-scale omics and drug response data on various
cell lines enables us to identify cellular signaling and drug sensitivity in
cancer. Here we present a system-level analysis of cell line data for
predicting the sensitivity to and mechanism of targeted drug response based on
the major genotypes of cancers. An association study with the genotypic
classification was performed on drug data and omics data such as the
transcriptome, proteome, and phosphoproteome of human cancer cell lines. This
approach reproduced the known patterns of mechanism-based drug response in
cancers. Furthermore, gene and protein signatures significantly associated with
genotype were identified and integrated into a drug-centered network. This study
provides an integrated approach to omics data, drug response data, and cancer
mutation types. Our platform can be used to accelerate hypothesis generation and
to validate the optimized therapeutic window for single or combined anticancer
agents.
TBC-28:
Genome Signature Image (GSI): Concise visualization of species/strain-specific
profiles of repetitive element occurrences for cataloguing and evolutionary
studies
Kang-Hoon
Lee1, Kyung-Seop Shin2, Woo-Chan Kim2,
Jeongkyu Roh2, Seung-Ho Choi2, Dong-Ho Cho2
and Kiho Cho1
1Department
of Surgery, University of California, Davis and Shriners Hospitals for Children
Northern California, USA
2Division
of Electrical Engineering, School of Electrical Engineering and Computer
Science, Korea Advanced Institute of Science and Technology, Korea
The
genomes of living organisms, ranging from bacteria to humans, contain diverse
populations of repetitive elements (REs). Our recent studies revealed that the RE
profile, including RE arrays, of the human genome is unique in comparison to
the mouse genome while gene sequences of humans and mice share a homology of
~90%. Also, a preliminary survey of the genomes of various other species
demonstrated that genomic RE profiles are species-specific. In this study, we
developed a suite of protocols/programs to concisely visualize genome
signatures using species/strain-specific RE profiles. Since the genomes of
higher eukaryotes, including humans and non-human primates, have not yet been
fully decoded, we developed the genome signature technology using complete
genome sequences from the domains of Archaea and Bacteria. The genome sequences
of 117 Archaea-domain and 1,068 Bacteria-domain members were obtained from the
National Center for Biotechnology Information and subjected to a genome-wide
survey for the occurrence of 5-nucleotide REs. The top 50 highest frequency REs
were then selected from each genome and assembled, from high to low frequency,
into an RE string of 250 nucleotides. The string
of high frequency REs now represents a unique signature of each genome. Of
note, the two key parameters (number of high frequency REs and RE length) for
the generation of genome signature sequences are tuneable. The genome signature
sequence was then visualized into an image, named Genome Signature Image (GSI),
using a CMYK color scheme. Interestingly, not all members within a
pre-established phylogenetic branch shared similar CMYK color patterns, as
confirmed by examining the GSIs of the 1,185 microorganisms using
different parameters. The tuneable GSIs represent and visualize unique
characteristics of any genome and the concise RE string of each genome enables
phylogenetic studies involving large sample numbers.
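The construction of the genome signature string described in this abstract (count 5-nucleotide REs, keep the 50 most frequent, concatenate them from high to low frequency) can be sketched as follows. Two details are assumptions not stated in the abstract: REs are counted with overlapping sliding windows, and ties in frequency are broken alphabetically. The function name and toy sequence are hypothetical, and the CMYK rendering step is omitted because its mapping is not specified.

```python
from collections import Counter

def genome_signature(sequence, k=5, top=50):
    """Concatenate the `top` most frequent k-mers, from high to low frequency.

    Assumptions: overlapping windows; ties broken alphabetically.
    """
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return "".join(kmer for kmer, _ in ranked[:top])

# Toy example with the two tuneable parameters reduced for readability.
toy = "ACGTACGTAC"
signature = genome_signature(toy, k=2, top=3)
print(signature)
```

With the defaults `k=5` and `top=50`, a genome containing at least 50 distinct 5-mers yields the 250-nucleotide string mentioned in the abstract; both parameters are exposed because the abstract describes them as tuneable.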
TBC-29:
Analysis of copy number variation in exome sequencing data
Mi
Yeong Hwang1, Sanghoon Moon1, Young Jin Kim1,
Lyong Heo1, Yun Kyoung Kim1, Youngdoe Kim1,
Bok-Ghee Han2, Jong-Young Lee1, and Bong-Jo Kim1
1Division
of Structural and Functional Genomics, Center for Genome Science, National Institute
of Health, Chungcheongbuk-do, 363-951, Korea
2Center
for Genome Science, National Institute of Health, Chungcheongbuk-do, 363-951,
Korea
Copy number variations (CNVs) have been reported to be associated with many
complex diseases such as schizophrenia and obesity. To discover CNVs in the human
genome, array comparative genomic hybridization (aCGH) and single nucleotide
polymorphism (SNP) arrays have mainly been used. However, CNVs from these
array-based platforms have inaccurate breakpoints due to low resolution.
Therefore, it is hard to determine the exact size of CNV regions. Moreover, small
genomic variants, such as those of less than 500 bp, are rarely detected.
Recently, next generation sequencing (NGS) techniques have developed rapidly.
In addition, exome sequencing has been regarded as a tool for
Mendelian disease gene discovery.
In this study, 139 randomly selected individuals enrolled from a
population-based cohort were genotyped with Agilent/HiSeq exome sequencing. Many
of the detected CNV regions were validated by Agilent 60K aCGH. As a result, we
discovered 10,084 CNVs from exome sequencing. More than 80% of the CNVs detected
from exome sequencing (8,113/10,084) were less than 300 bp in length. We compared
all of the detected CNV regions with previously reported regions and also
examined recurrent copy-number deletion regions that might cause loss of
function.
TBC-30:
Identification of functional nucleotide sequence variant in the promoter of
CEBPE gene
Hyunju
Ryoo1, Minyoung Kong1, Younyoung Kim1 and
Chaeyoung Lee1
1School
of Systems Biomedical Science, Soongsil University, Seoul, Korea
Research efforts have been made to identify genetic factors for susceptibility
to acute lymphoblastic leukemia (ALL), a complex disease known as the most
common childhood malignancy. In particular, a recent genome-wide association
study (GWAS) revealed an association (odds ratio = 1.34, P = 2.88 × 10^-7) of ALL
with the SNP rs2239633 in the 5' upstream region of the gene encoding
CCAAT/enhancer binding protein epsilon (CEBPE) in an English population (907
cases and 2,398 controls). The current study examined promoter activity to see
whether sequence variants in the promoter region can regulate the expression of
the gene and to identify functional variant(s). Three haplotypes were estimated
with rs2239633 and its neighboring single nucleotide polymorphisms (SNPs) in
strong linkage. The wild-type haplotype was TGTTTTC (HT1), and the second most
frequent haplotype consisted of the entirely opposite alleles (CCACGCT, HT2).
Minigene constructs carrying the haplotypes were used to measure luciferase
activity. Luciferase activity showed the strongest expression with HT2 and the
weakest with HT1. Further luciferase assays showed that rs2239632 was the
functional nucleotide variant responsible for the difference in expression. The
promoter activity concurred with our in silico analysis, in which different
transcription factors were predicted for the haplotypes. We concluded that
rs2239632 could regulate the expression of the CEBPE gene. This might explain
the association of rs2239633 in the previous GWAS, since it is strongly linked
to rs2239632 (r^2 = 0.949). Its risk allele would increase the gene product and
lead to leukemogenesis. As a result, persons with the allele or the
corresponding haplotype would be more susceptible to ALL.
TBC-31:
Functional promoter nucleotide variants and their haplotypes of the gene
encoding CCL21
Wonhee
Jang1, Hyunju Ryoo1, Jihye Ryu1, Jeyoung Woo1,
Minyoung Kong1, Younyoung Kim1 and Chaeyoung Lee1
1School
of Systems Biomedical Sciences, Soongsil University, Seoul, Korea
The genetic architecture of rheumatoid arthritis (RA) remains poorly understood
in spite of great interest in its causal factors. Recent genome-wide association
studies (GWAS), however, have identified several genetic signals associated
with susceptibility to RA. In particular, a meta-analysis of previously published
GWAS showed an association (P = 2.8 × 10^-7, OR = 1.12) with the gene
encoding chemokine (C-C motif) ligand 21 (CCL21) using a total of 3,393 cases
and 12,462 controls. The sequence variant (rs2812378) identified in the
meta-analysis is located in the 5' upstream region of the gene. The current study
aimed to identify functional variants in the promoter region in which the
association signal was observed. Four nucleotide variants in an estimated
linkage disequilibrium block were considered as candidate functional variants.
Different transcription factors were predicted to bind upon allelic substitution
at all of the variants. A luciferase assay revealed that the minigene construct
with the wild-type haplotype (TCGG) had a lower expression level than that with
the CCTG haplotype, which includes the risk allele of rs2812378 identified in
the meta-analysis. We concluded that the haplotype CCTG and the C allele of
rs2812378 could overproduce CCL21 compared with their corresponding wild types.
Overexpression of the chemokine would lead to greater susceptibility to RA,
considering that the chemokine is involved in the ectopic lymphoid structures
affected by RA.
TBC-32:
Development of Web-based Case Report System in Traditional Korean Medicine for
Clinic Doctor
Boyoung
Kim1, Seung-Min Baek1 and Sunmi Choi1
1Korea
Institute of Oriental Medicine, Daejeon 305811, Korea
This paper develops a web-based case report system for Traditional Korean
Medicine to be provided to Oriental Medicine doctors in local clinics. First, we
organize the case report literature, gathered from existing case report papers,
according to STRICTA and provide it as educational material. Additionally, the
various types of case report should be standardized to be accessible through the
web-based system. Finally, the proposed system can lay the foundation for
practicing evidence-based medicine in Traditional Korean Medicine.
TBC-33:
ChemTools : Python based Chemoinformatics Toolkit
Jehoon
Jun1, Minjae Yoo1 and Kwang-Hwi Cho1
1Soongsil
University, Korea
A Python-based Chemoinformatics Toolkit (ChemTools) has been developed. The
development of NMR and X-ray equipment led to the discovery of numerous chemical
compound structures, and the resulting chemical structure databases led to in
silico drug discovery. Among the many in silico methods, virtual screening is
an essential tool that is widely used in most pharmaceutical companies
and related academic fields. In these drug discovery processes, computational
tools for managing, mining, and collecting databases are very important.
However, the accuracy and performance of some publicly available tools are
limited. For this reason, we have developed a chemoinformatics toolkit that
includes several in-house and external codes. ChemTools contains modules such as
yaChI (chemical line notation), 3DG (3D structure generator from connectivity),
a conformer generator, and filters for eliminating unwanted data, which are
useful for handling large chemical databases. ChemTools can also edit molecules
atom-by-atom and bond-by-bond using a very simple syntax. ChemTools is based on
Python, so the modules can be combined in any combination in a Python script.
The toolkit inherits some modules from Pybel, such as the SMILES code generator,
InChI code generator, and energy minimizer. The modules we developed, such as
yaChI and 3DG, are more reliable than other released modules. The performance of
the in-house codes is presented alongside their counterparts and shows
improvement. ChemTools would be a very useful tool for research that handles
large chemical databases, such as in silico drug discovery or material design.
TBC-34:
Molecular Dynamics Studies to predict protein-protein interactions using GPU
accelerated AMBER: application to TBC1 interacting Rab family proteins
Ok Sung
Jung1, Bong Hun Ji1 and Kwang-Hwi Cho1
1Soongsil
University, Korea
Current
advances in computer simulation enable us to perform large scale molecular
simulation relatively easily. Especially GPU accelerated AMBER package
(AMBER-GPU) shows improved performance, in terms of speed, compared to CPU
version. AMBER-GPU has been applied to study TBC1 interacting Rab family
proteins. As TBC family proteins function GTPase-activating protein for Rab
family proteins, TBC family proteins are considered to have important roles in
cell cycle and differentiation in various tissues. And, Rab family proteins are
known to be participated in protein transport, membrane traffic, exocytosis,
endosomal recycling by taking part in transport from endoplasmic reticulum to
Golgi complex. Therefore, knowing the interaction of TBC family proteins with
Rab family proteins is very essential for studying transport system.
However,
it is time-consuming and expensive to study the interactions between various
TBC family and Rab family. So, it is necessary to apply a computational
approach to predict the interaction of the complexes prior to the in vitro
experiments.
TBC1D4
(also known as AS160) and TBC1D1 are the two RabGAPs integral to GLUT4
translocation in adipocytes and skeletal myocytes, respectively, whose crystal
structures have recently been reported (PDB ID: 3QYE). There are about 60 Rab
family proteins, and 18 of them have been examined experimentally for
association with GLUT4 vesicles. Among them, only four Rabs have
been shown to be potential substrates for TBC1D1 or TBC1D4. Recently,
structures of TBC1D1 and Rab family proteins have been reported, and more are
expected. Using these structures, the experimental results on protein-protein
interaction between TBC1 and Rab family proteins were validated with a
computational method using AMBER. An energy cutoff was found that separates
binders from non-binders. We are extending this work to the Rab family proteins
for which no experiments have yet been done, to find possible interacting partners.
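The idea of an energy cut between binders and non-binders can be illustrated with a minimal sketch. The energies below are invented placeholder values, not results of this work, and `find_energy_cutoff` is a hypothetical helper. It assumes that more negative energies mean stronger binding and reports a cutoff only when the two groups do not overlap.

```python
from typing import Optional, Sequence


def find_energy_cutoff(binders: Sequence[float],
                       non_binders: Sequence[float]) -> Optional[float]:
    """Return a cutoff separating the two groups, or None if they overlap.

    Assumes lower (more negative) energy means stronger binding.
    """
    weakest_binder = max(binders)            # least favourable binder energy
    strongest_non_binder = min(non_binders)  # most favourable non-binder energy
    if weakest_binder >= strongest_non_binder:
        return None  # the groups overlap, so no single cut separates them
    # Place the cut midway between the two groups.
    return (weakest_binder + strongest_non_binder) / 2.0


# Placeholder energies in kcal/mol (illustrative only, not computed values).
binder_energies = [-62.0, -55.5, -58.1, -51.0]
non_binder_energies = [-40.0, -33.2, -45.0]

cutoff = find_energy_cutoff(binder_energies, non_binder_energies)
```

An untested Rab protein whose computed energy falls below such a cutoff would then be a candidate binder, subject to experimental confirmation.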
TBC-35:
A Novel Data Mining Approach for Inferring Phenotypic Association Networks to
Discover the Pleiotropic Effects
Sung
Hee Park1 and Sangsoo Kim1
1School
of Systems Medical Science, Soongsil University, Seoul, Korea
Pleiotropy
is a genetic phenomenon in which a single gene affects multiple phenotypes.
In human diseases and model organisms, pleiotropy can imply that
different mutations in the same gene cause different pathological effects.
Examples of pleiotropic effects have been observed more with an increasing
number of variants identified through genome-wide association studies (GWAS).
However, current GWAS are performed in a single trait framework without
considering genetic correlations between important disease traits. Hence, the
general framework of GWAS has limitations in discovering genetic risk factors
affecting pleiotropic genes.
This
work reports a novel data mining approach to discover patterns of multiple
phenotypic associations over 52 anthropometric and biochemical traits in KARE
and to infer the phenotypic association networks from the patterns expressed as
association rules. This method was applied to a GWAS for the multivariate phenotype
highLDLhighTG, derived from the predicted patterns of the phenotypic networks
associated with high levels of triglycerides. The patterns of the phenotypic
association networks were informative for relating plasma lipid
levels to bone mineral density and to a cluster of common traits (obesity,
hypertension, insulin resistance) related to Metabolic Syndrome (MS). Fifteen
variants of six genes (PAK7, C20orf103, NRIP1, BCL2, TRPM3, and NAV1) were
identified as significantly associated with highLDLhighTG.
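As an illustration of how a phenotypic pattern can be expressed as an association rule, the sketch below computes the standard support, confidence and lift of a rule such as {highTG} -> {highLDL}. The records are invented placeholders, the trait labels are illustrative, and `rule_metrics` is a hypothetical helper; the abstract does not state which rule-mining algorithm or thresholds were used.

```python
from typing import Dict, FrozenSet, List, Set


def rule_metrics(records: List[Set[str]],
                 antecedent: FrozenSet[str],
                 consequent: FrozenSet[str]) -> Dict[str, float]:
    """Support, confidence and lift of the rule antecedent -> consequent."""
    n = len(records)
    n_a = sum(1 for r in records if antecedent <= r)   # records with antecedent
    n_c = sum(1 for r in records if consequent <= r)   # records with consequent
    n_ac = sum(1 for r in records if (antecedent | consequent) <= r)
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


# Each record is the set of discretized traits of one hypothetical subject.
records = [
    {"highTG", "highLDL", "obesity"},
    {"highTG", "highLDL"},
    {"highTG", "obesity"},
    {"highLDL"},
    {"obesity"},
]
metrics = rule_metrics(records, frozenset({"highTG"}), frozenset({"highLDL"}))
```

Rules whose lift exceeds 1 link traits that co-occur more often than expected under independence, and such rules can serve as edges of a phenotypic association network.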
Our
results, supported by analysis of mouse QTL and the PPI network on top of the
phenotypic associations discovered, suggest that the six pleiotropic genes may
play important roles in the pleiotropic effects on lipid metabolism and MS,
which increase the risk of type 2 diabetes and cardiovascular disease. This work provides
insights into explaining disease comorbidity when the pleiotropic genes share
common etiological pathways.
TBC-36:
Transcription Interference Networks are the coordinators of the gene
expressions
Zsolt
Boldogkoi1 and Dora Tombacz1
1Department
of Medical Biology, Faculty of Medicine, University of Szeged, Szeged 6720,
Hungary
Gene
expression is mainly controlled at the level of transcription. Non-coding RNAs
play very important roles in this process at various levels of genetic
regulation, including the control of chromatin organization, transcription,
various post-transcriptional processes, and translation. In this study, we
report the detection of genome-wide expression of antisense non-coding RNAs
from the genome of pseudorabies virus, which is a neurotropic herpesvirus. We
put forward the Transcription Interference Network (TIN) hypothesis in an
attempt to explain the genomic design and the existence of the antisense RNAs
in a common interpretation framework. This hypothesis suggests the existence of
a novel genetic regulatory layer, which controls the cascade of herpesvirus
gene expression at the level of transcription. The TIN is proposed to
represent a mechanism that plays a central role in the programmed
step-by-step switches of transcription between kinetic classes and subclasses
of viral genes. The proposed model may not be restricted to the herpesviruses,
but might explain the mechanism of an important regulatory system existing in
other organisms belonging to various taxonomic classes.
This
project is supported by the Swiss Hungarian Contribution and the European Union
and co-financed by the European Social Fund.
TBC-37:
Subnetwork-based analysis of human disease in protein complex with housekeeping
functions
Sanghun
Bae1, Hyunwook Han2, Hanwool Kim3 and Jisook
Moon1,2,3
1College
of Life Science, Department of Applied Bioscience, CHA University, Seoul, Korea
2Department
of Biomedical Science, CHA University, Seoul, Republic of Korea
3CHA
Stem Cell Institute, CHA Health Systems, Seoul, Republic of Korea
Given
that proteins in a living system serve as components of protein complexes or
molecular machines to carry out cellular processes, and that aberrant
protein interrelationships contribute to disorders of the molecular system, a
comprehensive analysis of the protein-protein interaction network (PPIN) is essential
for a systemic understanding of human disease.
However,
the size and complexity of the entire protein interaction network make
network-based research difficult, which makes analysis of a
sub-network, otherwise known as a small world, necessary because it greatly
reduces the number of proteins to be analysed. In this regard, the present study is
concerned with the sub-network consisting of the components of one protein complex
that is responsible for basic cellular maintenance functions and their
interactors, with our aim focused on a systemic approach to human disease.
To
construct the human interactome PPIN as a first step, we extracted
binary protein-protein interaction data from eight molecular interaction
databases (HIPPIE, HPRD, REACTOME, BIOGRID, InnateDB, DIP, MINT and IntAct) and
integrated them (172,400 interactions) to increase coverage of PPI data.
The proteins of interest, used as seed proteins, and their neighbours in the
integrated PPIN were selected to create the sub-network, the components of which
were mapped to OMIM (Online Mendelian Inheritance in Man) and GAD (Genetic
Association Database) data, representative sources of genotype-phenotype
correlation. In enrichment analysis (hypergeometric test), certain disease class
terms were over-represented in the sub-network. Moreover, network properties and
GO term and pathway enrichment analysis revealed that the sub-network has
distinct features that provide a possible explanation for the over-representation of
particular disease categories in the protein complex with housekeeping
function.
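The hypergeometric enrichment test mentioned above can be sketched as follows. The function name and the example counts are illustrative assumptions, not values from this study; the p-value is the upper tail of the hypergeometric distribution, that is, the probability of seeing at least `hits` annotated proteins in a sub-network of size `sample`.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int,
                           sample: int, hits: int) -> float:
    """P(X >= hits) when `sample` proteins are drawn without replacement
    from `population` proteins, of which `annotated` carry the disease term."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(comb(annotated, k) * comb(population - annotated, sample - k)
               for k in range(hits, upper + 1))
    return tail / total


# Toy numbers: 20 proteins in the network, 5 annotated to a disease class,
# a sub-network of 4 proteins containing 2 annotated ones.
p_value = hypergeom_enrichment_p(population=20, annotated=5, sample=4, hits=2)
```

In practice the p-values obtained for many disease classes would also need correction for multiple testing, which this sketch omits.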
Our
findings suggest that a subnetwork-based, focused analysis can be a practical
application for understanding the underlying nature of human disease and allow
us to interpret the properties of disease-related genes on a systemic level.
TBC-38:
Functional haplotypes in 5' region of RGS14 gene
Jeyoung
Woo1, Minyoung Kong1, Younyoung Kim1 and
Chaeyoung Lee1
1School
of Systems Biomedical Science, Soongsil University, Seoul 156-743, Korea
Limited
knowledge is available on the genetic factors underlying multiple sclerosis (MS), which
leads to nerve degeneration in the brain and spinal cord. Recently, a
genome-wide association study (GWAS) showed that a single nucleotide
polymorphism (SNP, rs4075958) confers risk of MS. The variant is located in
the promoter region of the gene encoding regulator of G-protein signaling 14
(RGS14), a GTPase-activating protein (GAP). We investigated the promoter
activity of the variants in the region to see whether the sequences can
regulate expression of the gene and to identify functional variants in the
region. Three haplotypes were estimated from rs4075958 and 4 SNPs in strong
linkage. For each haplotype, a minigene was constructed containing the selected
SNPs and the firefly luciferase gene. Luciferase activity of each haplotype was
measured with the Dual-Luciferase Reporter Assay system. Promoter
activity differed among the haplotypes. In particular, the largest
difference was observed between the wild-type haplotype and the haplotype in which
all alleles are complementary to the wild type. This concurred with the previous GWAS in
which the SNP conferred risk of MS. We concluded that the haplotype with
the complementary alleles could increase expression of the RGS14 gene. The
overexpressed product suppresses Gαi/o of mGluR4 and thus increases
cAMP, which activates TH17. Consequently, TH17 would
lead to neuroinflammation, and the accumulated neuroinflammation might increase
the susceptibility to MS.
TBC-39:
Health SORA, the Smart Health Care Program for Cancer Survivors
Young-Ho
Yun1, Ye-Ni Choi1, Moon-Kyung Shin1,
Kwang-Choon Kim2 and Jaegeol Cho2
1Seoul
National University College of Medicine, Korea
2Samsung
DMC R&D Center, Korea
Although
the number of cancer survivors is steadily growing, there are few programs
designed to support survivors with Information Technology-based (IT) health
promotion. According to previous studies, cancer survivors' Quality of Life
(QOL) is significantly lower than that of the general population, yet there are few programs
designed for the QOL of survivors, and those focus only on specific areas, such as
exercise and nutrition. Recognizing the need for a comprehensive health-care
program, we designed an IT-based program called Health SORA (Smart, Optimizing,
Realistic, Authentic health care program) customized for total health care of
cancer survivors.
We
studied and analyzed strategies and theories in various fields: the
transtheoretical model (TTM), behavior/health psychology, fundamental principles
of coaching and other leadership theories. Combining the theories, we developed a
program flow chart. The health care categories to be managed were determined from
previous publications. The categories cover physical, mental, social, and
existential areas for complete health care.
The
12 managed categories are exercise, nutrition, emotion, physical
examination, fatigue, sleep, weight control, family and society, existential
well-being, comorbidity and medication, pain, and quitting smoking and moderating
drinking. Each category is managed in the following order, and the cycle repeats
weekly for most of them: 1) evaluation, 2) analysis, 3) decision making,
4) planning, 5) acting, and 6) monitoring and receiving feedback. For example,
the user first assesses his or her exercise behavior (TTM, amount of exercise, regularity,
etc.) in evaluation. Next, the user reviews his or her current exercise status and
decides whether to manage it. Once the user decides to manage it, he or she can plan
education and activities. After performing the activity, the user
manages the category by reviewing the change in status in the management phase.
This is
the first smart and comprehensive prognosis program that includes 12 important
health care areas for cancer survivors. We believe that this total health care
program can effectively contribute to improving the health and QOL of cancer
survivors.
TBC-40:
A computational framework for differential alternative polyadenylation profiles
between cancer and normal cells
Jimin
Shin1,2, Hyunmin Kim1, Chaeyoung Lee2 and
David Bentley1
1Department
of Biochemistry and Molecular Genetics, University of Colorado School of Medicine,
Aurora, Colorado, USA
2School
of Systems Biomedical Science, Soongsil University, Seoul, Korea
Alternative
polyadenylation of mRNAs has received great attention as an important mechanism of
post-transcriptional regulation in eukaryotic genes. Approximately half of all
expressed genes are thought to produce alternatively polyadenylated mRNAs in
humans. Recent studies showed that alternative polyadenylation in a specific
tissue is important in oncogenesis. For example, mRNA isoforms
with longer or shorter UTRs were observed in breast cancer cell lines,
and the direction of the length change is cell-type-dependent. This study aimed
to overcome two limitations of currently available computational analyses of
alternative polyadenylation: the lack of appropriate statistical background models
and of quantification of changes in the number of polyA sites. We propose a
computational framework for evaluating differential
alternative polyadenylation profiles between normal and cancer cells. The
proposed approach deals with the tasks of peak identification and peak comparison.
It uses a nonparametric normalization with the LASSO algorithm in order to
penalize peak patterns with artifacts. This method is called the polyA shifting
index (PSI). The PSI captures non-linear trends in the
changes in the numbers of polyA sites. Furthermore, the corresponding statistic
also has an unbiasedness property for changes over a long distance. The
proposed method should be made publicly available, which would accelerate
identification of differential alternative polyadenylation profiles.
TBC-41:
The genetic regulation of aging process and age-related disease
Han
Wool Kim1, Hyun Wook Han2, Sang Hun Bae2 and
Ji Sook Moon1,2
1CHA
Stem Cell Institute, CHA Health Systems, Seoul, Republic of Korea
2Department
of Biomedical Science, CHA University, Seoul, Republic of Korea
Aging
is an inevitable biological process of all life, and its fundamental
mechanism remains unresolved. Recent studies investigated only simple
differences in network properties and disease classification arising from the
relationship between aging genes and genetic disease genes. Further
contributing factors such as methylation and miRNA are important for
uncovering the aging process and the pathogenesis of diseases. Here, for further
investigation, we compiled and analyzed human disease (OMIM) and aging (GenAge)
genes to investigate the relationship between aging and disease genes. We
categorized the genes into three groups: disease-only genes, aging-only
genes, and aging-disease genes. Each of these groups was subsequently
characterized. Of the 2117 genes, 1856 were disease-only, 155 were
aging-only, and 106 were aging-disease genes. Interestingly, analyses of GO
(Gene Ontology) enrichment, transcription factors, the protein interaction network,
and methylation revealed that each gene group is uniquely involved in different
functional categories and shows different transcription factors, miRNAs, degree
centrality, and methylation patterns. Also, from analyses of disease genes, we
found that disease-only and aging-disease genes are enriched in different
disease categories. Our results shed light on the relationship
between the genesis of various diseases and the aging process.
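The three-way grouping described above amounts to simple set operations on the two gene lists. The sketch below uses placeholder gene symbols rather than real OMIM or GenAge entries, and `partition_genes` is a hypothetical helper name.

```python
from typing import Dict, Iterable, Set


def partition_genes(disease_genes: Iterable[str],
                    aging_genes: Iterable[str]) -> Dict[str, Set[str]]:
    """Split two gene lists into disease-only, aging-only and shared genes."""
    disease = set(disease_genes)
    aging = set(aging_genes)
    return {
        "disease_only": disease - aging,
        "aging_only": aging - disease,
        "aging_disease": disease & aging,
    }


# Placeholder symbols standing in for the OMIM and GenAge gene lists.
groups = partition_genes(["GENE_A", "GENE_B", "GENE_C"], ["GENE_C", "GENE_D"])
total = sum(len(genes) for genes in groups.values())
```

The three groups are disjoint, so their sizes add up to the size of the union; this is consistent with the reported counts (1856 + 155 + 106 = 2117).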
TBC-42:
Discovery of Pathway Information Content of Protein Domains based on Domain
Co-occurrence Network
Jung
Eun Shim1 and Insuk Lee1
1Department
of Biotechnology, College of Life Science and Biotechnology, Yonsei University,
262 Seongsanno, Seodaemun-Gu, Seoul, 120-749, Korea
Identification
of functional building blocks, such as proteins, genes, and protein domains, is
important for understanding the biological processes of a cell. The protein domain
is a particularly useful feature because it is the structural, functional and
evolutionary unit of proteins. However, domain-based identification of protein
function is still a difficult problem. For this reason, we developed a
network-based quantification of domain functions to identify protein domains
that play a critical role in driving protein-level functions, using the Domain
Information Content Score (DomICS). In this framework, we first constructed a
gene network from domain co-occurrence, giving larger weights
to rarer domains, and then measured association scores for a specific pathway
using the linkage information in our network. Finally, we derived the pathway
information content of each domain, meaning the specificity of pathway-associated
domains. To evaluate the performance of the proposed method in
yeast (Saccharomyces cerevisiae), a microbe, and in human (Homo
sapiens), a multicellular organism, we evaluated the predicted pathway information
content of each domain against the literature and by enrichment analysis with known
domains for Gene Ontology biological process (GO-BP) terms from InterPro2GO.
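A gene network built from domain co-occurrence with larger weights for rarer domains might look like the sketch below. The weighting used here (negative log of the fraction of genes carrying the domain) is an assumption chosen for illustration; the abstract does not give the actual DomICS formula, and the gene and domain names are placeholders.

```python
from itertools import combinations
from math import log
from typing import Dict, Set, Tuple


def cooccurrence_network(
        gene_domains: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Link genes that share domains; rarer shared domains score higher."""
    n_genes = len(gene_domains)
    # Count how many genes carry each domain.
    counts: Dict[str, int] = {}
    for domains in gene_domains.values():
        for d in domains:
            counts[d] = counts.get(d, 0) + 1
    # Assumed weight: -log(frequency), so rare domains get large weights.
    weight = {d: -log(c / n_genes) for d, c in counts.items()}
    edges: Dict[Tuple[str, str], float] = {}
    for g1, g2 in combinations(sorted(gene_domains), 2):
        shared = gene_domains[g1] & gene_domains[g2]
        if shared:
            edges[(g1, g2)] = sum(weight[d] for d in shared)
    return edges


gene_domains = {
    "geneA": {"kinase", "SH2"},
    "geneB": {"kinase"},
    "geneC": {"kinase", "SH2"},
    "geneD": {"zinc_finger"},
}
edges = cooccurrence_network(gene_domains)
```

Here geneA and geneC share both a common and a rarer domain, so their edge outweighs the edges supported by the common domain alone.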
TBC-43:
Identification and Characterization of Gastric Cancer Subtypes using Expression
Microarray Data
Haein
Kim1, Ensel Oh1, Young Kee Shin1 and Yoon-La
Choi2
1Laboratory
of Molecular Pathology and Cancer Genomics, Seoul National University College
of Pharmacy, Seoul, Korea
2Laboratory of Cancer Genomics
and Molecular Pathology, Department of Pathology, Samsung Medical Center,
Sungkyunkwan University School of Medicine, Seoul, Korea
Gastric
cancer is one of the most common cancers in Korea, and the development of
targeted therapies for gastric cancer has been accelerated by
the emerging understanding of the gastric cancer genome. Like other types of
cancer, gastric cancer is highly heterogeneous, and the identification and
characterization of gastric cancer subtypes are the first step in the search for novel
targets for anti-gastric cancer drugs. We selected 265 genes that were
significantly overexpressed in gastric tumors by comparing the expression
microarray data of 80 paired gastric tumor and matched normal tissues using
Significance Analysis of Microarrays (SAM) and NetRank with the COXPRESdb database.
With the selected genes, we identified two subtypes (subtype A and subtype B) of
gastric cancer by clustering 200 independent gastric cancer tissues.
According to GO analysis, the 88 genes that showed high expression in subtype
A were related to angiogenesis and Wnt signaling, and the rest of the selected
genes, which showed high expression in subtype B, were involved in immune
responses such as monocyte and leukocyte chemotaxis. We observed that
subtype A included more high-stage (stage III,
IV) tumors than subtype B, which seemed to be related to the active
angiogenesis and Wnt signaling in subtype A. In subtype B, high activity of the
immune response seemed to keep early tumors from developing to a higher stage.
By identifying two subtypes of gastric cancer and characterizing
each subtype, we could understand the gastric cancer genome more profoundly, and
the selected genes may provide clues for finding targets for anti-gastric
cancer drugs.
TBC-44:
Functional nucleotide polymorphism in the promoter region of WFS1 gene
Yoonsook
Moon1, Minyoung Kong1, Younyoung Kim1 and
Chaeyoung Lee1
1School
of Systems Biomedical Science, Soongsil University, Seoul, Korea
Genomewide
association studies, especially those by several international consortia, have
identified common variants conferring genetic risk for
type 2 diabetes (T2D). A recent
meta-analysis revealed four nucleotide variants, including rs4689388,
associated with T2D (P < 2 x 10^-8).
The variant is located in the promoter of the Wolfram Syndrome 1 (WFS1) gene.
Thus, we investigated promoter activity with 2 haplotypes (ATCGT with a
frequency of 0.67, GATCG with a frequency of 0.33) estimated from 5 SNPs
(rs4689388, rs4320200, rs13107806, rs13127445, and rs4273545) in strong linkage
around rs4689388. A luciferase assay for reporter-WFS1 haplotype constructs
in HEK293 cells showed that the minigene with the wild haplotype had a
larger expression level than that with the minor haplotype (P < 0.05). Further analysis revealed
that the expression level with the minor haplotype was smaller (P < 0.05) than that with the
substitution of its first allele (AATCG), which in turn did not differ from that with the
wild haplotype (P > 0.05). In
conclusion, rs4689388 was the functional variant for up-regulation of the WFS1
gene. Its major allele (A) could produce excessive product of the gene, which
increases endoplasmic reticulum (ER) stress. Finally, considerable ER stress
would lead to a greater susceptibility to T2D.
TBC-45:
Comparison of Formaldehyde Fixed Paraffin Embedded (FFPE) and Frozen Tissues
for Exome Sequencing
Ensel
Oh1, Yoon-La Choi2 and Young Kee Shin1
1Laboratory
of Molecular Pathology and Cancer Genomics, Seoul National University College
of Pharmacy, Seoul, Korea
2Laboratory
of Cancer Genomics and Molecular Pathology, Department of Pathology, Samsung
Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
Formalin-fixed,
paraffin-embedded (FFPE) tissue is the most widely used form of
clinical sample preservation and archiving. However, FFPE tissues have been
disfavoured for next-generation sequencing (NGS) because their DNA/RNA is likely
to be mutated or degraded by formaldehyde fixation during preparation. We investigated
whether DNA from FFPE tissue is comparable to that from frozen tissue for exome
sequencing. Exome sequencing was performed on two pairs of FFPE and frozen
tissues generated from two dermatofibrosarcoma protuberans (DFSP)
tumors. The DNA from the FFPE tissues was severely degraded compared to that from the
frozen tissues; therefore, the insert size for the FFPE tissues was considerably shorter
than for the frozen tissues. However, the sequencing base quality of the FFPE
tissues was as good as that of the frozen tissues, and the average coverage of both types
of tissue was almost the same, at about 100x. The rate of properly mapped
paired reads was about 90% for frozen tissues and 70% for FFPE tissues, and
more than 95% of the targeted exome was completely covered in both frozen
and FFPE tissues. The number of SNPs called from FFPE tissues was similar to
that from the frozen tissues, and the dbSNP rate and Ti/Tv ratio of SNPs from FFPE
tissues were 95% and 2.5, respectively. The number of indels from FFPE tissues
was also similar to that from frozen tissues. Tumor-specific SNPs were selected by
subtracting the SNPs in blood from the SNPs in either FFPE or frozen
tissues, and the FFPE and frozen tissues showed well-overlapping lists of SNPs,
indicating that FFPE tissue is comparable to frozen tissue for exome sequencing. From these
results, we conclude that FFPE tissue could be a good resource for cancer
genome studies using exome sequencing.
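For readers unfamiliar with the Ti/Tv quality metric quoted above, the following sketch shows how it is computed from a list of single-nucleotide substitutions. Transitions are purine-to-purine or pyrimidine-to-pyrimidine changes, and transversions are all other substitutions. The variant calls are invented for illustration and `ti_tv_ratio` is a hypothetical helper.

```python
from typing import Iterable, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ti_tv_ratio(snps: Iterable[Tuple[str, str]]) -> float:
    """Ratio of transitions to transversions for (ref, alt) base pairs."""
    transitions = 0
    transversions = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        if ref == alt:
            continue  # not a substitution
        pair = {ref, alt}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        raise ValueError("no transversions; ratio undefined")
    return transitions / transversions


# Invented calls: five transitions and two transversions.
calls = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"),
         ("A", "G"), ("A", "C"), ("G", "T")]
ratio = ti_tv_ratio(calls)
```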
TBC-46:
Molecular and biochemical characterization on the artificial hibernation in the
olive flounder, Paralichthys olivaceus
Meehye
Kang1, Gila Jung1, Sung Kim1, Wan-Soo Kim1
and Youn-Ho Lee1
1Marine
Ecosystem Research Division, Korea Institute of Ocean Science & Technology,
Ansan, Korea
The aim
of this study was to understand the molecular and physiological changes in
artificially hibernated olive flounder, Paralichthys olivaceus. First, the
biochemical properties of the artificially hibernated organism were examined
through blood analysis. Serum glucose and triglyceride increased
significantly (p < 0.05) during
hibernation, while alkaline phosphatase (ALP) and glutamic-pyruvic transaminase
(GPT) showed no significant change (p
> 0.05). Then the genes associated with artificial hibernation were
investigated in brain tissue using RNA-seq technology. Changes in
gene expression were examined with the DEGseq R package and gene ontology (GO)
functional enrichment analysis. A total of 915 differentially expressed genes,
including 468 up-regulated and 447 down-regulated genes (p < 0.001), were identified. GO analysis of the differentially
expressed genes (DEGs) revealed 45 significantly enriched GO terms indicating
up- and down-regulation of genes, most of which were associated with protein
binding, transcription factor activity, transcription factor complex, and
sequence-specific DNA binding. Several genes, such as intestinal fatty acid
binding protein (IF), period 4, and somatolactin (SL), showed significant changes
in expression level. For IF and SL, the change in expression level was
quantitatively confirmed by real-time PCR.
TBC-47:
Unraveling selection signatures by composite log likelihood
Jihye
Ryu1 and Chaeyoung Lee1
1School
of Systems Biomedical Science, Soongsil University, Seoul 156-743, Korea
Positive
selection not only increases the frequency of a beneficial allele but also
increases the allele frequencies of nearby sequence variants. Signals
of positive selection can be identified from the distribution of
sequence variants around a favourable mutation, and a signal is declared when
the statistic differs from the value expected by chance. We introduce a
composite log likelihood-based method (CLL), which calculates a composite
likelihood of the allelic frequencies observed across sliding windows of 5
adjacent loci and compares the value with the critical statistic estimated by
50,000 permutations. We applied the method to the identification of
selection signatures in Korean cattle. A total of 11,799 nucleotide
polymorphisms were used for 71 Korean cattle and 209 foreign beef cattle.
As a result, 147 signals were observed between Korean cattle and foreign cattle
(P < 0.01). The selection
signatures with the greatest CLL for each of 30 chromosomes encompassed 148
sequence variants, among which 41 variants were located in protein-coding
regions. The signals might be candidate genetic factors for the beef quality for
which the Korean cattle have been selected.
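The sliding-window and permutation steps can be sketched as below. This is not the authors' CLL statistic, whose formula is not given here: the per-locus score (squared allele-frequency difference between the two groups) is a stand-in chosen only to make the example runnable, and the genotypes are randomly generated toy data. What the sketch does show is the general procedure: score windows of 5 adjacent loci, permute group labels, and take a critical value from the permutation distribution of the maximum window score.

```python
import random
from typing import List, Sequence

Genotypes = Sequence[Sequence[int]]  # one row per animal, alleles coded 0/1/2


def allele_freq(genotypes: Genotypes, locus: int) -> float:
    """Frequency of the counted allele at one locus."""
    return sum(g[locus] for g in genotypes) / (2 * len(genotypes))


def window_scores(group_a: Genotypes, group_b: Genotypes,
                  window: int = 5) -> List[float]:
    """Sum a per-locus score over sliding windows of adjacent loci."""
    n_loci = len(group_a[0])
    # Stand-in per-locus score (assumption): squared frequency difference.
    per_locus = [(allele_freq(group_a, j) - allele_freq(group_b, j)) ** 2
                 for j in range(n_loci)]
    return [sum(per_locus[i:i + window]) for i in range(n_loci - window + 1)]


def permutation_critical_value(group_a: Genotypes, group_b: Genotypes,
                               n_perm: int = 1000, alpha: float = 0.01,
                               window: int = 5, seed: int = 0) -> float:
    """Critical value from the permutation distribution of the max score."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign animals to the two groups at random
        null_max.append(max(window_scores(pooled[:n_a], pooled[n_a:], window)))
    null_max.sort()
    index = min(len(null_max) - 1, int((1 - alpha) * len(null_max)))
    return null_max[index]


# Toy data: 10 and 15 animals genotyped at 12 loci (random, illustrative).
data_rng = random.Random(1)
n_loci = 12
group_a = [[data_rng.choice([0, 1, 2]) for _ in range(n_loci)] for _ in range(10)]
group_b = [[data_rng.choice([0, 1, 2]) for _ in range(n_loci)] for _ in range(15)]

observed = window_scores(group_a, group_b)
critical = permutation_critical_value(group_a, group_b, n_perm=200)
signals = [i for i, score in enumerate(observed) if score > critical]
```

The study used 50,000 permutations; the sketch uses 200 only to keep the toy example fast.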
TBC-48:
The Health Avatar Platform: development of platform for interacting health
agents and personal avatar
Hee-Joon
Chung1,2, Byoungoh Kim1, Taehun Kim1, Keun
Bong Kwak1 and Dongman Lee1
1Department
of Computer Science, KAIST, Daejeon 305701, Korea
2Seoul
National University Biomedical Informatics (SNUBI), Seoul National University
College of Medicine, Seoul 110799, Korea
eHealth
is a field of increasing interest with the potential to revolutionize the way
health care and prevention are provided, shifting the balance of power and
responsibility from health care professionals to patients and citizens. A health
avatar is a user application that provides health information through health
agents based on personal medical, genomic and ubiquitous data. The Health Avatar
Platform (HAP) is a run-time environment that allows appropriate intelligent
health agents to be plugged in to a health avatar and provides a data and
access grid for heterogeneous clinical and genomic data.
We have
completed the first phase of the HAP: a) defining an application programming
interface for both avatar and agent developers, b) developing a broker that
provides a match-making service between agent and avatar and a communication
channel between them, and c) prototyping an obesity management agent application
as a showcase of the system capabilities.
TBC-49:
Systematic Analysis of Genotype-dependent Gene Expression Signatures and Drug
Sensitivity in NCI60 Datasets
Ningning
He1 and Sukjoon Yoon1
1Department
of Biological Sciences, Sookmyung Women's University, Seoul 140-742, Korea
Most
cell lines recapitulate known tumor-associated genotypes and genetically
defined cancer subsets, irrespective of tissue type. Drug treatment of many
different cell lines provides an important preclinical model for early clinical
applications of novel targeted inhibitors. The NCI60 is a program developed by
the NCI/NIH aimed at discovering new chemotherapeutic agents to treat
cancer. Here we present a novel statistical method, CLEA (Cell Line Enrichment
Analysis), to quantitatively correlate genotype with gene expression
signatures and drug sensitivity in cancer cell lines. The results provide
new insights into genotype-dependent gene expression signatures, cancer pathways
and chemical sensitivity. The method will have applications in predicting and
optimizing therapeutic windows of anti-cancer agents.
TBC-50:
The role of TRP channel interactome in prostate cancer
Jin-Muk
Lim1, Jung Nyeo Chun2, Hong-Gee Kim1 and
Ju-Hong Jeon2
1Biomedical
Knowledge Engineering Lab, Seoul National University, Korea
2Department of Physiology,
Seoul National University College of Medicine, Korea
Transient
receptor potential (TRP) channels translate various cellular stimuli into
electrochemical signals, leading to changes in membrane potentials and
intracellular Ca2+ levels. Aberrant regulation of intracellular Ca2+
homeostasis is closely associated with various cancers, particularly prostate
cancer; however, the possible involvement of TRP channels in prostate cancer is
largely unknown. To explore the role of TRP channels in prostate cancer,
we extracted and integrated two different datasets:
prostate cancer microarray data from the GEO database (accession # GSE3325) and
TRP channel interactome data from the TRIP Database 2.0
(http://www.trpchannel.org). We found altered expression patterns of TRP channel
interactome components according to tumor stage (benign, primary, and
metastatic), which are represented as node-weighted networks using the Cytoscape
program. Co-expression correlation analysis showed that certain TRP channel
isotypes tend to be co-expressed with their interacting proteins, which can
support the disease module hypothesis of network medicine. In addition, we
performed GO and pathway analyses to identify how certain TRP channels are
associated with prostate cancer phenotypes. Our results may help future
experimental investigations to understand the role of TRP channel-mediated Ca2+
signaling in prostate cancer biology and to develop novel therapeutic
strategies for the treatment of prostate cancer. [This research was supported by the MKE (The Ministry of Knowledge Economy), IT Convergence Healthcare Research Center support program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2012-H0401-12-1001)]
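Co-expression between a TRP channel and one of its interactors can be quantified with a correlation coefficient across samples. The sketch below assumes Pearson correlation, which the abstract does not specify, and uses invented expression values rather than data from GSE3325.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("constant series; correlation undefined")
    return cov / sqrt(var_x * var_y)


# Invented expression levels of a channel and an interactor in four samples.
channel_expr = [1.0, 2.0, 3.0, 4.0]
partner_expr = [2.1, 3.9, 6.2, 8.0]
r = pearson(channel_expr, partner_expr)
```

A coefficient near 1 across the benign, primary and metastatic samples would mark the pair as co-expressed; choosing a threshold and correcting for multiple pairs are left out of this sketch.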
TBC-51:
Using CSSP to predict chameleon peptides
Xiaoqi
Wang1 and Sukjoon Yoon1
1Department
of Biological Sciences, Sookmyung Women's University, Seoul 140-742, Korea
The
sequence potential for non-native β-strand formation and the presence of
protein sequences have been investigated extensively from the perspective that
such structural features are implicated in protein stability and effectiveness.
We demonstrated that calculation of contact-dependent secondary structure
propensity (CSSP) is highly sensitive in detecting non-native beta-strand
propensities in helical regions of proteins. Beta-sheet formation is the main
reason for protein aggregation. Based on our study, the CSSP method offers an
alternative for designing peptide fragments with varied propensity for
conformational change between helix and beta-strand.
TBC-52:
Transcriptome analysis during the developmental stages for predator induced
polyphenism in Daphnia pulex
Haein
An1, Gila Jung2 and Chang-Bae Kim1
1Department
of Green Life Science, Sangmyung University, Seoul 110743, Korea
2Marine
Ecosystem Research Division, Korea Institute of Ocean Science and Technology, Ansan
426744, Korea
An
invertebrate crustacean, Daphnia pulex,
is one of the most suitable models for understanding how organisms adapt to and
survive aquatic environmental stresses, including predator-induced
morphological responses. Neckteeth formation and
maintenance at critical times is known to be a defensive mechanism of D. pulex against the
predator Chaoborus sp. The genetic
mechanism of defensive morph formation and maintenance across developmental
stages is poorly understood. To understand its genomic mechanism, we carried
out comprehensive transcriptome analysis at various developmental stages in D. pulex using RNA-seq. As
a result, 37 Gb of raw reads were generated and assembled. The 62,228 unigene
clusters were annotated by blastx alignments against the NCBI non-redundant (NR),
COG, SwissProt, GO, and KEGG databases. According to the searches, 30,495
unigene clusters matched at least one database. Gene expression
differences among developmental stages were greater than those between the two
phases, normal and defensive morph, within each stage. Differentially expressed
transcripts (DETs) were discovered by measuring and comparing gene expression
between the two phases in each stage. The most distinct phase differences in
gene expression appeared in the adult/egg stage. According to the detailed
analyses, the defensive morph in this stage shows lower activity in signalling
molecules and interaction and in nucleotide metabolism. We identified 68 transcripts
as candidate defensive morph markers, including insect cuticle protein
and receptor transporting protein. This study could contribute to further
studies of the candidate genes and the epigenetic mechanism of defensive morph
formation and maintenance in D. pulex.
TBC-53:
Network analysis by phylogenetic profiling revealed domain-specific evolution
of cellular pathways
Junha
Shin1 and Insuk Lee1
1Network
Biology Laboratory, Department of biotechnology, College of Life Science and Biotechnology,
Yonsei University, Seoul 120749, Korea
Phylogenetic
profiling is a computational method for identifying functional associations of
genes within one organism, based on comparison of their evolutionary
co-inheritance patterns across the completely sequenced genomes of other
organisms. The composition of the genome set (both its abundance and heterogeneity)
and the scoring scheme for relationships are two important factors affecting
the utility of a profile. Because a profile needs only genome sequence data to
be generated, it is a practical bioinformatic technique given recent
advances in sequencing and the exponentially growing volume of sequence
data. Several previous reports found that this method works optimally
with a genome set consisting of bacterial organisms only.
Here we
reinvestigated the optimal conditions for phylogenetic profiling with a larger number of
fully sequenced genomes that were not available in previous studies. We
verified that prediction performance improves as the number of genomes
grows; the method can therefore now be used not only to discover functional
associations of genes even in higher eukaryotes but also to retrieve human
disease genes by investigating the resulting network model. Moreover,
associations between co-inherited genes differ in various features depending on
whether the co-inheritance is observed across prokaryotes or eukaryotes. Based on these
distinctions, we could identify the domain-specific nature of pathway-level
evolution and explain its molecular mechanisms.
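As a minimal illustration of the profile comparison described above, the following sketch scores two genes by the mutual information between their presence/absence profiles. The abstract does not state which scoring scheme was used, so mutual information is an assumption chosen for illustration, and the gene names and profiles are hypothetical.

```python
from math import log2

def mutual_information(profile_a, profile_b):
    """Mutual information (in bits) between two binary phylogenetic profiles.

    Each profile is a list of 0/1 flags, one per reference genome, marking
    whether a homolog of the gene is present in that genome.
    """
    if len(profile_a) != len(profile_b) or not profile_a:
        raise ValueError("profiles must be non-empty and cover the same genomes")
    n = len(profile_a)
    mi = 0.0
    for a in (0, 1):
        p_a = profile_a.count(a) / n
        for b in (0, 1):
            p_b = profile_b.count(b) / n
            # Joint frequency of the (a, b) pattern across genomes
            p_ab = sum(1 for x, y in zip(profile_a, profile_b) if (x, y) == (a, b)) / n
            if p_ab > 0:
                mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi

# Hypothetical profiles over six reference genomes (illustrative only)
gene_x = [1, 1, 0, 0, 1, 0]
gene_y = [1, 1, 0, 0, 1, 0]  # co-inherited with gene_x in every genome
gene_z = [1, 0, 1, 0, 1, 0]  # only partly co-inherited with gene_x

score_xy = mutual_information(gene_x, gene_y)
score_xz = mutual_information(gene_x, gene_z)
```

Gene pairs with higher scores share more of their inheritance pattern and would be linked in the resulting network; the choice of reference genomes changes the profiles and therefore the scores.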
TBC-54:
Functional polymorphism located in the promoter of the coagulation factor XI
gene as a putative genetic factor for susceptibility to venous thromboembolism
Minyoung
Kong1, Younyoung Kim1 and Chaeyoung Lee1
1School
of Systems Biomedical Science, Soongsil University, Seoul, Korea
Several
genome-wide association studies (GWAS) and meta-analyses of GWAS have been
conducted for venous thromboembolism (VTE). The recent MARTHA and FARIVE projects
reported rs3756008, in the promoter region of the coagulation factor XI
(FXI) gene, as a nucleotide sequence variant associated with VTE in Europeans (P =
6.46 x 10^-11). FXI is the zymogen of a
plasma serine protease (FXIa) that triggers the middle phase of the intrinsic blood
coagulation pathway, and its plasma levels have been associated with VTE. Thus, we
searched for SNPs in strong linkage with rs3756008, and rs3756009 was
selected. We investigated the alteration of luciferase reporter gene expression by
the 2 haplotypes (AA with a frequency of 0.62, TG with a frequency of 0.38)
and by each SNP in HEK293 cells. The wild-type haplotype reporter minigene showed a
higher expression level than the minor haplotype reporter minigene (P < 0.001). Further analysis revealed
that the nucleotide substitution (A to T) at rs3756008 accounted for the difference in
expression level between the 2 haplotypes (P
< 0.001). In conclusion, the minor allele (T) at rs3756008 was the regulatory
allele for low expression of the FXI gene. Low FXI levels might result in
reduced functional activity of activated coagulation factor XII (FXIIa), and
blockage of FXIIa activity might be involved in the risk of vessel occlusion.
We could not exclude the possibility that low FXI levels might lead to
susceptibility to VTE.
TBC-55:
Temporal gene expression profiles identify genetically determined transcriptional
regulation of human leukocytes
SeongBeom
Cho1, InSong Go2, Hyo-Jeong Ban1, Hyesun Yoon1,
Yeunjung Kim1, Jaepill Jeon1 and BokGhee Han1
1Center
for Genome Science, National Institute of Health, Korea Center for Disease
Control, Chungcheongbuk-do, Republic of Korea
2Department
of Physiology, School of Medicine, Hanyang University, Kyungkido, Republic of
Korea
In this
study, we investigated genetic markers affecting temporal gene expression in
human leukocytes using expression quantitative trait loci (eQTL) analysis.
During an oral glucose tolerance test, glucose levels, insulin levels, and gene
expression of leukocytes in peripheral blood were measured at three time
points. Through eQTL analysis, we identified relationships among gene
expression, genetic components, and environmental factors. Association analysis
between the gene expressions and SNPs only (marginal model) found cis SNPs showing differential
allele-specific gene expression. The analysis with the interaction terms
(interaction model) identified interactions between SNPs and temporal glucose
or insulin levels, or both, which significantly affected gene expression.
Functional annotation revealed that the significant SNPs of the marginal model
were related to various diseases. Moreover, SNPs of the interaction model
showed a strong tendency for transcription factor binding site enrichment.
Finally, using a differential allele-specific coexpression (DACE) method, we
searched for SNP–pathway pairs that showed molecular networks of significant
allele-specific changes of coexpression. The DACE method identified a trans-regulatory effect of the SNPs on
pathway gene coexpression patterns. In conclusion, we identified tentative
genetic markers affecting temporal gene expression change in human leukocytes
through a genetic component alone or through interaction between the genetic
components and glucose and/or insulin. These results will be a resource for studying
regulatory components of biological processes that are determined either by a genetic
component alone or by gene–environment cross talk.
TBC-56:
gsGator – an integrated web platform for cross-species gene set analysis
Hyunjung
Kang1, Sooyoung Cho1, Ikjung Choi1, Yeongjun
Jang2, Sanghyuk Lee1,2 and Wankyu Kim1
1Department
of Life and Pharmaceutical Science, Ewha Womans University, Ewha Research Center
for Systems Biology, 52, Ewhayeodae-gil, Seodaemun-gu, Seoul 120-750 Korea
2Korean
Bioinformation Center, Korea Research Institute of Bioscience and
Biotechnology, Daejeon 305-806, Korea
Gene
set analysis (GSA) is useful for interpreting the biological theme of a gene set using a priori defined gene sets such as gene
ontology terms or pathways. While model organisms are a rich source for inferring the
function of human genes, few GSA tools make use of this information. Here,
we developed gsGator, a web-based platform for functional interpretation of
gene sets with many useful features such as cross-species GSA, simultaneous
analysis of multiple gene sets, and a fully integrated network viewer. An
extensive set of gene annotation information is amassed including GO &
pathway, genomic annotation, molecular network, miRNA target and phenotype
information from various model organisms. gsGator enables virtually
fully-automated analysis, providing intuitive understanding of the relations
among genes and gene sets using an interactive network viewer. Particularly,
gsGator supports cross-species GSA in a user-friendly manner, allowing full
utilization of accumulated knowledge, e.g. knockout phenotypes from model
organisms. Cross-species GSA greatly expands the scope of GSA, leading to the
discovery of conserved gene modules among different species. gsGator is
available at http://gsGator.ewha.ac.kr.
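For readers unfamiliar with GSA, the core over-representation step can be sketched as a one-sided hypergeometric test. The abstract does not say which statistic gsGator uses, so this test, the function name and the gene counts below are assumptions for illustration only.

```python
from math import comb

def hypergeom_enrichment_p(universe, annotated, query, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    universe  -- number of genes in the background
    annotated -- background genes belonging to the a priori defined gene set
    query     -- size of the user's gene list
    overlap   -- query genes that belong to the gene set
    """
    max_k = min(annotated, query)
    total = comb(universe, query)
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, query - k)
        for k in range(overlap, max_k + 1)
    )
    return tail / total

# Hypothetical counts: 20,000 background genes, a 150-gene pathway,
# and a 100-gene query list sharing 8 genes with the pathway.
p_value = hypergeom_enrichment_p(universe=20000, annotated=150, query=100, overlap=8)
```

In a cross-species setting, the query genes would first be mapped to their orthologs in the model organism before the overlap is counted; that mapping step is outside this sketch.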
TBC-57:
Identification of transcriptional network regulating prognostic gene expression
signature of colorectal cancer patients
Taejeong
Bae1,2,2, Kyoohyoung Rho1, Yong-Ho In2,3 and
Sunghoon Kim1
1College
of Pharmacy, Seoul National University, Seoul 151-742, Korea
2Information Center for
Bio-pharmacological Network, Seoul National University, Suwon 443-270, Korea
3Medicinal Bioconvergence
Research Center, Advanced Institutes of Convergence Technology, Suwon 443-270,
Korea
4Korean Bioinformation Center,
Daejeon, Korea
5World Class University
Program Department of Molecular Medicine and Biopharmaceutical Sciences, Seoul
National University, Seoul 151-742, Korea
Background
Identification
of gene expression signatures in cancer patients has proven useful for
determining cancer type and stage and for predicting the prognosis of
patients. However, an expression signature itself does not provide information
about the causality of changes in pathological cellular states. Construction of
a transcription network that regulates the cancer signature can provide clues
to hidden mechanisms of cancer progression.
Results
Here we
inferred and analysed the transcriptional network regulating a prognostic gene
expression signature of colorectal cancer that is known to classify patients into
good-prognosis and poor-prognosis groups. To construct a colon cancer-specific
regulatory network, we used the ARACNE algorithm followed by a series of
filtering algorithms to find significant transcription factors. The inferred
network consists of 9 transcription factors (TFs) regulating 75 of the 86
genes in the colon cancer signature. Subsequent analysis identified 6 TFs
(PRRX1, SPDEF, FOSL2, HIF1A, RUNX1 and FOXD1) as master regulators of
high-risk signature genes for the poorer prognostic subgroup and 3 others (PLAGL2,
ASCL2 and TCF7) as regulators of low-risk signature genes for the better
prognostic subgroup. The common tumorigenic feature of HIF1A, RUNX1 and FOSL2
suggests that the prognostic gene signature may be
involved in metastasis of colorectal cancer, while the tumorigenic roles of
PRRX1, SPDEF and FOXD1 are unclear.
Conclusions
These
results show that transcriptional network analysis is a powerful tool for
revealing the regulatory programs related to the prognosis of colorectal cancer
patients.
TBC-58:
Local Similarity Search of Physicochemical Properties in Protein-Ligand Binding
Sites
Lee
Sael1 and Daisuke Kihara2
1State
University of New York Korea, Korea
2Purdue
University, USA
Physicochemical
similarity search of protein binding sites has various applications, such as
finding protein binding partners, protein function prediction, and
prediction of unintended drug binders. We present two ligand binding pocket
comparison methods: Pocket-Surfer (Chikhi R. et al. Proteins, 2010) and
Patch-Surfer (Sael L. et al. Proteins, 2012). Pocket-Surfer captures shape and
physicochemical properties of a binding site surface globally. In contrast,
Patch-Surfer represents a binding site as a combination of segmented surface
patches, each of which is characterized by its geometric shape, electrostatic
potential, hydrophobicity, and concaveness. By relaxing the constraint put on
by rigidity of global binding site structure, local similarities can be
captured. This is effective when pocket shapes are slightly different due to
structural flexibility but bind to the same ligand type. Both methods encode
the surface properties of the whole pocket, or of the patches that compose the pocket, using
3D Zernike descriptors, which have been found to be successful in
representing protein global surface properties (Sael L., Li B., et al. Proteins,
2008; Sael L., La D. et al. Proteins, 2008). We validated the two proposed
methods by measuring the accuracy of ligand binding predictions,
i.e., predictions of the types of ligand that can bind to a protein. The
performance was evaluated on a data set of 100 non-homologous proteins, each of which
binds to one of nine types of ligands. Using shape and pocket size information,
84.0% of the binding ligands were predicted correctly within the top three scoring
ligands with Patch-Surfer and 81.0% with Pocket-Surfer.
The performance was further improved to 87.0% when surface properties,
i.e. electrostatic potential and hydrophobicity, were added in
Patch-Surfer. Overall, we show that the proposed methods are powerful for protein
binding site similarity analysis even in the absence of homologous proteins in
the database.
TBC-59:
Association analysis of CNV data with linear mixed model
Meilling
Liu1, Sanghoon Moon2, Youngjin Kim2 and Sungho
Won1
1Dept
of Statistics, Chung-Ang University, Korea
2The
Center for Genome Science, Korea National Institute of Health, Korea
Copy
number variation (CNV) is expected to have an important effect on human
genetic diseases. However, although several statistical methods have been proposed
for CNV association studies, most existing approaches are restricted to
independent individuals. In this manuscript, we provide a new method for
the analysis of CNV with related samples; it can also be applied to
unrelated samples in the presence of population substructure. The proposed
approach consists of a signal model, a phenotype model, and a copy number model, where
the signal model describes the relationship between the observed intensity and
the unknown CNV, and the phenotype model explains the causal effect of the CNV on the
phenotype. In our approach, we consider the correlation structure for both
the signal and phenotype models, and multiple probe intensities are incorporated
into them. Our simulation studies show that the proposed method outperforms
previous approaches, and we illustrate the practical implications of the new
analysis method by an application to Alzheimer's disease.
TBC-60: Analysis
of longitudinal data: Applications of the Linear Mixed Model to the Korean
Association Resource (KARE)
Young Lee1,
Suyeon Park1, Woojoo Lee2 and Sungho Won1
1Dept
of Statistics, Chung-Ang University, Seoul, Korea
2Department
of Statistics, Inha University, Korea
Over the last
decade, genome-wide association studies (GWAS) have been successfully
conducted and have identified many SNPs significantly associated with
phenotypes of interest. However, the multiple testing problem is still
an intractable issue, and it becomes more serious for next generation sequencing
analysis. In this manuscript, we investigated the analysis of longitudinal data
for GWAS. Because genotyping is often more expensive than phenotyping,
longitudinal data analysis can be an alternative way to mitigate the multiple testing
problem. Here the linear mixed model was applied to the phenotypes with
repeated observations in the Korean Association REsource (KARE) project, and
principal component analysis (PCA) was conducted to adjust for population
stratification. We found that power increases with the number of
repeated measurements and the sample size, and decreases as the
correlation coefficient between repeated observations increases.
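The direction of these effects can be illustrated with a standard textbook result rather than the authors' own power analysis: if each subject contributes k repeated measurements with equal variance and a common pairwise correlation rho (a compound-symmetry assumption made here for illustration), the subject mean carries as much information as k / (1 + (k - 1) * rho) independent measurements. The function name below is hypothetical.

```python
def effective_measurements(k, rho):
    """Effective number of independent measurements per subject.

    Assumes k repeated measurements with equal variance and a common
    pairwise correlation rho (compound symmetry), with 0 <= rho <= 1.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1] for this sketch")
    return k / (1 + (k - 1) * rho)

# More repeats help, but the gain shrinks as the repeats become more correlated.
uncorrelated = effective_measurements(3, 0.0)  # three fully independent repeats
moderate = effective_measurements(3, 0.5)
identical = effective_measurements(3, 1.0)     # repeats carry no new information
```

Under this simplification, highly correlated repeated observations add little information, which is consistent with the reported loss of power as the correlation coefficient grows.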
TBC-61: Differential
influences of common variants on erythrocyte-related traits according to Sasang
constitutional types
Seongwon
Cha1, Hyunjoo Yu1 and Jong Kim2
1Constitutional
Medicine & Diagnosis Research Group, 2Vice-President,
Korea Institute of Oriental Medicine (KIOM), Daejeon, 305-811, Korea
Hematological
disorders such as anemia and erythrocytosis characterized by measuring
erythrocyte-related traits are known to be associated with cardiometabolic
diseases. Genetic variants associated with hematological traits have been
elucidated in several genome-wide association studies (GWAS). In Sasang
constitutional medicine (a Korea-specific type of personalized medicine), human
beings are categorized into four types with differing prevalence of
cardiometabolic diseases and anemia. In this study, we aimed to investigate
whether each constitutional type has different genetic factors associated
with hematological traits. We therefore examined, according to Sasang
constitutional type, the effects of variants reported in previous
GWAS to be definitely associated with hematological traits on the same
traits. We performed multiple linear regression analyses with
measurements of RBC, Hb, Hct, MCV, MCH, MCHC, and RDW in two Korean
populations: 1,701 and 3,472 subjects recruited from the Korea Constitution
Multicenter Study and the Korea Genome and Epidemiology Study, respectively.
The Sasang constitutional types were categorized by the Sasang Constitutional
Analysis Tool: in total, 2,696 subjects with Taeum type, 1,881 subjects with
Soyang type, and 596 subjects with Soeum type. Among more than 30 initially selected
polymorphisms, we found 4 variants in 4 genetic loci (HBS1L-MYB, TMPRSS6, SPTA1, and ITFG3)
showing association signals in both populations. Two variants, in HBS1L-MYB and TMPRSS6, were associated with measurements of RBC, MCV, MCH, MCHC,
and/or RDW in the total population and in the two sub-populations with Taeum and Soyang
types. The variant of SPTA1 was
associated with MCHC in the total population, and the ITFG3 variant was associated with Hb in the sub-population with Soeum
type. These results show that the profile of variants associated with
hematological traits differs according to Sasang constitutional type,
especially between the Soeum type and the others.
TBC-62: Comparing
algorithms for genotype imputations in family-based design
Kim
Youngdoe1, Lim Jungmin2, Li Donghe2, Lee
Jaemoon2 and Won Sungho2
1Division
of Structural and Functional Genomics, The Center for Genome Science, Korea
National Institute of Health, KCDC, Osong, Korea
2Department
of Applied Statistics, Chung-Ang University, Seoul, Korea
Genotype
imputation is now an essential tool in the analysis of genome-wide association
scans for handling missing data, untyped genotypes, etc. However, despite
its importance, few approaches have been proposed for genotype imputation
in family-based designs, and the accuracy of each method has not been
confirmed. In this manuscript we compared several methods for genotype
imputation using the Korean Healthy TWIN cohort. We compared IMPUTE2, BEAGLE, MACH
and GHOST, and calculated the accuracy of each program. In addition,
we considered a two-stage imputation algorithm: we first impute the genotypes
using Mendelian transmission and then apply a haplotype-based imputation
algorithm. Even though the differences among the programs are small,
our results show that the two-stage algorithm performs slightly better.
TBC-63: A
large-scale genome-wide association study of Korean Family cohorts for genetic
variants influencing metabolic syndrome
Youngdoe
Kim1,2, Yong Ki Jung2, Sung Oh Kang2, Nam Hee
Kim1, Young Jin Kim1, Juyoung Lee1 and Sungho Won2
1Division
of Structural and Functional Genomics, The Center for Genome Science, Korea
National Institute of Health,
KCDC, Osong, Korea
2Department
of Applied Statistics, Chung-Ang University, Seoul, Korea
To
identify genetic factors influencing several traits of metabolic syndrome (MetS), namely height, body mass index
(BMI), triglycerides (TG), high density lipoprotein (HDL), low density
lipoprotein (LDL), diastolic blood pressure (DBP) and systolic blood pressure
(SBP), we conducted a genome-wide association study
(GWAS) with 1,801 samples from Korean Healthy Twin cohorts and 784 samples from
Ansung Family extended cohorts recruited in Korea. In particular, we found that
TG values were not normally distributed, and thus they
were log-transformed for GWAS. A linear mixed model with the restricted
maximum likelihood (REML) method was applied to find significant
associations. We found that two SNPs were significantly associated with log TG
at the genome-wide scale, and both SNPs were replicated in the other cohort.
TBC-64: Ethical,
Legal, and Social Frameworks on Issues of Bioinformatics
Hannah
Kim1, Ilhak Lee1, Ji Yong Park1, Sang Hyun Kim2
and So Yoon Kim1,3
1Department
of Health Law and Bioethics, College of Medicine, Yonsei University, Korea
2Department
of Health Law and Bioethics, Graduate School of Public Health, Yonsei University,
Korea
3Centre
for ELSI Research, Asian Institute for Bioethics and Health Law, Yonsei University,
Seoul 120821, Korea
Fundamental
roles of bioinformatics are to identify the genes and cellular pathways
related to diseases and to link them to advanced clinical fields such as
prevention, diagnosis, and treatment of human diseases. While this field
is developing and becoming widespread at an accelerating pace, it raises various
ethical, legal, and social questions concerning patients and research
participants.
Thus, the
Centre for Ethical, Legal, and Social Issues Research (Centre for ELSI
Research) developed frameworks to investigate, analyse, and evaluate the
emerging issues in their ethical, legal, and social contexts. The
frameworks are useful not only for predicting the effects of translational
bioinformatics and medicine so as to prepare appropriate responses or strategies, but
also for multinational comparative studies. We expect the frameworks to be applicable
not only to bioinformatics but also to other cutting-edge biotechnology areas.
As the
development of the frameworks reaches its final stage, we are planning the next
step: to address the implications for individuals and society by drawing out
all prospective ethical, legal, and social issues in each sub-project and by
reviewing key issues through discussions with researchers and expert panels.
This article introduces the overall
scheme so that it can be refined further.
TBC-65: PATH2:
Software for Conducting Gene-Ontology and Pathway-Based Analyses Using Genome-Wide
Association Data
Denise
Daley1, David Zamar1, Ben Tripp1, Brad
Cavanagh1 and George Ellis1
1University
of British Columbia, Canada
Most
genome-wide association (GWA) studies lack the power to detect single
nucleotide polymorphisms (SNPs) with small effects. However, the aggregate
effect of several SNPs working together within a pathway is more easily
detectable. Testing for pathway-based association is a promising approach in
identifying genes with small additive effects that work together to increase or
decrease susceptibility to common complex diseases. Perhaps the most important
role performed by pathway-based approaches is in the identification of
underlying biological mechanisms leading to disease. Although several
algorithms exist for conducting pathway-based analyses, not all of them have
been implemented for public usage. We have developed a software package that
implements several pathway-based methods and provides an easy-to-use interface
for conducting analyses. Source code and binaries are freely available for
download at http://genapha.icapture.ubc.ca/Path2. Our software is implemented
in Java, but makes use of both Perl and R and is supported on Linux and
Windows. To illustrate its usage, we perform an ontology-based and a
pathway-based analysis of the published results from the GABRIEL consortium
large-scale genome-wide association study of asthma.
TBC-66: Comparison
of Genetic Variations in Drug Metabolizing Enzyme and Transporter Genes among
Korean, Japanese, and Chinese Population
SoJeong
Yi1, Sangin Lee2, Youngjo Lee2, Seonghae Yoon1,
Inbum Chung1, HyeKyung Han1, Jae-Yong Chung1,
Ichiro Ieiri3 and In-Jin Jang1
1Department
of Clinical Pharmacology and Therapeutics, Seoul National University College of
Medicine and Hospital
2Department
of Statistics, Seoul National University, Seoul, 110-799, Korea
3Department
of Clinical Pharmacokinetics, Graduate School of Pharmaceutical Sciences,
Kyushu University, Fukuoka,
812-8582, Japan
Inter-ethnic
differences in genetic polymorphisms of genes encoding drug-metabolizing enzymes
and drug transporters are one of the major factors causing ethnic sensitivity in
drug response. In this study, the authors explored genetic differences among 3
major East Asian populations (Korean, Japanese, and Chinese) in single
nucleotide polymorphisms (SNPs) in genes related to drug absorption, metabolism,
disposition, and transport.
Using
the DMET® Plus platform (Affymetrix, USA), the allele or genotype frequencies
of 1,936 variants (1,931 SNPs and 5 copy number variations) representing
225 drug-metabolizing enzyme and transporter genes were determined
from 786 healthy male participants (448 Koreans, 208 Japanese, and 130 Chinese). To
compare allele or genotype frequencies among the 3 ethnic groups in this high-dimensional
data, principal component analysis (PCA) and a regularized multinomial
logit model, which is a multi-class classification procedure, were employed.
Of the 1,936
variants, 1,071 variants (55.3%) were monomorphic and 127 variants (6.6%) were
'no call'; therefore, the remaining 738 biallelic variants were analysed. The
PCA results showed that Korean, Japanese, and Chinese were not distinguished by
the first few principal components. However, a multinomial logit model with the least
absolute shrinkage and selection operator (LASSO) could classify the three ethnic
groups using a model with 105, 98 and 99 selected markers for Korean, Japanese,
and Chinese, respectively. The accuracy of the prediction model was 87.9%, and the misclassification
error rate was 12.1%. The most significant genetic variations were EPHX1_16466T>C
for Korean (coefficient = -1.24), CYP2A6_1799T>A for Japanese (coefficient =
2.45), and rs17064 on ABCB1 for Chinese (coefficient = 2.37).
In
conclusion, this comprehensive genetic variant assessment suggests that genetic
differences in genes encoding drug-metabolizing enzymes and drug transporters are
very small among Korean, Japanese, and Chinese.
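The counts and percentages reported in this abstract can be cross-checked with a few lines of arithmetic. The sketch below only recomputes figures already stated in the text; the variable names are illustrative.

```python
# Figures taken directly from the abstract
total_variants = 1931 + 5          # SNPs + copy number variations
monomorphic = 1071
no_call = 127
participants = 448 + 208 + 130     # Korean + Japanese + Chinese

# Variants left for analysis after removing monomorphic and 'no call' variants
analysed = total_variants - monomorphic - no_call

# Percentages of the full variant panel, rounded to one decimal place
pct_monomorphic = round(100 * monomorphic / total_variants, 1)
pct_no_call = round(100 * no_call / total_variants, 1)

# Misclassification error rate is the complement of the reported accuracy
error_rate = round(100 - 87.9, 1)

print(total_variants, participants, analysed)
print(pct_monomorphic, pct_no_call, error_rate)
```

The recomputed values agree with the reported 1,936 variants, 786 participants, 738 analysed variants, 55.3%, 6.6% and 12.1%.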