Biomedical Informatics Grand Round

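The following laboratories currently participate:
EWUBI: Prof. Sanghyuk Lee's laboratory, Division of Molecular Life Sciences, Ewha Womans University
Prof. Hyun Seok Park's laboratory, Department of Computer Science, Ewha Womans University
SABB: Prof. Heebal Kim's laboratory, School of Food and Animal Biotechnology, Seoul National University
SNUBI: Prof. Ju Han Kim's laboratory, Seoul National University College of Medicine
For inquiries, please contact Do Kyoon Kim of Prof. Ju Han Kim's laboratory (dkkim@snu.ac.kr, 02-740-8319).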

The fourth seminar starts at 9:00 a.m. on April 21, 2007, in the main conference room (Building 200, Room 3016) of the College of Agriculture and Life Sciences, Seoul National University Gwanak Campus.

 


Affiliation | Presenter | Abstract

SNUBI | Mi Ryung Han | Protein classification from protein-domain and gene-ontology annotation information using formal concept analysis

Proteins can be described by many different attributes, such as protein structure, biomolecular interactions, cellular location, and protein domains, which are the basic evolutionary units that form proteins. In this paper, we propose a mathematical approach, formal concept analysis (FCA), which abstracts from attribute-based object descriptions. Based on this theory, we present an extended version of the algorithm, the tripartite lattice, to compute a concept lattice. By analyzing the tripartite lattice from its bottom nodes to its top, we attempt to extract proteins that are related to domains and gene ontology (GO) terms. In summary, using tripartite lattices, we classified proteins by their protein domain composition together with the GO terms that describe them.
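As a rough illustration of the formal concept analysis idea in this abstract, the sketch below enumerates the formal concepts (pairs of a protein set and the attribute set they share) of a tiny binary context. The proteins, domain names, and GO labels are invented for the example, and the brute-force enumeration is a stand-in: it is not the tripartite-lattice algorithm presented in the talk.

```python
from itertools import combinations

# Hypothetical toy context: protein -> attributes (domains and GO terms).
CONTEXT = {
    "P1": {"dom:kinase", "go:ATP binding"},
    "P2": {"dom:kinase", "dom:SH2", "go:ATP binding"},
    "P3": {"dom:SH2", "go:signal transduction"},
}


def common_attributes(proteins, context):
    """Attributes shared by every protein in `proteins` (all attributes if empty)."""
    shared = set().union(*context.values())
    for protein in proteins:
        shared &= context[protein]
    return shared


def proteins_with(attributes, context):
    """Proteins that carry every attribute in `attributes`."""
    return {p for p, attrs in context.items() if attributes <= attrs}


def formal_concepts(context):
    """Enumerate (extent, intent) pairs by closing every subset of proteins.

    Brute force: exponential in the number of proteins, so toy data only.
    """
    concepts = set()
    objects = sorted(context)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset, context)
            extent = proteins_with(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


concepts = formal_concepts(CONTEXT)
# Print from the bottom of the lattice (smallest extent) to the top.
for extent, intent in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), "<->", sorted(intent))
```

Ordering the concepts by extent size mirrors reading a lattice from its bottom node (no proteins, all attributes) to its top node (all proteins, no shared attributes).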

BOIPOP | Kyung Mo Kim | Molecular Evolution and Phylogenetic Potential of Lanosterol Synthase in Animals and Fungi

Lanosterol synthase is closely tied to the fluidity and ion permeability of cell membranes and to the metabolism of steroid hormones. Without this enzyme, cholesterol and ergosterol are not produced, which is fatal to cells in animals and fungi. In evolutionary terms, lanosterol synthase is the most recent common ancestor in the biosynthetic pathways related to cholesterol and ergosterol. Of 255 homologous sequences retrieved from public databases, we identified 25 orthologs of lanosterol synthase. The phylogenetic relationships of lanosterol synthase were almost completely congruent with the accepted species divergence. Statistical tests, including maximum likelihood analyses of codon-based models, showed that negative selection has acted on the evolution of lanosterol synthase, indicating that the molecule has been under strong functional constraints. The TLD and PTP tests showed that lanosterol synthase carries a strong phylogenetic signal. Additionally, our novel test combining bootstrapping and the PHT revealed that the lanosterol synthase gene is highly compatible with small-subunit ribosomal DNA sequences, indicating that the gene can be a good partner for the rDNA marker in phylogenetic studies of animals and fungi.

EWUBI | Youngah Shin | DEGASEST – a database of differentially expressed genes and alternative splicing using EST information

Differentially expressed genes (DEGs) are valuable resources for various biological and medical applications. DEGASEST allows the user to explore differentially expressed genes, transcripts (isoforms), and alternative splicing (AS) events based on EST information for human and mouse. Over 8,600 cDNA libraries were manually classified into 52 tissue/organ and cancer types for human, and over 1,100 cDNA libraries were classified into 36 tissue/organ, developmental stage, and cancer types for mouse. Specific expression in any tissue and/or cancer type is inferred from statistical testing of EST clusters at three levels: gene, transcript, and splicing event. ECgene's genome-based EST clustering, which is quite similar to UniGene, was used to assess gene-level expression. Additionally, DEGASEST predicts isoform-level expression using ECgene's assembly and sub-clusters. Transcripts may be differentially regulated at the isoform level even though the gene itself has no specific expression pattern. Furthermore, DEGASEST includes differentially regulated AS events such as exon skipping, alternative donor/acceptor sites, and intron retention. The genome-wide search results are stored in a relational database, and a user-friendly web interface supports various types of queries.




The third seminar starts at 9:00 a.m. on October 21, 2006, in Room 308 of the Main Building, Seoul National University College of Medicine.

 


Affiliation | Presenter | Abstract

SNUBI | Mingoo Kim | Extracting Regulatory Modules from Heterogeneous Gene Expression Data by Sequential Pattern Mining

Motivation: Identifying a regulatory module (RM), a bi-set of co-regulated genes and co-regulating conditions (or samples), has been an important challenge in functional genomics and bioinformatics. In our approach, the co-regulated genes are identified as a sequential pattern obtained by sequential pattern mining on microarray data, and the co-regulating conditions are identified as the samples corresponding to those genes. To fit the algorithm to the biological problem at hand, the conventional definition of a sequential pattern is relaxed by allowing trivial switches between consecutive elements in a sequence. The search method is also modified to improve flexibility and scalability, which lets the algorithm run on very large microarray data sets and finish in a reasonable time. The proposed algorithm is particularly useful when RMs are identified from a large-scale gene expression matrix with heterogeneous conditions.
Results: The resulting RMs are significantly enriched for known annotations (of both genes and conditions) and are consistent with known biological knowledge. In addition, the relations between RMs are further investigated; based on the degree of overlap between two modules, each relation is categorized into one of four types: independent, conditionally co-regulated, separately co-regulated, and similar. Each type of inter-module relation is exemplified with biological inferences drawn from an enrichment study.

SABB | Jongeun Park | Evolutionary characterization of the proteins containing KRAB-Zinc finger domains

In a previous study (Kim et al., 2006), we showed that proteins containing KRAB-Zinc finger domains are likely to be responsible for lineage-specific functions in mammals. Using the International Protein Index (IPI), a non-redundant protein set developed by the EMBL, we ran Pfam domain searches to extract the proteins containing KRAB-Zinc finger domains in five species: human, mouse, rat, chicken, and zebrafish. The number of KRAB-Zinc finger proteins was much higher in mammals (human, 312; mouse, 430; rat, 486) than in non-mammalian vertebrates (chicken, 12; zebrafish, 0). To elucidate the evolutionary relationships of these proteins, phylogenetic analyses were conducted within species using homologous sequences and between species using orthologous sequences. We suggest that the proteins containing KRAB-Zinc finger domains expanded mainly after the evolutionary split between mammals and non-mammalian vertebrates.

EWUBI | Bumjin Kim | Improved tag-to-gene assignment for reliable interpretation of SAGE data

Serial Analysis of Gene Expression (SAGE) is a tag-based method of probing gene expression at the genome-wide level. Reliable tag-to-gene assignment is essential but is often complicated by factors such as (i) sequencing errors, (ii) tag redundancy owing to short tag length (10 bp in short SAGE, 21 bp in long SAGE), (iii) internal priming (use of alternative restriction sites), (iv) alternative polyA tails, and (v) the presence of SNPs in the restriction enzyme site or inside the tags. The conventional procedure uses the tags extracted from the mRNA and EST sequences in a UniGene cluster without addressing those problems. We developed a computational pipeline that takes alternate tags and experimental problems into consideration. First, we created 'virtual' tag libraries from various gene models, including the RefSeq and the ECgene models of splice variants. Second, we created the 'observed' tag library after removing erroneous tags caused by sequencing errors, using a Monte Carlo simulation. The resulting observed tag library was compared with the virtual tag libraries at various confidence levels. The ECgene model of splice variants takes alternative polyA tails into consideration. Alternative tags arising from SNPs and internal priming were also deduced.
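To make the notion of a 'virtual' tag library concrete, here is a minimal sketch that extracts candidate tags from one transcript sequence. It assumes the anchoring enzyme recognizes CATG (as NlaIII does), which the abstract does not state, and it uses the 10 bp short-SAGE tag length quoted above. The function name and the example sequence are hypothetical, and SNPs, polyA variants, and error filtering are left out.

```python
ANCHOR_SITE = "CATG"  # assumption: NlaIII-style anchoring site, not stated in the abstract
TAG_LENGTH = 10       # short-SAGE tag length quoted in the abstract


def virtual_tags(transcript, site=ANCHOR_SITE, tag_length=TAG_LENGTH):
    """Return candidate tags, ordered from the 3'-most anchoring site upstream.

    The first element is the canonical tag; the remaining ones are alternate
    tags that could appear when an upstream site is used instead.
    """
    seq = transcript.upper()
    tags = []
    pos = seq.rfind(site)
    while pos != -1:
        start = pos + len(site)
        tag = seq[start:start + tag_length]
        if len(tag) == tag_length:  # skip sites too close to the 3' end
            tags.append(tag)
        pos = seq.rfind(site, 0, pos)  # next site further upstream
    return tags


# Made-up transcript with two anchoring sites.
mrna = "GG" + "CATG" + "AAACCCGGGTTTA" + "CATG" + "TTTTGGGGCCCCAAAA"
tags = virtual_tags(mrna)
print(tags)  # ['TTTTGGGGCC', 'AAACCCGGGT']
```

Keeping the alternate tags alongside the canonical one is what lets an observed tag be matched to a gene even when the 3'-most site was not the one used.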



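The second seminar starts at 4:00 p.m. on February 28, 2006, in Room B101, Building C of the Science Complex, Ewha Womans University. A map of and transportation directions to Ewha Womans University are linked here. For the location of the Science Complex on the Ewha campus, please refer to the campus map.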

Affiliation | Presenter | Abstract
SABB | Dajeong Lim | GOBias: A significance test of the spatial bias of genes in a gene ontology term
GOBias is a web tool for testing the statistical significance of the chromosomal spatial bias of genes in a gene ontology (GO) term versus random chance. The random-chance distribution for each node was drawn using 10,000 bootstraps. Currently, GOBias covers five species: human, mouse, rat, chicken, and zebrafish. The user can find the bootstrap distribution and significance value of any GO term for these species. GOBias also visualizes the genomic distribution of genes in a GO term and provides a significance test of clustering for a query protein list in the five species.
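The kind of test described in this abstract can be sketched as a resampling procedure. The statistic below (the largest number of a term's genes that fall on a single chromosome) and the toy genome are assumptions made for illustration; the abstract does not say which statistic GOBias uses. The sketch also draws random gene sets without replacement, which is a simplification of the bootstrap mentioned above.

```python
import random
from collections import Counter


def clustering_statistic(genes, chromosome_of):
    """Largest number of the given genes that share one chromosome."""
    counts = Counter(chromosome_of[g] for g in genes)
    return max(counts.values())


def resampling_p_value(term_genes, chromosome_of, n_resamples=10_000, seed=0):
    """Empirical p-value: how often a random gene set of the same size is at
    least as concentrated on one chromosome as the GO term's gene set."""
    rng = random.Random(seed)
    universe = sorted(chromosome_of)
    observed = clustering_statistic(term_genes, chromosome_of)
    hits = 0
    for _ in range(n_resamples):
        sample = rng.sample(universe, len(term_genes))
        if clustering_statistic(sample, chromosome_of) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical genome: 100 genes spread evenly over 10 chromosomes.
chromosome_of = {f"gene{i}": f"chr{i // 10 + 1}" for i in range(100)}
clustered_term = [f"gene{i}" for i in range(8)]       # all on chr1
dispersed_term = [f"gene{i * 10}" for i in range(8)]  # one gene per chromosome

p_clustered = resampling_p_value(clustered_term, chromosome_of)
p_dispersed = resampling_p_value(dispersed_term, chromosome_of)
print(f"clustered term: p = {p_clustered:.4g}")
print(f"dispersed term: p = {p_dispersed:.4g}")
```

A term whose genes sit on one chromosome gets a very small p-value, while a term with one gene per chromosome gets p = 1, because every random set is at least that concentrated.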
EWUBI | Younghee Lee | ASviewer: Visualizing the transcript structure and functional domains of alternatively-spliced genes
Alternative splicing (AS) produces diverse transcript structures by differential use of splice sites. Comparing the gene structure and functional domains of splice variants is an essential but nontrivial task, given the numerous gene predictions publicly available. We developed a novel viewer (ASviewer) that intuitively visualizes the transcript structure and functional inference of alternatively spliced genes. The key ideas are clustering overlapping exons and representing introns at arbitrary scales. Using the representative exons in the master coordinate facilitates comparison of the transcript structures of many isoforms. The most distinctive feature of the viewer is that arbitrary intron scaling lets it act as either a genome browser or a transcript viewer. An intron scale of 100% makes the view equivalent to a genome browser, which is most convenient for specifying genomic features. At an intron scale of 0%, ASviewer shows transcripts in the mRNA (exon) coordinate, which is suitable for depicting features in mRNA sequences such as functional domains. Arbitrary intron scaling therefore combines the advantages of a genome browser and a transcript viewer in a single viewer. The current Java implementation supports five well-known gene predictions (RefSeq, Ensembl, AceView, CCDS, and ECgene) as well as uploading of user sequences and features in various formats. ASviewer is available at http://genome.ewha.ac.kr/ASviewer.
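The intron-scaling idea can be sketched as a coordinate mapping in which exons keep their full length and each intron is multiplied by a scale factor between 0 and 1. This is only an illustration of the principle under assumed conventions (half-open intervals, coordinates measured from the first exon start), written in Python rather than the Java of ASviewer itself. The function name and exon coordinates are hypothetical.

```python
def display_coordinate(genomic_pos, exons, intron_scale):
    """Map a genomic position to a drawing coordinate.

    exons: sorted, non-overlapping (start, end) half-open intervals.
    intron_scale: 1.0 keeps introns at full length (genome-browser view),
                  0.0 collapses them (mRNA/exon coordinate).
    """
    if not 0.0 <= intron_scale <= 1.0:
        raise ValueError("intron_scale must be between 0 and 1")
    x = 0.0
    prev_end = exons[0][0]
    if genomic_pos < prev_end:
        return 0.0  # positions before the first exon clamp to the origin
    for start, end in exons:
        if genomic_pos < start:
            # Inside the intron that precedes this exon.
            return x + (genomic_pos - prev_end) * intron_scale
        x += (start - prev_end) * intron_scale  # scaled intron
        if genomic_pos < end:
            return x + (genomic_pos - start)    # inside the exon
        x += end - start                        # full-length exon
        prev_end = end
    return x  # positions past the last exon clamp to the transcript end


# Hypothetical three-exon gene.
EXONS = [(100, 200), (1000, 1100), (5000, 5300)]
for scale in (1.0, 0.5, 0.0):
    print(scale, display_coordinate(1050, EXONS, scale))
```

At scale 1.0 the result is just the genomic offset from the first exon, and at scale 0.0 it is the position within the spliced transcript, which matches the two limiting views described in the abstract.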
EWUBI | Younghee Lee | Genome-wide survey of domain changes due to alternative pre-mRNA splicing
Alternative splicing (AS) is an important mechanism for increasing proteome diversity. Domain changes due to AS events have a direct effect on the molecular function of a gene, and many examples of functional changes have been reported in cell communication, signaling, development, and apoptosis. Some splice variants are even known to carry out the opposite function. To elucidate the functional role of alternative splicing, we performed a genome-wide analysis of domain changes due to alternative pre-mRNA splicing using the ECgene model. ECgene provides one of the most complete catalogs of splice variants. We calculated the PFAM domains for all ECgene transcripts and classified the type of alternative splicing: exon skipping, donor/acceptor site variation, and alternative initial/terminal transcription. The origin of changes in functional domains was analyzed in terms of the AS types and frame shifts. We find that a substantial portion of domain changes arises from frame shifts, not from skipping exons that carry functional domains. Furthermore, the correlation with normal/cancer phenotypes is explored by inspecting the EST sequences consistent with each isoform structure. The results should be valuable for examining the phenotypic consequences of domain changes due to AS events.
SNUBI | Hye Won Lee | The Tissue Microarray Object Model: a data model for storage, analysis and exchange of tissue microarray experimental data
The tissue microarray (TMA) is an array-based technology that allows hundreds of tissue samples to be examined on a single slide. To handle, exchange, and disseminate TMA data, we need standard representations of the methods used, of the data generated, and of the clinical and histopathological information related to TMA data analysis. This study aims to create a comprehensive data model that is flexible enough to support diverse experimental designs and expressive and extensible enough to describe new clinical and histopathological data elements adequately. We designed a Tissue Microarray Object Model (TMA-OM). Both the Array Information and the Experimental Procedure models were created with reference to the Microarray Gene Expression Object Model, the Minimum Information Specification For In Situ Hybridization and Immunohistochemistry Experiments (MISFISHIE), and the TMA Data Exchange Specifications (TMA DES). The Clinical and Histopathological Information model was created using the CAP Cancer Protocols and the National Cancer Institute Common Data Elements (NCI CDEs). The MGED Ontology, UMLS, and terms extracted from the CAP Cancer Protocols and NCI CDEs are used to create a controlled vocabulary for unambiguous annotation. We implemented a web-based application for TMA-OM that supports data export in XML format conforming to the TMA DES or to the DTD derived from TMA-OM. TMA-OM provides a comprehensive data model for storage, analysis, and exchange of TMA data and facilitates model-level integration of other biological models. Availability: Xperanto-TMA is available at http://xperanto.snubi.org/TMA/.






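The first seminar starts at 9:00 a.m. on November 19, 2005, in the main conference room (Building 200, Room 3016) of the College of Agriculture and Life Sciences, Seoul National University Gwanak Campus.

The schedule is presentations from 9:00 to 11:00, followed by a jokgu (foot volleyball) match and then lunch.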

Affiliation | Presenter | Abstract
EWUBI | Seungyoon Nam | Transcriptional regulatory network
Biological networks represent the multiple interactions within a cell. Recent advances in molecular and computational biology have made it possible to study intricate transcriptional regulatory networks that describe gene expression as a function of regulatory inputs specified by interactions between proteins and DNA. Here we have developed an approach to identify genome-wide transcriptional binding sites using a knowledge-based transcriptional binding factor database, TRANSFAC® Professional 8.3. The approach is combined with comparative genomics across multiple species to find evolutionarily conserved binding sites. The present study concentrates on searching for pairs of transcriptional binding factors that lie in the same neighborhood and on defining the statistical boundary for measuring that neighborhood.
SNUBI | Hee-Joon Chung | ArrayXPath
ArrayXPath (http://www.snuib.org/software/ArrayXPath) is a web-based service for mapping and visualizing microarray gene-expression data with integrated biological pathway resources using Scalable Vector Graphics (SVG). Deciphering the crosstalk among biomedical ontologies and knowledge bases, and integrating them, may help the biological interpretation of microarray data. ArrayXPath integrates gene-pathway, disease-pathway, drug-pathway, and pathway-pathway correlations with Gene Ontology (GO), Medical Subject Headings (MeSH), and OMIM Morbid Map-based annotations. We applied Fisher's exact test and relative risk to evaluate the statistical significance of the correlations. ArrayXPath produces JavaScript-enabled SVGs for interactive, web-based visualization of gene expression profiles integrated with gene-pathway-disease interactions enriched by biomedical ontologies.
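For readers unfamiliar with the two statistics named in this abstract, the sketch below computes a one-sided Fisher's exact test (the hypergeometric upper tail) and a relative risk for the overlap between a gene list and a pathway. The function names and the counts are hypothetical, and how ArrayXPath itself sets up its contingency tables is not described in the abstract.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided Fisher's exact test: P(X >= k) for a hypergeometric X.

    N: genes in the universe, K: genes annotated to the pathway,
    n: genes in the query list, k: query genes annotated to the pathway.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def relative_risk(k, n, K, N):
    """Pathway membership rate inside the query list divided by the rate outside it."""
    rate_inside = k / n
    rate_outside = (K - k) / (N - n)
    if rate_outside == 0:
        return float("inf")
    return rate_inside / rate_outside


# Made-up counts: 1000 genes on the array, 40 in the pathway,
# 20 genes in the query list, 8 of which belong to the pathway.
p = enrichment_p_value(k=8, n=20, K=40, N=1000)
rr = relative_risk(k=8, n=20, K=40, N=1000)
print(f"p = {p:.3g}, relative risk = {rr:.2f}")
```

The p-value says how surprising the overlap is under random sampling, while the relative risk gives its effect size, which is why the two are usually reported together.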
SABB | Sunjin Moon | Evolution of mitochondrial DNA and mito-proteins
Mitochondria, sometimes called the 'power plants' of the cell, are intracellular organelles essential for cellular energy metabolism. When mitochondrial function is impaired, the consequences range from metabolic diseases to serious disruption of gene expression caused by a shortage of energy such as ATP, especially for genes that produce long peptides. In other words, expression of the proteins that make up muscle is often abnormal. For this reason, mitochondria are the most important factor in an organism's adaptation to its environment, and they are subject to evolutionary forces more strongly than anything else. Human mitochondrial DNA has an independent genetic system and contains 13 protein-coding genes. Although mitochondria evolved into organelles through endosymbiosis, this is far fewer genes than are found in even the simplest unicellular organisms and bacteria. However, more than a thousand proteins (mito-proteins) are expressed from the human genome and imported into mitochondria, which is what allows mitochondria to function properly. The current view is that mitochondria, which were free-living bacteria hundreds of millions of years ago, entered eukaryotic cells, and that (some evolutionary force) has since moved the genes they carried, one by one, into the nuclear genome.
Complete mitochondrial DNA sequences for about 1,000 species are currently registered at the U.S. National Center for Biotechnology Information (NCBI). In addition, about 800 mito-protein genes are registered for human, and roughly 300 to 600 mito-protein sequences are registered for yeast, mouse, and plants. Analyzing these mitochondrial DNAs and mito-proteins from an evolutionary perspective can inform not only the evolution of mitochondria but can also serve as indirect evidence of how organisms have adapted to their environments. Moreover, because mito-protein genes moved into nuclear DNA through the process of endosymbiosis, they essentially began their life in the nuclear DNA without introns. Mito-protein genes are the origin of the complex functions that make a eukaryotic cell eukaryotic, and they are also retroposons, one kind of duplication mechanism. Tracing the evolutionary changes of mito-proteins across human, chimpanzee, mouse, and chicken could therefore shed light on how gene structures such as exons, introns, and UTRs were formed.
EWUBI | Bora Kim | ChimerDB
Chromosome translocation and gene fusion are frequent events in the human genome and are often the cause of many types of tumor. ChimerDB is a database of fusion sequences that encompasses bioinformatics analysis of mRNA and expressed sequence tag (EST) sequences in GenBank, manual collection of literature data, and integration with other known databases such as OMIM. Our bioinformatics analysis identifies fusion transcripts that have nonoverlapping alignments at multiple genomic loci. Fusion events at exon-exon borders are selected to filter out cloning artifacts from cDNA library preparation. The results are classified into two groups: genuine chromosome translocation and fusion between neighboring genes owing to intergenic splicing. We also integrated manually collected literature and OMIM data on chromosome translocation as an aid to assessing the validity of each fusion event. The database is available at http://genome.ewha.ac.kr/ChimerDB/ for the human, mouse, and rat genomes.