The 7th Korea-Singapore Joint Workshop

Speakers

Chew Lim Tan (NUS, Singapore)

		Content-based Retrieval and Automated Interpretation of Brain CT Scan Images
		Owing to advances in multi-slice Computed Tomography (CT) scanning, with up to 64 slices per scan, a huge number of CT images is produced in modern hospitals every year. However, current hospital image databases allow images to be retrieved only by patient name or identity card number. Doctors already overloaded with day-to-day medical consultations often cannot remember patients’ names when they need to refer to similar cases seen before, and so valuable information is lost in a sea of raw image pixels. Interest in medical image retrieval has grown over the years. Current image retrieval techniques rely largely on the textual metadata associated with the images. With this background in mind, the present project aims to address two research problems: (1) To investigate techniques for fast retrieval of brain CT scan images based on the image content of the medical anomalies as well as other textual information associated with the medical conditions. Machine learning paradigms will be explored to enable automatic classification of medical images based on image content and textual data. (2) To investigate text/image mining techniques for automatic interpretation of image content by generating reports for the images. This is to emulate a neuroradiologist in performing prognosis of medical conditions from medical images and associated clinical data. The academic significance of the project lies in two innovations: (1) development of novel techniques to retrieve images based on image content of certain medical anomalies, and (2) an intelligent system that is able to interpret the image content and produce a prognosis based on the interpretation. The resultant image analysis and prognosis report will enable the formulation of a computer-assisted decision support system for clinical application.
		Chew Lim Tan is an Associate Professor in the Department of Computer Science, School of Computing, National University of Singapore. He received his B.Sc. (Hons) degree in physics in 1971 from the University of Singapore, his M.Sc. degree in radiation studies in 1973 from the University of Surrey, UK, and his Ph.D. degree in computer science in 1986 from the University of Virginia, U.S.A. His research interests include document image analysis, text and natural language processing, neural networks and genetic programming. He has published more than 300 research publications in these areas. He is an associate editor of Pattern Recognition and of Pattern Recognition Letters, and an editorial board member of the International Journal on Document Analysis and Recognition. He was a guest editor of the special issue on PAKDD 2006 of the International Journal of Data Warehousing & Mining, April-June 2007. He has served on the program committees of many international conferences. He is the current President of the Pattern Recognition and Machine Intelligence Association (PREMIA) in Singapore. He is a member of the Governing Board of the International Association for Pattern Recognition (IAPR). He is also a senior member of IEEE.

Hyunju Lee (GIST, Korea)

		Integrative Analysis of DNA Copy Number Aberration and Gene Expression Changes in Cancer
		DNA copy number aberrations (CNAs) and gene expression (GE) changes provide valuable information for studying chromosomal instability and its consequences in cancer. While it is clear that the structural aberrations and the transcript levels are intertwined, their relationship is more complex and subtle than initially suspected. In this talk, I will present our recent study to illustrate how we investigate the correlation of each CNA to all other genes in the genome. The correlations are computed over multiple patients that have both expression and copy number measurements in brain, bladder, and breast cancer data sets. For each of the three cancer types examined, the aberrations in several loci are associated with cancer-type specific biological pathways that have been described in the literature. In all three data sets, gene sets related to cell cycle/division such as M phase, DNA replication, and cell division were also associated with CNAs. Our results suggest that CNAs are both directly and indirectly correlated with changes in expression and that it is beneficial to examine the indirect effects of CNAs.
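The genome-wide correlation step described in this abstract can be illustrated with a minimal sketch. It assumes Pearson correlation (the abstract does not name the correlation measure), and the function names, gene labels, and toy values are hypothetical, not taken from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists of per-patient values."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no correlation signal
    return sxy / sqrt(sxx * syy)


def correlate_cna_with_genes(copy_number, expression):
    """Correlate one locus's copy number with every gene's expression.

    copy_number: per-patient copy number values at a single locus.
    expression:  dict mapping gene name -> per-patient expression values,
                 in the same patient order as copy_number.
    """
    return {gene: pearson(copy_number, values)
            for gene, values in expression.items()}


# Toy data for four patients (illustrative values only).
locus_copy_number = [2, 3, 4, 5]
toy_expression = {
    "GENE_A": [1, 2, 3, 4],   # rises with copy number
    "GENE_B": [4, 3, 2, 1],   # falls with copy number
    "GENE_C": [1, 1, 1, 1],   # unaffected
}
correlations = correlate_cna_with_genes(locus_copy_number, toy_expression)
```

Repeating this for every aberrant locus yields a locus-by-gene correlation matrix, in which genes away from the aberrant locus capture the indirect effects the abstract refers to.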
		Hyunju Lee is a full-time Instructor in the Department of Information and Communications, Gwangju Institute of Science and Technology. She received her B.Sc. degree in Computer Science in 1997 from KAIST, her M.Sc. degree in Computer Engineering in 1999 from Seoul National University, and her Ph.D. degree in Computer Science in 2006 from University of Southern California, U.S.A. Her research interests include computational biology, visualization of biological datasets, data mining in network structure and machine learning.

Limsoon Wong (NUS, Singapore)

		Two Applications of Text Mining in Bioinformatics: Enhancing Protein Function Prediction and Enhancing Drug Pathway Inference
		Protein function prediction and drug pathway inference are two important bioinformatics applications. Protein function inference has traditionally been accomplished primarily using "guilt by association" of sequence similarity. However, if good sequence similarity is unavailable, one must appeal to guilt by association of other types of similarity and even to a combination of multiple types of similarity information. Drug pathway inference is one of the most complicated challenges in translational medical research. The analysis of gene expression data contrasting responses to drug treatment is an important approach to this problem. However, this approach has been plagued by many problems such as insufficient samples, high experimental noise, and lack of direct measurement. In this talk, we illustrate how information extracted from biological literature can significantly enhance existing solutions to these two important challenges. For the protein function inference problem, we present a framework for the fusion of multiple types of similarity information, and demonstrate the effectiveness of simple co-occurrences of protein names in MEDLINE abstracts and sentences as a form of similarity information. For the drug pathway inference problem, we describe a statistical framework for testing the consistency of the measured gene expression profile with the expected gene expression profile based on signaling pathways that can be extracted from pathway databases and literature, and demonstrate its advantages over the current state of the art.
		Limsoon Wong is a professor in the School of Computing and the School of Medicine at the National University of Singapore. Before that, he was the Deputy Executive Director for Research at A*STAR's Institute for Infocomm Research. He is currently working mostly on knowledge discovery technologies and is especially interested in their application to biomedicine. Earlier, he did significant research in database query language theory and finite model theory, as well as significant development work in broad-scale data integration systems. Limsoon has written about 100 research papers, a few of which are among the best cited in their respective fields. In recognition of his contributions to these fields, he has received several awards, the most recent being the 2003 FEER Asian Innovation Gold Award for his work on treatment optimization of childhood leukemias. He serves on the editorial boards of Journal of Bioinformatics and Computational Biology (ICP), Bioinformatics (OUP), and Drug Discovery Today (Elsevier).

Jian Su (I2R, Singapore)

		An Effective Method of Using Web-Based Information for Relation Extraction
		We propose a method that incorporates paraphrase information from the Web to boost the performance of a supervised relation extraction system. Contextual information is extracted from the Web using a semi-supervised process, and summarized by skip-bigram overlap measures over the entire extract. This allows the capture of local contextual information as well as more distant associations. We observe a statistically significant boost in relation extraction performance. We investigate two extensions, thematic clustering and hypernym expansion. In tandem with thematic clustering to reduce noise in our paraphrase extraction, we attempt to increase the coverage of our search for paraphrases using hypernym expansion. Evaluation of our method on the ACE 2004 corpus shows that it out-performs the baseline SVM-based supervised learning algorithm across almost all major ACE relation types, by a margin of up to 31%.
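As a rough illustration of the skip-bigram overlap measure mentioned in this abstract, the sketch below counts ordered word pairs (with arbitrary or bounded gaps) and scores two text snippets by a Dice-style ratio. The normalisation, the whitespace tokenisation, the `max_skip` parameter, and the example sentences are assumptions made for illustration; the abstract does not specify how the overlap is computed.

```python
from collections import Counter
from itertools import combinations


def skip_bigrams(tokens, max_skip=None):
    """Count ordered token pairs (t_i, t_j), i < j, with at most max_skip
    tokens between them. max_skip=None allows any gap."""
    pairs = Counter()
    for i, j in combinations(range(len(tokens)), 2):
        if max_skip is None or j - i - 1 <= max_skip:
            pairs[(tokens[i], tokens[j])] += 1
    return pairs


def skip_bigram_overlap(text_a, text_b, max_skip=None):
    """Dice-style overlap of the skip-bigram multisets of two texts (0..1)."""
    bigrams_a = skip_bigrams(text_a.lower().split(), max_skip)
    bigrams_b = skip_bigrams(text_b.lower().split(), max_skip)
    matches = sum((bigrams_a & bigrams_b).values())  # multiset intersection
    total = sum(bigrams_a.values()) + sum(bigrams_b.values())
    return 2 * matches / total if total else 0.0


# Two hypothetical contexts expressing the same relation.
score_any_gap = skip_bigram_overlap("Acme acquired Widget",
                                    "Acme recently acquired Widget")
score_adjacent = skip_bigram_overlap("Acme acquired Widget",
                                     "Acme recently acquired Widget",
                                     max_skip=0)
```

With unbounded gaps the inserted word "recently" costs little, because pairs such as ("acme", "widget") still match. With `max_skip=0` the measure reduces to ordinary bigram overlap and the score drops, which is the sense in which skip-bigrams capture more distant associations as well as local context.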
		Jian Su received her B.Sc. degree in Electronics from Sichuan University, China in 1990, and her M.Sc. and Ph.D. degrees in electrical & electronic engineering from South China University of Technology in 1993 and 1996 respectively. She was a Research Assistant from 1994 to 1995 at City University of Hong Kong, and an intern at Centre de Recherche en Informatique de Nancy, France in 1995. She joined the Institute for Infocomm Research (I2R), formerly known as Institute of System Sciences, where she established herself in the areas of Information Extraction, Coreference Resolution, and (Bio)Text Mining. Dr Su has published extensively in natural language processing (NLP) and bioinformatics conferences and journals, including 11 papers in ACL Annual Meetings and one journal article in Computational Linguistics in recent years. Dr Su is active in professional services for the computational linguistics community. She has served as Editor / Member of Editorial Board for two international journals. She was publication chair of ACL 2007 and IJCNLP 2005, technical program chair of LBM 2007, and a PC member of numerous NLP conferences including ACL, IJCNLP, COLING, and EMNLP. Dr Su also led her team to achieve top performances in various information extraction / text mining benchmarks such as BioCreAtIve. She was responsible for the effort to establish the largest co-reference annotation corpus, which has 2000 bio-abstracts and 24 biomedical full papers from the GENIA collection. She has been the Principal Investigator in multiple technology deployments including BioMedical Information Management, Homeland Security Intelligence Gathering, Legal / Standard Enforcement, and Business Intelligence Gathering.

Gwan-Su Yi (ICU, Korea)

		Unrestricted Identification of Post-Translational Modifications with Generating Mass Shift Set Using Tandem MS
		Post-Translational Modification (PTM) identification remains a complex process in proteome analysis. Restrictive search algorithms that consider only a small number of PTMs are widely used, though they cannot identify unexpected PTMs. To cover unexpected modifications, unrestrictive search algorithms have been introduced, but they are limited in terms of the shift mass range or when using prior information such as the precursor mass or the PTM frequency matrix. In this study, we develop an improved unrestrictive PTM search algorithm that enables the identification and sequence mapping of PTMs of candidate proteins or peptides found by protein identification algorithms. The proposed algorithm calculates all possible mass shifts and generates mass shift lists. The scores of the mass shift lists are calculated and PTMs are identified with their sequence location. The PTMs found can be validated against PTM databases or by a PTM literature mining system. Existing PTM databases cannot cover all the recent information because they require manual updates by experts. A literature mining system, which extracts PTM information from the biomedical literature, can address this problem. With this algorithm, it is possible to consider all possible mass shifts without restriction and detect previously unidentified PTMs in the MS/MS spectra.
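The idea of enumerating all mass shifts between a candidate peptide and an observed spectrum can be sketched as follows. This is a simplified illustration, not the algorithm of the talk: it compares cumulative residue masses of sequence prefixes with observed masses, ignores ion-type and charge offsets, and scores a shift simply by how many prefix/peak pairs support it. The function names, the bin width, and the toy spectrum are hypothetical.

```python
from collections import Counter

# Monoisotopic residue masses (Da) for a few amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406,
}


def prefix_masses(peptide):
    """Cumulative residue masses of every prefix of the peptide."""
    total, masses = 0.0, []
    for residue in peptide:
        total += RESIDUE_MASS[residue]
        masses.append(total)
    return masses


def mass_shift_counts(peptide, observed, bin_width=0.01):
    """Tally every (observed - theoretical) difference, binned by bin_width."""
    counts = Counter()
    for theoretical in prefix_masses(peptide):
        for peak in observed:
            counts[round((peak - theoretical) / bin_width)] += 1
    return counts


def best_nonzero_shift(peptide, observed, bin_width=0.01):
    """Return (shift in Da, support) for the best-supported non-zero shift."""
    counts = mass_shift_counts(peptide, observed, bin_width)
    counts.pop(0, None)  # a zero shift means "unmodified", not a PTM
    if not counts:
        return None
    bin_index, support = max(counts.items(),
                             key=lambda item: (item[1], -abs(item[0])))
    return bin_index * bin_width, support


# Toy spectrum: peptide GASPV with +79.96633 Da (phosphorylation) on the
# serine, so every prefix that contains S is shifted by that amount.
PHOSPHO = 79.96633
theoretical = prefix_masses("GASPV")
toy_observed = theoretical[:2] + [m + PHOSPHO for m in theoretical[2:]]
shift, support = best_nonzero_shift("GASPV", toy_observed)
```

In this toy case the first shifted prefix is the one ending at the serine, which is how a shift can be mapped to a sequence location. A real scoring scheme would also weigh peak intensities and fragment ion types.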
		Gwan-Su Yi received a B.S. (1988) in Molecular Biology from Seoul National University in Seoul, Korea, and an M.S.E. (1990) and a Ph.D. (1993) in Biological Engineering from KAIST in Daejeon, Korea. His main interests are in Bioinformatics, Computational Systems Biology, Medical Informatics, and Structural Bioinformatics. He has been a postdoctoral researcher at the Korea Research Institute of Bioscience and the Korea Basic Science Institute in Korea, UNC (Chapel Hill, USA), and the University of Toronto (Canada). He was also a Senior Scientist at Integrative Proteomics (Canada). He is an Associate Professor at the Information and Communications University (ICU).

Jinah Park (ICU, Korea; with Yunkyu Choi)

		Visualization Tool for Protein-Protein Interaction Network Analysis with Gene Ontology
		We have previously developed a 3D layout algorithm for Protein-Protein Interaction (PPI) networks in conjunction with a Gene Ontology (GO) tree, in order to examine a distance measure whose relation to PPI has not yet been determined. Although our experiments demonstrate the usability of the distance measure, the 3D visualization results themselves were too complex to engage the user’s intuition or insight. Therefore, we break down the complex 3D layout into a multi-view of 2D layouts for GO and PPI side by side, and provide appropriate functionality for the exploration of PPI data for knowledge discovery. For the graphical display of GO, we implemented layered graph drawing using a layer-by-layer sweep algorithm to reduce edge crossings. GO terms can be searched by keyword or ID number, and the selected GO terms are displayed graphically in their hierarchically structured layout. If multiple GO terms are selected, the system automatically identifies the least common ancestors. For PPI network visualization, a force-directed method (FDM) was employed with various selection mechanisms. Since the FDM generally yields an aesthetically pleasing layout, it is a popular choice for drawing the PPI network on a screen. One shortcoming of FDM, however, is that the layout is unpredictable: even a small change in the database (i.e., introducing a new node or a new edge) may make the layout look totally different. In order to improve predictability and maintain a stable layout, we associate related GO terms and their hierarchical information as force fields. Grouping of proteins based on their related GO terms is also used to analyze the PPI data. We also adapted a context-based browsing method to handle the visualization of large relation data sets.
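The least-common-ancestor lookup for several selected GO terms can be sketched as below. Because a GO term may have more than one parent, the ontology is treated here as a directed acyclic graph and the result is a set rather than a single term. The term names and the `parents` mapping are made up for illustration, and this is one straightforward way to compute the set, not necessarily how the tool does it.

```python
def ancestors(term, parents):
    """Return the term itself plus every term reachable via parent links."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def least_common_ancestors(terms, parents):
    """Common ancestors of all terms that have no common-ancestor descendant."""
    terms = list(terms)
    if not terms:
        return set()
    common = set.intersection(*(ancestors(t, parents) for t in terms))
    # Any proper ancestor of a common ancestor is not "least".
    not_least = set()
    for candidate in common:
        not_least |= ancestors(candidate, parents) - {candidate}
    return common - not_least


# Hypothetical ontology fragment: child -> list of parents.
toy_parents = {
    "binding": ["root"],
    "catalysis": ["root"],
    "protein_binding": ["binding"],
    "kinase_binding": ["protein_binding"],
    "kinase_regulator": ["binding", "catalysis"],
    "x": ["binding", "catalysis"],
}

lca_single = least_common_ancestors(["kinase_binding", "kinase_regulator"],
                                    toy_parents)
lca_multi = least_common_ancestors(["kinase_regulator", "x"], toy_parents)
```

Here `lca_single` is `{"binding"}`, while `lca_multi` contains both `"binding"` and `"catalysis"`, since neither is an ancestor of the other. This is why the abstract speaks of ancestors in the plural.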
		Jinah Park received a B.S. (1988) in Electrical Engineering from Columbia University in New York, and an M.S.E. (1991) and a Ph.D. (1996) in Computer and Information Science from the University of Pennsylvania in Philadelphia. Her main interests are in Computer Graphics and Computer Vision as applied to Medical Image Analysis and Visualization. Her major contributions to the field include developing a computational technique to analyze cardiac motion based on MR tagging data. Upon coming to Korea, she worked in the EE&CS Department of the Korea Advanced Institute of Science and Technology (KAIST) as a research professor. She joined the Information and Communications University (ICU) in 2002 as a regular faculty member. Check out her research lab (CGV lab) at ICU.