Abstract: Aligning multiple biological sequences, known as multiple sequence alignment (MSA), is a fundamental task in bioinformatics and sequence analysis. These alignments may contain invaluable information that scientists need to predict the sequences' structures, determine the evolutionary relationships between them, or discover drug-like compounds that can bind to the sequences. MSA also has many applications in Next-Generation Sequencing (NGS) data analysis, such as aligning multiple short reads. Unfortunately, MSA is NP-complete. In addition, the lack of a reliable scoring method makes it very hard to align the sequences reliably and to evaluate the alignment outcomes. In this talk, I will describe a new scoring method for use in biological multiple sequence alignment. Our scoring method encapsulates the stereo-chemical properties of sequence residues and their substitution probabilities in a tree-structured scoring scheme. In addition to the new scoring scheme, we have designed an overlapping sequence clustering algorithm for use in our three new multiple sequence alignment algorithms. One of our alignment algorithms uses a dynamic weighted guidance tree to perform multiple sequence alignment in a progressive fashion. The dynamic weighted tree allows errors made in the early alignment stages to be corrected in subsequent stages. The other two algorithms utilize sequence knowledge and sequence consistency to produce biologically meaningful sequence alignments. The sequence knowledge-based algorithm uses existing biological sequence knowledge databases such as Swiss-Prot to guide sequence alignment. When sequence knowledge databases are not available, the sequence consistency-based algorithm can use consistency information from the input sequences to achieve a similar effect. Experimental results and theoretical analysis indicate that our new scoring function and alignment algorithms improve on the current best multiple sequence alignment algorithms.
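For readers unfamiliar with what "scoring" an alignment means, the following minimal sketch computes the conventional sum-of-pairs score of an alignment. It is only a baseline illustration and is not the tree-structured scoring scheme presented in the talk. The function names and the match, mismatch, and gap values are illustrative assumptions.

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one pair of aligned symbols (assumed, illustrative values)."""
    if a == "-" and b == "-":
        return 0  # two gaps facing each other carry no penalty here
    if a == "-" or b == "-":
        return gap  # a residue aligned against a gap
    return match if a == b else mismatch


def sum_of_pairs(alignment, **scores):
    """Sum pair_score over every pair of rows in every alignment column."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned rows must have the same length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            total += pair_score(a, b, **scores)
    return total


if __name__ == "__main__":
    # Three toy sequences, already aligned with '-' as the gap symbol.
    print(sum_of_pairs(["AC-T", "ACGT", "A-GT"]))
```

A flat scheme like this treats every mismatch identically, which is one reason richer scoring methods that account for residue properties and substitution probabilities are of interest.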
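Speaker bio: Professor Yi Pan entered the Department of Computer Science and Engineering at Tsinghua University as the top-scoring science-track student in Jiangsu Province's college entrance examination. He received his bachelor's and master's degrees in engineering from Tsinghua University in 1982 and 1984, respectively, and his Ph.D. in computer science from the University of Pittsburgh, USA, in 1991. He is currently Dean and Chair Professor of the School of Computer Science and Control Engineering at Shenzhen University of Advanced Technology, Chinese Academy of Sciences (in preparation), and Regents' Professor Emeritus at Georgia State University, USA. At Georgia State University he previously served as Chair of the Department of Computer Science, Chair of the Department of Biology, Associate Dean of the College of Arts and Sciences, Regents' Professor, and Distinguished University Professor. He has also held visiting chair professorships or guest professorships at Tsinghua University, Peking University, Zhejiang University, and other universities. Professor Pan is a Fellow of the American Institute for Medical and Biological Engineering, a Fellow of the Royal Society for Public Health (UK), an Academician of the Ukrainian Academy of Engineering, a Fellow of the Institution of Engineering and Technology (UK), a Fellow of the Japan Society for the Promotion of Science, and a Changjiang Scholar Chair Professor. His main research area is bioinformatics and medical informatics, using cloud computing, big data analytics, artificial intelligence, and deep learning as tools. In this area he has published more than 250 SCI-indexed journal papers, over 100 of them in top IEEE/ACM Transactions/Journals, as well as more than 150 papers in international conference proceedings, and he has authored or edited more than 40 books. His work has been cited 16,300 times, and his current H-index is 82. His honors include the IEEE Outstanding Achievement Award, the IEEE Outstanding Service Award, an IEEE Transactions Best Paper Award, multiple best paper awards at IEEE and other international conferences, four IBM Faculty Awards, two Japan Society for the Promotion of Science Senior Invitation Fellowships, and the Andrew Mellon Award. He has delivered invited keynote speeches at more than 60 international conferences and has given nearly one hundred academic talks at universities in the United States and at many world-renowned universities. Professor Pan is currently Editor-in-Chief of Big Data Mining and Analytics (published jointly by Tsinghua University and IEEE) and Associate Editor-in-Chief of IEEE/ACM Transactions on Computational Biology and Bioinformatics and of the Journal of Computer Science and Technology (JCST), a leading English-language computer science journal in China. He was the founder and Editor-in-Chief of the John Wiley book series on bioinformatics and the John Wiley book series on wireless networks and mobile computing. He serves or has served as an associate editor of seven IEEE Transactions journals and on the editorial boards of more than ten international journals, and he has been general chair or program committee chair of dozens of major international conferences.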