Fundamentals of Proteome Bioinformatics Revisited

11:00	Sequence databases
11:45	Spectra identification
12:30	Lunch break
13:30	Decoy approach
14:15	Protein inference
15:00	Coffee break
15:15	Protein quantification
16:00	OpenMS/KNIME
16:45	Time for questions and own hands-on analyses
17:30	End of the tutorial

Bioinformatics now provides many software tools and workflows for high-throughput proteomics, enabling fast and convenient analysis of mass spectrometric measurements. However, the ease of use of these software solutions obscures the complexity of the analysis and the interdependence of its consecutive steps. This tutorial will ‘open the black box’: the most important parts of a proteomics workflow (sequence databases, spectra identification, the decoy approach for false discovery rate estimation, protein inference, protein quantification) and proteomics software will be reviewed with regard to their relationships, their pitfalls and the consequences of crucial decisions. The course is offered by some of the leading groups in this area in Germany, who can provide unique insights into some of the tools they have been developing. The tutorial is split into smaller components and integrates theoretical foundations with applied hands-on sessions in an interactive, blended learning experience. Supervised by experts in the field, participants will be given the theoretical basics and will then be able to explore real-world datasets on their own. The tutorial covers all basic topics of the analysis of large-scale proteomics data, from the fundamentals of data generation to the fully automated processing of large datasets through workflow systems.

Covered Topics

Please bring your own laptop for the hands-on session as we can only provide a limited number of PCs.

Sequence databases

Protein sequence information stored in public databases such as UniProtKB is an essential component of almost all proteomics workflows. However, these databases are generated by distinct approaches and therefore provide different views of the proteomes they describe. This leads to database-related issues in bioinformatics workflows, and the tutorial will give a problem-oriented review and comparison of protein sequence databases.

Spectra identification and decoy approach

We will give an overview of important protein identification tools for MS/MS spectra. Using examples, we will show search parameters and their impact on the search process and its results. Some important search engine-specific scoring systems and the factors that influence them will be addressed in detail. Moreover, we will review the popular target/decoy approach, which controls false positives and allows results from different search engines to be compared. The prerequisites and the different variants of this concept will be discussed.
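The idea behind the target/decoy approach can be sketched in a few lines of Python. This is a minimal illustration and not the method of any particular search engine: it assumes a concatenated target/decoy search, scores where higher is better, and the simple estimate FDR = decoys / targets above a score threshold. The function name `q_values` and the tuple layout `(score, is_decoy)` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical PSM representation: (score, is_decoy), higher score = better match.
PSM = Tuple[float, bool]


def q_values(psms: List[PSM]) -> List[Tuple[float, bool, float]]:
    """Return (score, is_decoy, q_value) for each PSM, best score first."""
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)

    # Running FDR estimate at each rank: decoys / targets seen so far.
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at which this PSM would still be accepted,
    # i.e. the minimum FDR over this rank and all lower-scoring ranks.
    qvals = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvals.append(running_min)
    qvals.reverse()

    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qvals)]
```

Filtering the returned list for target PSMs with a q-value below, for example, 0.01 would give a list accepted at an estimated 1% FDR under the stated assumptions. Other variants of the concept (for instance separate target and decoy searches) use different estimators.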

Protein inference

The bottom-up approach makes it necessary to infer proteins from the peptides identified by search engines. This is not a trivial task because of shared (degenerate) peptides: such peptides occur in multiple proteins in the searched database, so it is not obvious from which protein an identified peptide derives. This protein ambiguity needs to be handled to generate meaningful final protein lists, and several different protein inference algorithms are available for this purpose. We will review the most frequently used tools for protein inference, their requirements and the basics of the underlying algorithms.
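To make the ambiguity concrete, the following sketch applies a greedy parsimony heuristic: it repeatedly picks the protein that explains the most peptides not yet explained. This is only one of several possible strategies and is not meant to reproduce any specific tool; the function name `greedy_protein_inference` and the input mapping are hypothetical.

```python
from typing import Dict, List, Set


def greedy_protein_inference(peptide_to_proteins: Dict[str, Set[str]]) -> List[str]:
    """Select a small set of proteins that explains all identified peptides."""
    # Invert the mapping: which peptides does each protein contain?
    protein_to_peptides: Dict[str, Set[str]] = {}
    for peptide, proteins in peptide_to_proteins.items():
        for protein in proteins:
            protein_to_peptides.setdefault(protein, set()).add(peptide)

    unexplained = set(peptide_to_proteins)
    selected: List[str] = []
    while unexplained:
        # Iterate in sorted order so that ties are broken deterministically
        # (alphabetically first protein wins).
        best = max(
            sorted(protein_to_peptides),
            key=lambda prot: len(protein_to_peptides[prot] & unexplained),
            default=None,
        )
        if best is None:
            break
        covered = protein_to_peptides[best] & unexplained
        if not covered:
            break  # remaining peptides map to no protein
        selected.append(best)
        unexplained -= covered
    return selected


# Toy example: PEPB and PEPC are shared peptides.
example = {
    "PEPA": {"P1"},
    "PEPB": {"P1", "P2"},
    "PEPC": {"P2", "P3"},
    "PEPD": {"P3"},
}
print(greedy_protein_inference(example))  # ['P1', 'P3']
```

In the toy example, P2 is not reported because all of its peptides are already explained by P1 and P3. Whether such a protein should be dropped, grouped or reported with a lower confidence is exactly the kind of decision in which inference tools differ.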

Protein quantification

Several very different approaches are available for protein quantification. Hence, it is crucial to find the most appropriate quantification method and software tool for a given dataset and research question. We will review the advantages, the applicability and the use cases of different quantification methods such as label-free, SILAC and iTRAQ. Moreover, we will present software tools available for protein quantification.
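As a rough illustration of the label-free case only, the sketch below summarises peptide intensities to a protein abundance by averaging the most intense peptides per protein (a "top-N" style summary) and then compares two conditions as a log2 ratio. It assumes that peptide intensities are already extracted, normalised and assigned to a single protein; the function names and the input layout are hypothetical. Label-based methods such as SILAC and iTRAQ derive their ratios differently and are not covered by this sketch.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def top_n_abundance(
    peptide_intensities: Iterable[Tuple[str, float]], top_n: int = 3
) -> Dict[str, float]:
    """Average the top_n most intense peptides of each protein."""
    per_protein = defaultdict(list)
    for protein, intensity in peptide_intensities:
        per_protein[protein].append(intensity)

    abundance = {}
    for protein, values in per_protein.items():
        best = sorted(values, reverse=True)[:top_n]
        abundance[protein] = sum(best) / len(best)
    return abundance


def log2_ratios(
    condition_a: Dict[str, float], condition_b: Dict[str, float]
) -> Dict[str, float]:
    """log2(B / A) for proteins with positive abundance in both conditions."""
    shared = condition_a.keys() & condition_b.keys()
    return {
        protein: math.log2(condition_b[protein] / condition_a[protein])
        for protein in shared
        if condition_a[protein] > 0 and condition_b[protein] > 0
    }
```

Proteins quantified in only one condition are silently dropped here; a real analysis would need an explicit strategy for such missing values.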

Workflow systems for data processing

Analysing proteomics data requires flexible workflows that combine appropriate tools. We will therefore present the workflow engine KNIME in combination with the open-source MS software library OpenMS, which can be used for data management and analysis. OpenMS comes with a wide variety of pre-built, ready-to-use tools for proteomics and metabolomics data analysis and with powerful 2D and 3D visualization. To facilitate workflow construction, OpenMS has been integrated into KNIME, the Konstanz Information Miner, an open-source integration platform that provides a powerful and flexible workflow system combined with advanced data analytics, visualization and reporting capabilities.

Tutorial 1, Sunday (Full-Day)