Sunday
- Tutorials
- Welcome Reception
Please join us for a welcome drink and snacks. You will also be able to register and pick up the conference documents.
10:00 - 11:00 | Registration, coffee |
11:00 - 13:00 | Tutorials |
13:00 - 14:00 | Lunch break (catering) |
14:00 - 15:40 | Tutorials |
15:40 - 16:00 | Coffee break |
16:00 - 17:30 | Tutorials |
17:30 - 19:00 | Welcome reception |
RNA sequencing (RNA-seq) is the method of choice for measuring the expression of RNAs in a cell population. In an RNA-seq experiment, sequencing the full length of larger RNA molecules requires fragmenting them into smaller pieces that are compatible with the limited read lengths of most deep-sequencing technologies. Non-uniform coverage across a genomic feature has long been a concern in RNA-seq and is attributed to preferences for certain fragments during library preparation and sequencing. However, the disparity between the observed non-uniformity of read coverage and the assumed uniformity raises a question: what read coverage profile should one expect across a transcript if the sequencing protocol has no biases? We propose a simple model of unbiased fragmentation and find that the expected coverage profile is not uniform and, in fact, depends on the ratio of fragment length to transcript length. To compare the non-uniformity predicted by our model with experimental data, we extended this simple model to incorporate empirical attributes matching those of the sequenced transcripts in an RNA-seq experiment. In addition, we imposed an experimentally derived fragment-length distribution.
We used this model to compare our theoretical predictions with experimental data and with the uniform coverage model. If time permits, we will also discuss a potential application of our model.
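Why unbiased fragmentation need not give uniform coverage can be seen in a minimal sketch. It assumes (our reading, not necessarily the authors' model) that every fragment lies fully inside the transcript and that its start is drawn uniformly among all positions where it fits; the function names are hypothetical. A second function mixes the single-length profiles over a fragment-length distribution, in the spirit of the extended model.

```python
def expected_coverage(transcript_len: int, frag_len: int) -> list[float]:
    """Probability that each position is covered by one fragment whose start
    is drawn uniformly among all starts that keep it inside the transcript."""
    if not 1 <= frag_len <= transcript_len:
        raise ValueError("need 1 <= frag_len <= transcript_len")
    n_starts = transcript_len - frag_len + 1
    profile = []
    for i in range(transcript_len):
        # a fragment starting at s covers i iff s <= i <= s + frag_len - 1
        first = max(0, i - frag_len + 1)
        last = min(i, n_starts - 1)
        profile.append((last - first + 1) / n_starts)
    return profile


def expected_coverage_mixture(transcript_len: int,
                              frag_len_probs: dict[int, float]) -> list[float]:
    """Average the single-length profiles over a fragment-length distribution."""
    # keep only lengths that fit in this transcript and renormalise their weights
    usable = {f: p for f, p in frag_len_probs.items() if 1 <= f <= transcript_len}
    weight = sum(usable.values())
    if weight == 0:
        raise ValueError("no fragment length fits this transcript")
    total = [0.0] * transcript_len
    for frag_len, p in usable.items():
        for i, c in enumerate(expected_coverage(transcript_len, frag_len)):
            total[i] += (p / weight) * c
    return total


# Fragment length 4 on a 10 nt transcript: 7 possible starts, trapezoidal profile
print([round(7 * c) for c in expected_coverage(10, 4)])
# [1, 2, 3, 4, 4, 4, 4, 3, 2, 1]
```

Under these assumptions the transcript ends are covered less often than the middle, and how much of the transcript is affected depends on how the fragment length compares with the transcript length, matching the qualitative claim above. The profile always sums to the fragment length, since each fragment covers exactly that many positions.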
High-throughput DNA/RNA sequencing is a routine experiment in molecular biology and in the life sciences in general. For instance, it is increasingly used in hospitals as a key procedure in personalized medicine. Compared with second-generation technologies, third-generation sequencing technologies produce longer reads, but with lower throughput and a higher error rate. These errors, which include substitutions and indels, hinder or at least complicate downstream analyses such as mapping or de novo assembly. In practice, long-read data are often used in conjunction with second-generation short reads.
I will present LoRDEC, the hybrid strategy we introduced last year for correcting long reads using short reads. Unlike existing error correction tools, LoRDEC avoids aligning short reads to long reads, which is computationally intensive. Instead, it represents the short reads in a succinct graph and compares the long reads to paths in that graph. Experiments show that LoRDEC outperforms existing methods in running time and memory while achieving comparable correction performance. It can correct both Pacific Biosciences reads and Oxford Nanopore MinION reads.
LoRDEC is available at http://atgc.lirmm.fr/lordec.
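The general idea of comparing a long read to graph paths instead of aligning short reads can be illustrated with a toy sketch. This is not LoRDEC's implementation: it assumes a plain de Bruijn graph of short-read k-mers stored in a Python set (no succinct encoding), treats every k-mer seen in the short reads as solid, and replaces each weak region between two solid k-mers by the path spelling with the smallest edit distance to the read. All function names are hypothetical, and the exhaustive path search is only suitable for tiny examples.

```python
def kmers(seq: str, k: int) -> list[str]:
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_kmer_graph(short_reads: list[str], k: int) -> set[str]:
    """Nodes of a de Bruijn graph; edges (k-1 overlaps) are left implicit."""
    nodes: set[str] = set()
    for read in short_reads:
        nodes.update(kmers(read, k))
    return nodes


def successors(node: str, nodes: set[str]) -> list[str]:
    return [node[1:] + b for b in "ACGT" if node[1:] + b in nodes]


def edit_distance(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def path_spellings(source: str, target: str, nodes: set[str], max_len: int) -> list[str]:
    """Bases spelled after `source` by graph paths that end at `target`."""
    found, stack = [], [(source, "")]
    while stack:
        node, spelled = stack.pop()
        if node == target and spelled:
            found.append(spelled)
            continue
        if len(spelled) < max_len:
            stack.extend((nxt, spelled + nxt[-1]) for nxt in successors(node, nodes))
    return found


def correct_long_read(read: str, nodes: set[str], k: int, slack: int = 3) -> str:
    ks = kmers(read, k)
    solid = [i for i, km in enumerate(ks) if km in nodes]
    corrected = read
    # fix weak regions right to left so positions to the left stay valid
    for left, right in reversed(list(zip(solid, solid[1:]))):
        if right - left == 1:
            continue  # consecutive solid k-mers: nothing to correct
        region = read[left + k:right + k]  # bases after the left solid k-mer
        options = path_spellings(ks[left], ks[right], nodes, len(region) + slack)
        if options:  # otherwise keep the original bases
            best = min(options, key=lambda s: edit_distance(s, region))
            corrected = corrected[:left + k] + best + corrected[right + k:]
    return corrected


reads = ["ATGCGTACC", "GTACCTGAA", "CTGAAGTC"]  # toy short reads
graph = build_kmer_graph(reads, k=4)
print(correct_long_read("ATGCGTACATGAAGTC", graph, k=4))  # ATGCGTACCTGAAGTC
```

The `slack` parameter lets candidate paths be longer than the read region, so insertions missing from the long read can be restored. Weak regions at the very ends of a read have no solid anchor on one side and are left untouched in this sketch.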
8:00 - 9:00 | Registration, coffee, poster set-up |
9:00 - 9:30 | Opening |
9:30 - 10:20 | Arndt von Haeseler |
10:20 - 11:00 | Coffee break |
11:00 - 12:40 | Talks |
12:40 - 14:00 | Lunch break |
14:00 - 14:40 | Eric Rivals |
14:40 - 15:20 | Talks |
15:20 - 16:00 | Coffee break |
16:00 - 17:00 | Talks |
17:00 - 17:30 | Poster flash presentations |
17:30 - 19:00 | Poster session |
20:00 - 21:00 | PC Dinner |
The abstraction of a genome as a linear sequence has created a vast sequence analysis literature, with a plethora of interesting subproblems defined and often solved with optimal algorithms; recent results in compressed indexing even provide linear-time sequence analysis functionality in space close to that occupied by the input sequence. One could say it is time to move on to more realistic abstractions of genomic content. This talk explores what happens to selected classical sequence analysis tasks when labeled directed acyclic graphs (labeled DAGs) are used as inputs. Applications in partially phased diploid genomes, pan-genomes, and splicing graphs are discussed, and some algorithms for the new problems are presented. The talk concludes with a list of open problems summarizing what needs to be achieved for the theory of labeled DAG analysis to reach a level of completeness similar to that of sequence analysis.
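As a concrete instance of a classical task moved onto a labeled DAG, the sketch below decides whether a pattern is spelled by some path, using a straightforward dynamic programme over a topological order in O((|V| + |E|)·m) time for a pattern of length m. This is a textbook baseline chosen for illustration, not one of the algorithms from the talk; it assumes one character per node, and the function name is hypothetical.

```python
from graphlib import TopologicalSorter


def occurs_in_dag(labels: dict[str, str], edges: list[tuple[str, str]],
                  pattern: str) -> bool:
    """True if some path in the node-labelled DAG spells `pattern` exactly."""
    if not pattern:
        return True
    preds: dict[str, list[str]] = {v: [] for v in labels}
    for u, v in edges:
        preds[v].append(u)
    m = len(pattern)
    # matched[v]: indices j such that pattern[0..j] is spelled by a path ending at v
    matched: dict[str, set[int]] = {}
    for v in TopologicalSorter(preds).static_order():  # predecessors come first
        ends = {0} if labels[v] == pattern[0] else set()
        for u in preds[v]:
            ends.update(j + 1 for j in matched[u]
                        if j + 1 < m and pattern[j + 1] == labels[v])
        if m - 1 in ends:
            return True
        matched[v] = ends
    return False


# Toy variation graph for ACGT with a G/T SNP at the third position
labels = {"v1": "A", "v2": "C", "g": "G", "t": "T", "v4": "T"}
edges = [("v1", "v2"), ("v2", "g"), ("v2", "t"), ("g", "v4"), ("t", "v4")]
print(occurs_in_dag(labels, edges, "CTT"), occurs_in_dag(labels, edges, "GTT"))
# True False
```

The same graph encodes both haplotypes ACGT and ACTT, so a match may use either allele at the SNP; a string such as GTT, which occurs in neither path, is correctly rejected.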
The life sciences are undergoing a transformation. Scientists are rapidly generating the most complex and heterogeneous datasets that science can currently imagine, with unprecedented volumes of biological data to manage. Data will only generate long-term value if it is Findable, Accessible, Interoperable and Re-usable (‘FAIR’). This requires a scalable infrastructure that connects local, national and European efforts and provides standards, tools and training for data management and analysis.
Established in January 2014, ELIXIR (the European Life Science Infrastructure for Biological Information) is a distributed organisation comprising national bioinformatics research infrastructures across Europe and the European Bioinformatics Institute (EMBL-EBI). This coordinated infrastructure supports data standards, exchange, interoperability, storage, security and training. From September 2015, the newly awarded ELIXIR-EXCELERATE Horizon 2020 grant will fast-track ELIXIR’s early implementation phase by coordinating and enhancing existing resources into a world-leading data service for academia and industry, and by growing bioinformatics capacity and competence across Europe.
8:00 - 9:00 | Registration, coffee, view posters |
9:00 - 9:40 | Veli Mäkinen |
9:40 - 10:40 | Talks |
10:40 - 11:20 | Coffee break |
11:20 - 13:00 | Talks |
13:00 - 14:00 | Lunch break |
14:00 - 14:40 | Andrew Smith |
14:40 - 15:20 | FaBI |
15:20 - 16:00 | Coffee break |
16:00 - 21:00 | Social event (Bergbaumuseum Bochum) |
Accurate reconstruction of the evolutionary history of cancer in the patient and quantification of intra-tumour heterogeneity (ITH) are current challenges in cancer genomics. Genomic rearrangements are of particular importance here, but notoriously difficult to deal with computationally. The accuracy of tree inference from genomic rearrangements further depends on the quality of copy-number phasing: the assignment of major and minor copy numbers to the two physical parental alleles. So far, phasing has been done using evolutionary criteria alone, a heuristic and computationally expensive procedure that impedes tree reconstruction at probe-level resolution.
I will give an overview of the challenges and current state of research in reconstructing cancer trees from copy-number data. Results from our clinical studies demonstrate how ITH is associated with chemotherapy resistance. I will further illustrate the importance of haplotype-specific copy-number assignment and show how the common genetic background shared by multiple samples from the same patient can be used to accurately phase copy-number data. This is a crucial step towards probe-level resolution tree inference on genomic rearrangement events in cancer and towards exact quantification of genetic heterogeneity for routine applications in translational cancer research.
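How a shared germline background can phase copy numbers across samples can be illustrated with a toy sketch. The assumptions are ours, not the talk's: heterozygous SNP B-allele frequencies (BAFs) are available per segment, one reference sample shows clear allelic imbalance in that segment, and a simple mean-BAF threshold decides which parental haplotype is the major one in every other sample. The function names and the BAF values are hypothetical.

```python
from statistics import mean


def phase_snps(ref_baf: list[float]) -> list[int]:
    """+1 if a SNP's B allele lies on haplotype 1, else -1.

    Haplotype 1 is defined (arbitrarily) as the more abundant haplotype in
    the reference sample, so B alleles with BAF > 0.5 are placed on it."""
    return [1 if b > 0.5 else -1 for b in ref_baf]


def hap1_fraction(baf: list[float], phase: list[int]) -> float:
    """Mean fraction of reads coming from haplotype 1 across the segment's SNPs."""
    return mean(b if p == 1 else 1.0 - b for b, p in zip(baf, phase))


def phase_copy_numbers(major: int, minor: int, baf: list[float],
                       phase: list[int]) -> tuple[int, int]:
    """Turn unphased (major, minor) copy numbers into (hap1, hap2)."""
    if major == minor:
        return major, minor  # balanced segment: nothing to phase
    return (major, minor) if hap1_fraction(baf, phase) > 0.5 else (minor, major)


# Toy segment with four heterozygous SNPs (made-up BAFs)
ref = [0.75, 0.25, 0.70, 0.30]         # reference sample, clear imbalance
phase = phase_snps(ref)                # [1, -1, 1, -1]
sample_a = [0.68, 0.33, 0.65, 0.35]    # same haplotype gained as in ref
sample_b = [0.34, 0.66, 0.30, 0.70]    # the other haplotype gained
print(phase_copy_numbers(2, 1, sample_a, phase),
      phase_copy_numbers(2, 1, sample_b, phase))
# (2, 1) (1, 2)
```

Without the shared SNP phase, both samples would simply report (2, 1); with it, the sketch reveals that different parental alleles were gained, which is exactly the kind of distinction that haplotype-specific tree inference needs.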
8:00 - 8:20 | Registration, coffee, remove posters |
9:00 - 9:40 | Roland Schwarz |
9:40 - 10:40 | Talks |
10:40 - 11:20 | Coffee break |
11:20 - 13:00 | Talks |
13:00 - 14:00 | End / lunch |