“What’s the difference between a Fiat and a Toyota?”
Comparative genomics provides answers
When asked how many non-viral genomes have been sequenced up to now, Prof. Christophe Dessimoz – head of the Computational Evolutionary Biology and Genomics group at the CIG – does not need to think very long: “Definitely in the range of several tens of thousands”. Indeed, as a result of major technological advances in the field, genomic sequencing has been advancing at a pace that very few would have dared to predict some decades ago. This technological revolution has given an immense impetus to comparative genomics. Prof. Dessimoz and his team are at the forefront of integrative and comparative genomics, situated at the interface between biology and computer science. They are seeking connections between genes and genomes and trying to understand evolutionary relationships between organisms across the Tree of Life.
“From elephant to butyric acid bacterium – it is all the same”
Albert Jan Kluyver, Dutch biochemist (1926)
The Computational Evolutionary Biology and Genomics group
– Prof. Dessimoz, you are a biocomputing scientist. Were you already immersed in computing from a young age?
CD: Yes, as a teenager I was very much – sometimes I had the feeling “too much” – absorbed by computers. As a matter of fact, weird as it may seem in hindsight, this was part of my motivation not to study Computer Science. And so, I went to study Biology, to have a life outside the computing world…
– … almost as a kind of challenge?
CD: Yes indeed, that is probably one way to put it – in retrospect, I get the feeling that seeking non-obvious challenges is a somewhat recurrent pattern in my life – but do not misunderstand me: I also had great interest in the Life Sciences. Eventually, after finishing my studies, I was drawn more towards the theoretical aspects of Biology than towards bench work, and so computing, once again, came back into view. As a matter of fact, I had already taken some computer courses during my studies, and I got accepted to study with Gaston Gonnet, Professor of bioinformatics at the Institute for Scientific Computation, at the ETH in Zürich.
– When was this?
CD: This was around 2003-2004. The Human Genome Project had just been completed, together with the elucidation of some other genomes, and there was a great interest in doing DNA sequence comparisons among different organisms. These were the beginning years of comparative genomics, which was the topic of my PhD thesis. Afterwards, I received a fellowship to go to the European Bioinformatics Institute near Cambridge, which is part of EMBL. This can be considered the “CERN for Bioinformatics”: a fantastic environment that enabled me to broaden my research perspectives and expand my network. You know, when you go to the cafeteria there, you’ll meet interesting people from all over the world…
– When did you come to the CIG?
CD: I joined the CIG in late 2015, as a Swiss National Science Foundation professor.
– What is the scientific background of the people in your team? Are they all trained as bioinformaticians or computer scientists?
CD: No, we have a multidisciplinary group. Some people in our team are biologists with lab experience and capable of doing bench work; another person, who is working on phylogenetics, has an engineering background, and then there are some bioinformaticians and one computer scientist, who has no connection with Life sciences. So, I believe we are a good mix of people to tackle the ongoing issues in the field of evolutionary genomics.
Integrative & Comparative Genomics
– Could you briefly outline the kind of work that is being done in your group?
CD: We are interested in relating genomic sequences across different species. A lot of the research work we do arises from the fact that, on the one hand, great numbers of genomes are being sequenced and huge amounts of data are being generated – for example, following technological advances, nucleic acid sequences are being elucidated at a pace unseen up until now – whereas on the other hand the true biological knowledge we possess is concentrated around no more than a handful of model organisms: E. coli and some other prokaryotes, Saccharomyces cerevisiae and Schizosaccharomyces pombe, C. elegans, Drosophila, zebrafish, some rodent models and Arabidopsis, to name the most important ones. And yet the diversity of Life is mind-boggling. Consequently, the obvious question arises: how do we extrapolate the information we have to all those other species that roam our planet?
– Yes, how?
CD: On the one hand, it is important to realize that, let’s say, a bacterium, a yeast cell and a human cell have a lot in common. Quite often, the fundamental cellular processes are comparable, if not identical: complementation experiments between distantly related species are often convincing and leave little room for doubt on that issue. On the other hand, if you want to grasp what makes a particular species a species, it is justified to focus on the differences: trying to understand what is not the same will lead to an understanding of a species’ individuality. As such, comparisons at the genomic level enable the search for evolutionary innovations which arose in different organisms, as they are imprinted in their DNA. For example, we are presently involved in plant crop genomics. Keep in mind that all agronomically relevant crops are evolutionarily distant from Arabidopsis: the last common ancestor of cereals, for instance, and Arabidopsis dates back more than 100 million years. Trying to understand the evolutionary path that has been walked since, by scrutinizing and comparing different genomes, goes to the heart of what we do.
– You are trying to understand the complexity of Life at the genomic level?
CD: I would rather say that we try to find relationships amongst different genomes. And of course, those relationships are typically evolutionary. For all of Life, we can – at least from a theoretical perspective – trace different species back to a common ancestor and therefore there is this common denominator. As such, we can look at the evolution of function; at one time point two diverged genes in different organisms had a common ancestor. We try to tackle questions like: “Do these genes still perform the same function? If not, what has changed?” For instance, rodents have two genes coding for insulin, whereas we, humans, have only one. And if you also compare the genomes of other species, it becomes apparent that the ancestral mammal had only one copy coding for insulin. In other words, in this case, rodents constitute the abnormally diverged species and not Homo sapiens.
The OMA Project
CD: Allow me to stress the fact that, in trying to answer questions in the field of comparative and evolutionary genomics, we also aspire to develop methods and provide resources that enable other teams, be it in a collaborative effort or not, to tackle their questions.
– Could you please comment on that?
CD: The main resource that we have developed in our lab is the OMA database. The “Orthologous Matrix” or OMA inference algorithm is a method enabling the inference of orthologues from complete genomes. Homologous sequences – that is to say: sequences of common ancestry – are inferred by aligning genomes and retaining significant matches. In other words, it enables researchers to identify relationships among genes across species. It makes it possible, for instance, to find out whether there are one, two, or multiple copies of a given gene within different species of a particular taxonomic range. This OMA project has been very interesting for us, not only in the sense that we have created a resource platform, but also because we had to tackle plenty of methodological challenges – for example: how are 100 trillion alignments computed? – that ultimately led to progress in the field.
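To make the idea of "one, two, or multiple copies of a given gene" concrete, here is a minimal Python sketch. It is not the OMA algorithm or its API: the family list is a hand-written toy example based on the insulin case mentioned earlier in the interview, the function names are hypothetical, and the pair count simply assumes that every sequence is compared with every other one.

```python
from collections import Counter

# Toy input (assumption): members of one homologous gene family, written as
# (gene label, species) pairs. The labels are plain names, not OMA identifiers.
insulin_family = [
    ("Ins1", "mouse"),
    ("Ins2", "mouse"),
    ("Ins1", "rat"),
    ("Ins2", "rat"),
    ("INS", "human"),
]


def copies_per_species(members):
    """Count how many members of a gene family each species carries."""
    return Counter(species for _gene, species in members)


def all_against_all_pairs(n_sequences):
    """Number of unordered pairs when every sequence is compared to every other."""
    return n_sequences * (n_sequences - 1) // 2


counts = copies_per_species(insulin_family)
for species, n in sorted(counts.items()):
    print(f"{species}: {n} copy/copies")

# The pair count grows quadratically, which is why large all-against-all
# comparisons become a computational challenge in their own right.
print(all_against_all_pairs(1_000_000))
```

The quadratic growth of the pair count gives a feeling for how comparisons across many genomes can reach the scale of trillions of alignments.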
– Could you give an example of what can be done within this OMA platform?
CD: Recently, we have been looking at the gene repertoire of bats…
– …after rodents, the most successful mammalian group – in terms of number of species – by the way.
CD: Right. As you know, bats orient themselves through echolocation, a sonar-like activity that enables the animal to position itself within a mentally produced map of its surroundings. As such, it could be imagined – given the fact that echolocation has become evolutionarily important in this clade – that bats might have lost particular genes, for example those involved in olfaction, as smell has become of lesser importance along their evolutionary path. And this is indeed what we observe when comparing the genomes of different bat species. But there exists a small subgroup, the so-called “fruit bats”, that have lost their ability to echolocate, and actually rely much more on smell. Now interestingly, when aligning and comparing bat genomes, it can be concluded that in fruit bats the olfactory gene repertoire has increased again, correlating nicely with their evolutionary history. Similarly, the number of olfactory genes has increased in most carnivores, animals known to possess a very good sense of smell. So, by actually comparing genomes of many different species – be it bats or carnivores, or whichever taxonomic group – we can now pinpoint the position in time where particular gene families, for example those involved in olfaction, expanded or contracted in the evolutionary past, millions of years ago.
The Quest for Orthologs Consortium
– You are one of the leaders of the Quest for Orthologs Consortium, which has standardized orthology benchmarking. Could you please comment on that?
CD: The Consortium is composed of different research groups with a common goal, namely comparing and relating genes across multiple species. In doing so, we are all facing similar challenges and problems, and as such we realized that we would all benefit from a closer interaction. And so, we first got together in 2009 and that is how the Consortium was born, through meetings and exchanges. Since then, we have developed quite a few methods and provided resources for the community.
– Can you give an example?
CD: One practical challenge was related to how orthologs are defined. When browsing through the literature, you realize that there exists considerable variation in defining this concept. As a matter of fact, one of the first meetings of the Quest for Orthologs Consortium dealt with defining unambiguously the concept “ortholog”.
– Please explain.
CD: One could think of genes as different parts of a car. For example, let us compare the handbrake of a Fiat with that of a Toyota: the ortholog of the handbrake in the Fiat is the handbrake in the Toyota. Now by defining “ortholog”, we postulate that both structures are derived from the handbrake of a common ancestral car model, let’s say a Ford T. This is opposed to the situation where the Ford T acquires at some point both front and back lights, because the designer decided to copy the front lights and install them also at the back of that car. Such an event would be analogous to “gene duplication”. Now in this particular setting, if you compare the front light of a Fiat with the back light of a Toyota, they are not orthologs – instead, we call them paralogs – since they were already different lights in the Ford T, their last common ancestor. Now, why is this so interesting? Because if a gene duplication has occurred within a particular species and you have multiple copies of the same gene, they may already be specialized and perform different functions. As such, the distinction between “ortholog” and “paralog” – and finding answers to which is which – is very important within the field of comparative genomics. So now the question arises: how do we assess algorithmic methods concerning orthologs and paralogs?
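The car analogy can be written down as a small Python sketch. It assumes that the history of a part is already known and given as a tree whose internal nodes are labelled either "speciation" (the Ford T lineage splitting into Fiat and Toyota) or "duplication" (the designer copying the front light). Two parts are then orthologs if their last common ancestor is a speciation node, and paralogs if it is a duplication node. The tree encoding and all names are hypothetical and chosen only for illustration; real inference methods must first estimate such a history from sequence data.

```python
def leaves(node):
    """Return the set of leaf names below a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return {node}
    _event, left, right = node
    return leaves(left) | leaves(right)


def relation(node, a, b):
    """Classify two distinct leaves by the event at their last common ancestor."""
    if isinstance(node, str) or not {a, b} <= leaves(node) or a == b:
        raise ValueError("a and b must be two distinct leaves of the tree")
    event, left, right = node
    for child in (left, right):
        if {a, b} <= leaves(child):
            # Both parts sit in the same subtree: their ancestor is deeper down.
            return relation(child, a, b)
    # The two parts separate at this node, so this is their last common ancestor.
    return "orthologs" if event == "speciation" else "paralogs"


# Internal nodes are (event, left subtree, right subtree).
handbrake = ("speciation", "Fiat_handbrake", "Toyota_handbrake")
lights = (
    "duplication",  # the ancestral light was copied in the Ford T
    ("speciation", "Fiat_front_light", "Toyota_front_light"),
    ("speciation", "Fiat_back_light", "Toyota_back_light"),
)

print(relation(handbrake, "Fiat_handbrake", "Toyota_handbrake"))
print(relation(lights, "Fiat_front_light", "Toyota_front_light"))
print(relation(lights, "Fiat_front_light", "Toyota_back_light"))
```

With this toy history, the two handbrakes and the two front lights come out as orthologs, while a front light compared with a back light comes out as a paralog pair, whether the two lights sit in the same car or in different ones.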
– How?
CD: Since the true evolutionary history of a gene may never be known with certainty, we have to resort to indirect methods. We consider different aspects that can be expected to correlate with orthology and paralogy.
– Could you please comment on that?
CD: For instance, if orthologs diverge through a speciation event, this means that building an evolutionary tree which is based on a collection of orthologs should reconstruct a species tree, right?
– Right.
CD: But it could be that our reconstruction of the evolutionary tree is jeopardized, as the result of horizontal gene transfer or some other genomic event that causes the true phylogeny of the gene to be blurred. However, if we repeat our reconstructions with many different gene families, it can be expected that on average the relationship between an evolutionary tree and a species tree holds. There are different ways to measure the quality of orthology and within the Quest for Orthologs Consortium, we standardized this.
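The averaging argument can be illustrated with a deliberately simplified Python sketch. It is not the Consortium's benchmark: the score used here, the fraction of species-tree groupings that a gene tree recovers, is an assumption made for illustration, as are the example trees and function names. Each gene tree is assumed to contain exactly one predicted ortholog per species, so its leaves can be labelled by species name.

```python
def clades(tree):
    """Return (leaves of the tree, set of leaf groups under its internal nodes)."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaf_set = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaf_set |= child_leaves
        found |= child_clades
    found.add(leaf_set)
    return leaf_set, found


def agreement(gene_tree, species_tree):
    """Fraction of non-trivial species-tree groupings also present in the gene tree."""
    all_species, species_clades = clades(species_tree)
    _, gene_clades = clades(gene_tree)
    # The grouping of all species together is always present, so it is ignored.
    informative = species_clades - {all_species}
    return len(informative & gene_clades) / len(informative)


def mean_agreement(gene_trees, species_tree):
    """Average the agreement score over many gene families."""
    scores = [agreement(tree, species_tree) for tree in gene_trees]
    return sum(scores) / len(scores)


# Rooted trees as nested tuples of species names (toy example).
species_tree = ((("human", "mouse"), "chicken"), "zebrafish")
gene_trees = [
    ((("human", "mouse"), "chicken"), "zebrafish"),  # matches the species tree
    ((("mouse", "human"), "chicken"), "zebrafish"),  # same topology, other order
    ((("human", "chicken"), "mouse"), "zebrafish"),  # one grouping is wrong
]

print(mean_agreement(gene_trees, species_tree))
```

A single discordant family, whether caused by horizontal gene transfer or by a wrong orthology call, lowers the average only slightly, whereas a method that systematically confuses paralogs with orthologs would lower the score across many families.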
– Would you say that this standardizing work is generally accepted?
CD: Yes, it’s not always easy to achieve some progress as a community, but one clear advantage is that whatever we achieve reflects a broad consensus. And a benchmarking step is really crucial for progress. As the management luminary Peter Drucker once said: “If you can’t measure it, you can’t improve it.”
Ronny Leemans.