Role of long non-coding RNA variation in A. thaliana
Disciplines
Biology (100%)
Keywords
Natural Variation,
Arabidopsis thaliana,
Gene Regulation,
Population Genetics,
Epigenetics,
Long Non-Coding RNAs
What makes us different from each other? Biology, and genome research in particular, tries to answer this question. Our genome, like the genomes of other organisms, can be transcribed into RNA and then translated into proteins, the building blocks of the organism. Pieces of the genome that code for proteins are called protein-coding genes, and they have long been considered the main information carried by the genome. However, in recent decades we have realized that the non-protein-coding parts of the genome are also very important. They give rise to many non-coding RNAs that are never translated into proteins, and some of these have been shown to regulate important processes in the cell. One class of such molecules is called long non-coding RNAs (lncRNAs), and they have a very interesting history: they are found throughout the genome, but they went unnoticed for a long time because the appropriate technology was not available. Now we know there are >60,000 lncRNA genes in the human genome, and hundreds to thousands are found in many other organisms too. These genes seem to regulate other genes through many different mechanisms. For example, they can cause chemical modifications on other genes, called epigenetic modifications, which inactivate a gene without changing its genetic sequence. Recently it was found that lncRNAs are expressed in a person-specific manner, i.e. their natural expression variation is very high, while protein-coding gene expression is more constant. Could lncRNAs contribute to our differences if they differ from person to person? This is the background question of this project. Humans are difficult to study experimentally, but fundamental questions can be addressed using model organisms, since all life on earth has, fundamentally, a lot in common. Plants are the best model system for studying natural variation, and research in plants has helped uncover many fundamental aspects of how genomes work.
The most commonly used plant is Arabidopsis thaliana, a small flowering plant that grows all over Europe. In this project we want to use a collection of these plants from almost 1000 different geographic locations to analyze lncRNA variation. Members of this collection have slightly different genomes, just like humans do. We will try to understand which differences in the genome, or in its epigenetic modifications, make lncRNA expression so variable between individuals, and how this variability affects the expression of other genes, the epigenetic marks on these genes, and different traits of the plants (phenotypes), such as resistance to pathogens. I will perform genetic manipulation experiments to demonstrate functional roles for several of the most interesting lncRNAs that we find. Plants are immobile, so adaptation to the environment is especially important for them. I will use the data and analyses produced in this project to find lncRNAs that might be involved in the adaptation of Arabidopsis thaliana.
What is it in our DNA that makes us who we are and different from each other? Biology, and genome research in particular, tries to answer these questions. The genome (our DNA) can be transcribed into RNA and then translated into proteins, the building blocks of the organism. But in humans only 2% of DNA codes for proteins. What is the other 98% doing? Is it useless "junk", or is it an instruction on how, where and when to properly use the precious 2%? The short answer, as is often the case, is: both. About 30 years ago scientists discovered that apart from the classical protein-coding genes, there are also genes that do not code for any protein and only make RNA. These were called long non-coding RNAs (lncRNAs), and they are what this project is about. Ten years ago it became clear that lncRNAs are not just odd occurrences; instead, thousands of them fill the genomes of nearly every organism on earth. The young field of lncRNA research tries to understand these genes and what they do. This project focused on a weed called Arabidopsis thaliana. Hardly anyone knows this small plant except plant scientists, for whom it is a model organism, much like a lab mouse. This plant grows in many places: in America, Spain, Sweden and even Africa. Our lab and collaborators collected these plants from all over the world to study natural variation: how do the genomes differ, and are genes expressed differently? My project asked: do lncRNAs differ between these sister plants, and if so, how much and why? Only 4,000 lncRNAs were known in A. thaliana when I started, but analyzing 500 A. thaliana lines from all over the world allowed me to discover 10,000 more lncRNA genes. We realized that the genome is swarming with lncRNAs, but most of them are silent in any given line or tissue. Moreover, every line expresses a largely unique set of these non-coding genes, a sort of individual-specific barcode.
We were very eager to understand what makes lncRNAs so variable between plants from different regions, while the ordinary (protein-coding) genes are much more stable. It turns out that there is turmoil going on: different lncRNAs can be silenced by many different mechanisms, and in every line these mechanisms can affect different lncRNAs. In short, the extreme variation of lncRNAs arises from a combination of mutations in single genetic letters, the movement, deletion, or insertion of whole pieces of the genome, and massive variation in epigenetic modifications, that is, chemical modifications to DNA and to the proteins that DNA is wrapped around.