POPULATION GENETICS ASSIGNMENT:
Students will first simulate and produce a microsatellite dataset in EASYPOP, and then analyse this dataset under different population demographic and evolutionary scenarios of their choice based on topics covered in the preceding lectures. The simulated datasets will lead students to different outcomes, and they will have to formulate hypotheses and ideas around these outcomes. The students will have two dedicated computer lab sessions to work through the software associated with the analyses based on the simulated dataset. The report will be approximately 1,000 words in length (excluding tables, legends and references) and is worth 40% of the module’s overall marks.
Use the software EASYPOP, running it from a specific physical folder in one of your drives. If you did not attend the practical, you can use basic settings indicated, as an example, in the relevant “EASYPOP_settings” Word document (in the Software for ‘Practicals and Assignments’ folder) to simulate a microsatellite dataset composed of three separate populations. The setting to be changed is ‘proportion of female/male migration’ and is highlighted in the document. If you feel confident changing other settings, you must be confident about the biological implications of these changes. Please ensure your chosen settings are logical, meaningful, and in line with what we discussed in practicals and lectures.
In this same folder, there are another three EASYPOP files, each with a different ‘migration rate’ scenario for the three populations. You will use ONE of these files in addition to the one that you have created yourself in the practical/at home to compare and contrast the patterns of genetic diversity within each population and of differentiation (Fst) between populations under the different migration scenarios.
Then, use the EASYPOP output files as inputs for the software FSTAT (also available in the Software folder), to estimate: 1) number of alleles; 2) gene diversity; 3) Allelic Richness; 4) overall Fst; and 5) Fst per pairs of samples. When you first open FSTAT, you will be asked to put in random seed numbers; just enter any numbers in these two boxes to continue. Then, select ‘File’ and then ‘Open’ in the top left corner of the programme to open the EASYPOP ‘.dat’ files that you have created. Refer to the ‘Fstat – boxes to click’ screenshot file in the Practicals folder to see which options to select and run. Do this for both of the population scenarios previously generated. Several output files will be generated, of which THREE are relevant to the assignment. These are the ‘.out’, ‘.fst’ and ‘.pvl’ files.
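To make the first two statistics less of a black box, here is a minimal Python sketch of how the number of alleles and gene diversity can be computed for one locus in one population. It is an illustration only: the toy genotypes and function names are hypothetical, and the gene diversity estimator shown (Nei's 1978 unbiased expected heterozygosity) is a simple textbook version that may differ slightly from the one FSTAT reports. Allelic richness is not shown because it additionally requires rarefaction to a common sample size.

```python
from collections import Counter


def allele_counts(genotypes):
    """Count gene copies per allele across diploid genotypes (pairs of allele sizes)."""
    return Counter(allele for genotype in genotypes for allele in genotype)


def number_of_alleles(genotypes):
    """Number of distinct alleles observed in the sample."""
    return len(allele_counts(genotypes))


def gene_diversity(genotypes):
    """Unbiased expected heterozygosity: 2n/(2n-1) * (1 - sum of squared allele frequencies)."""
    counts = allele_counts(genotypes)
    total = sum(counts.values())  # 2n gene copies for n diploid individuals
    sum_sq = sum((c / total) ** 2 for c in counts.values())
    return total / (total - 1) * (1.0 - sum_sq)


# Hypothetical toy sample: four individuals typed at one microsatellite locus.
toy_population = [(120, 124), (120, 120), (124, 128), (128, 128)]

print("Number of alleles:", number_of_alleles(toy_population))
print("Gene diversity:", round(gene_diversity(toy_population), 3))
```

Working through a small example like this by hand is a useful check that you understand what the values in the ‘.out’ file actually measure.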
I have uploaded a Word document entitled ‘Example_FSTATout.docx’ that has highlighted in yellow the key tables of interest in the ‘.out’ file. Although we discussed ‘Fis’ in the practical, this can be ignored for the sake of the assignment. Here, you need to pay attention to the different measures of genetic diversity and what they mean in the context of the migration rate you have used.
The ‘.fst’ file contains the pairwise Fst values (in a matrix) between each pair of populations, and the ‘.pvl’ file contains the associated p-values, which indicate whether or not these pairwise Fst values are statistically significant.
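As an illustration of what a pairwise Fst matrix represents, the sketch below computes a simple heterozygosity-based Fst, (Ht - Hs) / Ht, for every pair of populations at a single locus. This is deliberately a simplified estimator with no sample-size correction, so it should not be expected to reproduce the values in your ‘.fst’ file. The allele frequencies and population names are hypothetical.

```python
from itertools import combinations


def heterozygosity(freqs):
    """Expected heterozygosity, 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def pairwise_fst(freqs_a, freqs_b):
    """Simple Fst = (Ht - Hs) / Ht for two equally weighted populations."""
    alleles = set(freqs_a) | set(freqs_b)
    mean_freqs = {a: (freqs_a.get(a, 0.0) + freqs_b.get(a, 0.0)) / 2 for a in alleles}
    h_total = heterozygosity(mean_freqs)  # Ht: diversity of the pooled pair
    h_within = (heterozygosity(freqs_a) + heterozygosity(freqs_b)) / 2  # Hs
    return (h_total - h_within) / h_total if h_total > 0 else 0.0


# Hypothetical allele frequencies at one locus in three populations.
populations = {
    "pop1": {120: 0.5, 124: 0.5},
    "pop2": {120: 0.5, 124: 0.5},
    "pop3": {120: 1.0},
}

# One Fst value per pair of populations, i.e. the entries of a pairwise matrix.
matrix = {
    (a, b): pairwise_fst(populations[a], populations[b])
    for a, b in combinations(populations, 2)
}

for pair, fst in matrix.items():
    print(pair, round(fst, 3))
```

In this toy case pop1 and pop2 share identical frequencies and so show no differentiation, whereas pop3 is fixed for one allele and is differentiated from both.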
The main purpose of the assignment is to compare/contrast the results of genetic diversity (from the .out file) and differentiation (from the .fst file and accompanying .pvl file) and discuss the results obtained, based on the different settings chosen (which must be illustrated), in relation to the main neutral evolutionary forces. So your assignment has TWO main components, diversity (no. of alleles, gene diversity and allelic richness) and differentiation (Fst). A major reason for people failing to pass the assignment in the past was only doing one of the two components, and doing it poorly at that.
As discussed in the practical sessions, you are asked to come up with a hypothetical case of an animal/plant system to which your simulated data could be applied. Think of an organism in a certain environment/system that is of interest to you and what sort of situations could give rise to the parameters (migration/no migration) that you have used to create your simulated datasets (e.g. badger populations in the UK, wild dog populations in Africa). The important aspect of this assignment is a comparison between the results obtained from FSTAT from the two different EASYPOP files (each with a different migration rate!). In principle, this means comparing populations of a species within an area where there are factor(s) leading to no migration with populations in an area where migration is possible. Therefore, think carefully about which factor(s) could lead to no, or some, migration. There have been plenty of examples in the lectures of factors which could be barriers to migration/gene flow. This section will need to be backed up by examples from the scientific literature. Don’t overcomplicate things!
Be careful here and pay attention to the settings that you have used to create your EASYPOP output file(s). For example, think about the migration rate used, the number of generations and the number of individuals within your populations, and whether these make sense for the organism that you have chosen. Choosing unrealistic organisms/systems in relation to your created dataset will cost you significant marks as it will demonstrate a lack of understanding about the analyses that you have performed. Remember that this is ultimately about understanding the influence of a key process in evolution on neutral genetic variation and differentiation. Specifically, we mentioned the important components in both the practicals and lectures: migration/gene flow and population size. Why are there significant/non-significant differences in Fst between populations, and differences in values of genetic diversity between populations under different scenarios of migration?
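One way to build intuition for how migration and population size interact is Wright's classic island-model approximation, Fst ≈ 1 / (4Nm + 1), where N is the population size and m the migration rate per generation. The short sketch below simply evaluates this formula. It rests on strong assumptions (many equally sized populations, migration-drift equilibrium, no mutation), so a three-population microsatellite simulation run for a limited number of generations will not match it exactly. The population size and migration rates used are hypothetical values chosen only for illustration.

```python
def expected_fst_island(pop_size, migration_rate):
    """Wright's island-model approximation: Fst = 1 / (4 * N * m + 1)."""
    return 1.0 / (4.0 * pop_size * migration_rate + 1.0)


# Hypothetical population size and a range of migration rates for illustration.
POP_SIZE = 100
for m in (0.0, 0.001, 0.01, 0.1):
    fst = expected_fst_island(POP_SIZE, m)
    print(f"N = {POP_SIZE}, m = {m}: expected Fst = {fst:.3f}")
```

Note that what matters in this approximation is the product Nm (the number of migrants per generation), not the migration rate alone, and that with m = 0 the formula describes the eventual equilibrium rather than what a simulation reaches after a finite number of generations.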
It is important that you refer to the lectures given on this topic. If students want to do well in this assignment, it is also expected that they use the scientific literature to support their interpretation of their results (see the mark scheme).
Structure of the assignment and how marks will be allocated:
- Introduction (20%): Rationale for choosing simulated population scenarios within a hypothetical case study of a system of your choice and question(s) to be addressed.
- Methods (40%): Simulation of population scenarios with EASYPOP (20%) and computation of population genetics statistics with FSTAT (20%) with a demonstrated understanding of why the analyses are being undertaken.
- Methods/Results (20%): Appropriate reporting of key settings/parameters used and results in legible format, i.e. using appropriate Tables/Figures (with proper legends) and text. Note how ugly and unformatted FSTAT output results look! Create your own ways of reporting these results using carefully formatted tables and think about using graphs that will allow comparisons to be made between scenarios.
- Discussion (20%): Interpretation of outputs in a neutral genetic variation context and formulation of novel and challenging ideas/hypotheses/tests based on the different scenarios.
The assignment should be approximately 1,000 words in length (excluding tables, references and legends). No marks will be deducted for a word count outside this limit, but a word count markedly below it would indicate a lack of understanding of the assignment.