Thursday, July 7, 2016

Appendix 7.5 Modeling Evolution

It turns out there is a very good reason why it's difficult to include realistic selection in an equation for the probability of evolution over time. Selection causes the outcome of one random mutation event to affect that of the following event, which breaches independence. The general form would normally be a Poisson distribution, but that is only valid for independent events.

So let's keep things very simple and run the model for a more realistic gene pattern based on the A, T, C, G nucleotide bases of the DNA molecule. Same 10 genes, same 100% selection of the top-scoring 5 genes, reproduced faithfully to maintain the population of 10. So if we start with one mutation per generation per gene, what happens?
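Here is a minimal sketch of that setup in Python, just to make the rules concrete. It is not my original program, and several details are assumptions made for illustration: the score is the number of bases matching the target, the top 5 genes are copied to refill the population, every gene then receives exactly one point mutation, and the target is the repeated ATCG pattern. The names `make_target`, `score`, `mutate`, `run_once` and `average_mutations` are made up for this sketch.

```python
import random

BASES = "ATCG"

def make_target(length):
    """Repeated ATCG pattern, truncated to `length` bases."""
    return ("ATCG" * (length // 4 + 1))[:length]

def score(gene, target):
    """Assumed fitness: number of positions that match the target."""
    return sum(g == t for g, t in zip(gene, target))

def mutate(gene, rng):
    """One point mutation: swap a random base for a different base."""
    pos = rng.randrange(len(gene))
    new_base = rng.choice([b for b in BASES if b != gene[pos]])
    return gene[:pos] + new_base + gene[pos + 1:]

def run_once(length, pop_size=10, keep=5, rng=None, max_generations=10_000_000):
    """Count mutation events until some gene equals the target."""
    rng = rng or random.Random()
    target = make_target(length)
    population = [
        "".join(rng.choice(BASES) for _ in range(length))
        for _ in range(pop_size)
    ]
    mutations = 0
    for _ in range(max_generations):
        # Selection: rank by score, best first.
        population.sort(key=lambda g: score(g, target), reverse=True)
        if population[0] == target:
            return mutations
        # Faithful reproduction: copy the survivors to refill the population.
        survivors = population[:keep]
        population = [survivors[i % keep] for i in range(pop_size)]
        # Mutation: one mutation per gene per generation (assumed to hit every gene).
        population = [mutate(g, rng) for g in population]
        mutations += pop_size
    raise RuntimeError("target not reached within max_generations")

def average_mutations(length, cycles=10, seed=None):
    """Average the mutation count over several independent runs."""
    rng = random.Random(seed)
    return sum(run_once(length, rng=rng) for _ in range(cycles)) / cycles
```

Calling `average_mutations(10)` runs 10 cycles for a gene of length 10. Because the details above are assumed, the counts it produces need not match my figures exactly.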

To give some statistical validity, I averaged the total number of mutation events needed to reach the pattern [ATCGATCGATCG... repeated] for genes of various lengths over 10 cycles, and I got this:

Gene Length (L)    Number of Mutations (N)    Log10(N)
10                                     818      2.9129
15                                    4353      3.6388
20                                   22956      4.3609
25                                  108139      5.0339
30                                  918605      5.9631
35                                 2453687      6.3898
40                                16080795      7.2063

Basically, log(N) graphed against L gives a nice straight line, as expected, since the governing distribution would be exponential (Poisson). My computer could not go beyond length 40 without taking too long (days) to do 10 cycles of each.

A simple linear regression of the straight line, projected out to just 150 bases long, gives a first approximation. However, the standard deviation is quite large.

The answer, for a gene of just 130 DNA bases evolved using this model with perfect selection of every single mutation: if a mutation occurred every millisecond, it would take 3.6 billion years on average to get there. While this result is mathematically correct, it's not the most favorable model for a real evolutionary algorithm. What I have done here is pit a severe mutation rate against the very best selection rate, and while the former won, it means I need a more realistic mutation rate versus a more realistic selection rate. Not simple, as Fred Hoyle's work shows.
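The projection can be checked with a short least-squares fit of log10(N) against L using the averaged counts from the table. This is only a sketch of the arithmetic. The helper names `fit_line` and `projected_years` are made up here, and it assumes base-10 logs, one mutation per millisecond and a 365.25-day year.

```python
import math

# Averaged results from the table: gene length L and mutation count N.
lengths = [10, 15, 20, 25, 30, 35, 40]
mutations = [818, 4353, 22956, 108139, 918605, 2453687, 16080795]

def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

log_n = [math.log10(n) for n in mutations]
slope, intercept = fit_line(lengths, log_n)

def projected_years(length, seconds_per_mutation=1e-3):
    """Extrapolate the fitted line to `length` and convert to years."""
    log10_n = intercept + slope * length
    seconds = 10 ** log10_n * seconds_per_mutation
    return seconds / (365.25 * 24 * 3600)

print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
print(f"projected years at L = 130: {projected_years(130):.2e}")
```

With the seven points above, the fitted slope is about 0.14 per base, which works out to a few billion years at 130 bases. Being an extrapolation far beyond length 40, it is quite sensitive to small changes in that slope.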

I obviously need a more powerful computer and a bit more statistical work to tidy it up as a paper.