Wednesday, February 26, 2025

Appendix 8.3 The Falsification of Evolution

To set the model up to do the same task that evolution must actually do, that is, evolve a new gene by mutation and selection, we change the alphabet of possible values from [H, T] to the four possibilities [A, T, C, G]. I apply a 30% redundancy factor, as for real proteins, by dividing 4 by 1.44 to give a probability of a successful mutation of 1 in 2.78 instead of 1 in 4, a 30.5% increase in probability. At each event every member of the population is given a point mutation as a single DNA code, and an external random pick of one code is made. The population is then compared with the external code and all who do not match are culled, leaving a reduced population for the next mutation event. Those left after each event have accumulated a series of beneficial mutations equal to the total number of events to that point, which is equivalent to growing a gene of DNA codes, by mutation and selection, equal in length to the number of events. I then plot the resulting points in groups of three to indicate whole codons, but because the numbers are so large the scale for the total mutation count is logarithmic.
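For readers who want to experiment, here is a minimal Python sketch of one run of the culling model described above (the function name, starting population parameter and loop structure are my own illustration, not the actual program used; the match probability of 1 in 2.78 is the redundancy-adjusted figure above):

    import random

    def grow_gene(pop_size, p_match=1/2.78):
        # Each event: every surviving lineage takes one point mutation and an
        # external code is drawn; a lineage survives the cull with probability
        # p_match.  Returns (codes grown before the population dies out,
        # total mutation events performed).
        population, grown, mutations = pop_size, 0, 0
        while population > 0:
            mutations += population
            population = sum(1 for _ in range(population)
                             if random.random() < p_match)
            if population > 0:
                grown += 1
        return grown, mutations

With a match probability of roughly 0.36, each event keeps about a third of the lineages, so on average a starting population of about 2.78^L is needed for anyone to survive L events.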

Now to something quite obvious: I am deliberately omitting any consideration of function! Every match, from the very first codon (equivalent to one amino acid), is considered equally selectable and functional. The reason is simple: where function begins in terms of gene size is unknown, and it is actually irrelevant for testing selection alone. While the change to function caused by any mutation is obviously important for selection in the wild, by ignoring function here I am making a huge concession to evolution by natural selection, which does face that challenge. It means that if this model cannot evolve a gene of reasonable size in the assumed span of Earth's evolutionary history, then the natural case, being far less efficient, also could not, and this is the essence of the falsification as a violation of the second law.

Tuesday, February 25, 2025

Appendix 8.2 The Falsification of Evolution

Now things start to get interesting, but let me first examine the coin-toss model:

To guess 7 H-T tosses in a row, we note there are 2^7 = 128 ways of arranging 7 coins, which means the target arrangement will occur on average once every 128 tosses of seven coins, a total of 896 individual coin tosses. That is the work required both to create the order and to pay the entropy cost demanded by the second law for that state of order, in that system, by that process. It is vital to understand the connection with entropy at this point, since entropy is a measure of disorder, and disorder is the probability of a state of matter existing. From Boltzmann's [S = k log W], disorder = W/Wtot, where W is the number of microstates in a chosen macrostate and Wtot is the total number of microstates in the system. In this case W = 1 (one way to get an exact sequence of H-T) out of Wtot = 128 possible arrangements, so the disorder, i.e. the probability of that state, is 1/128 = 0.0078, while the entropy is k log 1 = 0, since this is the most ordered state that system can produce. There is something else: we know work is the only form of energy that can create thermodynamic order. Work is also a product of vectors, which have direction as well as magnitude, implying a choice has to be made. So work is directed energy, while heat is random energy and creates only disorder unless directed.
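Written out explicitly for the seven-coin system:

    Wtot = 2^7 = 128 possible arrangements,  W = 1 for the exact target sequence
    disorder (probability of the state) = W/Wtot = 1/128 = 0.0078
    entropy S = k log W = k log 1 = 0
    expected trials of seven coins to first success = 128, i.e. 7 x 128 = 896 tosses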

Note, however, that to get one arrangement of 7 in a row starting with an audience (population) and using selection to cull it took only 7 generations of selection events. Knowing the chance of selection is 0.5, we can predict the population required, since it is on average halved 7 times, giving an initial starting population of 2^7 = 128. The total number of coin tosses (mutations) is 128 + 64 + 32 + 16 + 8 + 4 + 2 = 254, so selection reduced the number of mutations required by (896 - 254)/896 x 100% = 71.6%, revealing the power of selection over pure chance. The model tests survival in response to a changing environment. By starting with a population, introducing successive random mutations, then selecting the survivors that match an external random event and culling the rest, it is a model of pure, highly optimised selection. All we need now is to give this model the same task that evolution in the wild must have had, i.e. grow a new gene with no target to aim for, just a fitness landscape to respond to.
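A minimal Python sketch of this audience model (my own illustration; the population of 128 and the 7 rounds follow the figures above):

    import random

    def audience_run(pop=128, rounds=7):
        # Every child still standing guesses H or T, a referee coin is tossed,
        # and wrong guessers sit down.  Returns (total guesses, children left).
        standing, guesses = pop, 0
        for _ in range(rounds):
            guesses += standing
            coin = random.randint(0, 1)
            standing = sum(1 for _ in range(standing)
                           if random.randint(0, 1) == coin)
        return guesses, standing

    runs = [audience_run() for _ in range(10000)]
    print(sum(g for g, _ in runs) / len(runs))   # averages close to 254

The average number of guesses converges on 254 and the average number left standing on 1, in line with the calculation above.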

Appendix 8.1 The Falsification of Evolution by Natural Selection

Finding the mean total number of mutation events was vital, as it is this that is directly proportional to the improbability, or entropy, of the state and hence to the Entropy Cost of achieving that state under the second law. Let me illustrate what I mean by Entropy Cost with two dice. The probability of [6][6] is 1/36, but this does not mean we cannot throw [6][6] on the very first throw, nor does it mean we are guaranteed to get [6][6] after 36 throws. What it really means is that [6][6] will occur on average once every 36 throws of two dice if we just keep throwing. The longer we keep doing it, the closer the ratio of the total number of throws to the number of occurrences of [6][6] will approach 36, which is what the Law of Large Numbers predicts.
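A quick Monte Carlo check of that ratio (a minimal Python sketch; the throw count of one million is arbitrary):

    import random

    throws = 1_000_000
    hits = sum(1 for _ in range(throws)
               if random.randint(1, 6) + random.randint(1, 6) == 12)
    print(throws / hits)   # tends toward 36 as the number of throws grows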

What that means is that 36 throws of two dice is the minimum work required, or entropy cost, of the state of order [6][6] in that system under the second law. Heavier dice simply increase the energy required, but that in no way affects the entropy cost, measured as the number of die throws (36 throws of two dice, or 72 throws of a single die), demonstrating that entropy has nothing to do with energy. There is a paper yet to be published on that subject, but let it suffice to know that the second law imposes a minimum average number of random events to create a state of order by those events, and that minimum is equal to the improbability of the state. If any theory requires a state of order with fewer random events than are needed to pay the entropy cost, it is falsified by the 2nd Law.

The Dawkins coin-toss model can be modified to be generally applicable to any code base, with any desired probabilities for selection and cull, or even with code redundancies that effectively moderate the mutation probabilities for deselection. Noting that real evolution must in the end grow genes made up of an alphabet of the bases A, T, C, G, with certain probabilities of mutation, it occurred to me that the model could be configured to do exactly what evolution in the wild must do to grow a gene.

Wednesday, November 27, 2024

APPENDIX 8: The Falsification of Evolution by Natural Selection

After a great deal of work the penny finally dropped in mid-June 2019. All my computer simulation work was really doing was trying to reliably predict the mean, or average, work required (total number of mutations) to evolve a genome. The Second Law of Thermodynamics is a statement about the average behaviour of a system. As genome size increases, the model should show both the power and the limit of what natural selection can do under the Second Law. The problem was to predict the mean of a distribution where the distribution has an unknown law and a large standard deviation. But now, thanks to Richard Dawkins, this problem has been solved: it is the model he performed in one of his Christmas lectures at the RI in the 1990s!

In the original demonstration an audience of about 120 children were asked to stand and privately guess heads or tails. Another boy at the front was asked to toss a coin. All who got it wrong were asked to sit down. Honesty was, as in all science, basic to the success of the whole thing. This was repeated until there was only one child left standing, at which point Dawkins pointed out that the child had just guessed seven H-T tosses correctly in a row, demonstrating that improbable things are not so difficult to observe. It was of course a simulation of selection's power over pure chance, though I don't recall Dawkins actually claiming that.

In this simple coin-toss model there is NO TARGET! All simulations of evolution with a predetermined target are invalid, because evolution has no target. Instead, an external random event becomes the next test of survival for the last guess (mutation), which in a very simple way mimics the response of a population to a change in an external fitness landscape. The model is purely a test of selection, uncomplicated by other constraints. This model finally achieved my objective of having a model of mutation and selection in a population, and I found it has an equation for the mean total number of mutations to evolve a code sequence of any given length. I believe this model has an equation because the true entropy cost is paid up front by starting with a population which all gets culled, so the outcome is predictable, but only in entropy terms, which is the cost of performing the total number of mutations.
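As an illustration only (not necessarily the exact equation referred to above), one closed form consistent with the seven-toss example, assuming a starting population of A^L that is on average reduced by a factor of A at each event, would be, for an alphabet of A equally likely codes and a target length of L codes:

    mean total mutations = A + A^2 + ... + A^L = A(A^L - 1)/(A - 1)

which for A = 2 and L = 7 gives 2 + 4 + ... + 128 = 254, the figure found in Appendix 8.2. Treat this as one reading of the model rather than the equation itself.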

Thursday, July 7, 2016

Appendix 7.5 Modelling Evolution

It turns out there is a very good reason why it is difficult to include realistic selection in an equation for the probability of evolution over time. Selection causes the outcome of one random mutation event to affect that of the following event, breaching independence. The general form would normally be a Poisson distribution, but that is only valid for independent events.

So let's keep things very simple and run the model for a more realistic gene pattern based on the A, T, C, G nucleotide bases of the DNA molecule: the same 10 genes, the same 100% selection of the top-scoring 5 genes, reproduced faithfully to maintain the population of 10. So if we start with one mutation per generation per gene, what happens?
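Here is a minimal Python sketch of the run as I have described it (the names and structure are illustrative, not the actual program; scoring counts bases in their correct position, as settled on in Appendix 7.4):

    import random

    BASES = "ATCG"

    def score(gene, target):
        # count bases already in their correct position
        return sum(1 for g, t in zip(gene, target) if g == t)

    def run(length, pop=10, keep=5):
        # 10 genes, one point mutation per gene per generation,
        # top 5 scorers copied faithfully to restore the population of 10
        target = (BASES * length)[:length]        # ATCGATCG.. repeated
        genes = [[random.choice(BASES) for _ in range(length)]
                 for _ in range(pop)]
        mutations = 0
        while max(score(g, target) for g in genes) < length:
            for g in genes:
                g[random.randrange(length)] = random.choice(BASES)
                mutations += 1
            genes.sort(key=lambda g: score(g, target), reverse=True)
            genes = [list(g) for g in genes[:keep] for _ in range(pop // keep)]
        return mutations

Averaging run(length) over several cycles should give numbers of the same order as the table below, though the exact figures depend on implementation details.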

To give some statistical validity I averaged the total number of mutation events needed to achieve the pattern [ATCGATCG.. repeated] for genes of various lengths over 10 cycles, and got this:

Gene Length (L)    Mutations (N)    Log(N)
10                       818        2.9129
15                      4353        3.6388
20                     22956        4.3609
25                    108139        5.0339
30                    918605        5.9631
35                   2453687        6.3898
40                  16080795        7.2063

Basically, log(N) graphed against (L) gives a nice straight line, as expected, since the governing distribution would be exponential (Poisson). My computer would not solve beyond length 40 without taking too long (days) to do 10 cycles of each.

A simple linear regression of the straight line, projected out to just 150 bases, gives a first approximation; however, the standard deviation is quite large.

The answer for a gene of just 130 DNA bases, evolved using this model with perfect selection of every single mutation: if a mutation occurred every millisecond it would take 3.6 billion years on average to get there. While this result is mathematically correct, it is not the most favorable model for a real evolutionary algorithm. What I have done here is pit a severe mutation rate against the very best selection rate, and while the former won, it means I need a more realistic mutation rate versus a more realistic selection rate. Not simple, as Fred Hoyle's work shows.
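For anyone who wants to repeat the extrapolation, here is one way it could be set up in Python (a plain least-squares fit of the tabulated points; because the points scatter about the line, the projection at 130 bases is sensitive to which points dominate the fit and can land anywhere from roughly 10^18 to 10^21 mutations, so treat the printed figure as indicative only):

    import math

    # (gene length L, mean mutations N) from the table in this post
    data = [(10, 818), (15, 4353), (20, 22956), (25, 108139),
            (30, 918605), (35, 2453687), (40, 16080795)]
    xs = [L for L, _ in data]
    ys = [math.log10(N) for _, N in data]
    n = len(data)
    slope = (n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)) \
            / (n * sum(x * x for x in xs) - sum(xs) ** 2)
    intercept = (sum(ys) - slope * sum(xs)) / n

    L = 130
    N = 10 ** (intercept + slope * L)            # projected mutation count
    years = N * 1e-3 / (3600 * 24 * 365.25)      # at one mutation per millisecond
    print(f"N ~ {N:.2e} mutations, ~ {years:.2e} years")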

I obviously need a more powerful computer and a bit more statistical work to tidy this up as a paper.

Wednesday, June 22, 2016

Appendix 7.4 Modelling Evolution

It seems that for the statistical simulation model the efficiency of selection is not completely set by one value of Sn (or, as in the stat model, the top % of the population selected); it is also affected by the scoring methodology used. In the original Dr J model all strings of '01' or '10' were counted, resulting in 2000-4000 generations (single mutation) to achieve the 100-long "010101.." pattern. I have settled on simply counting every correct digit (base), be it 0 or 1 or A, T, C or G, in its correct position. Running this model 50 times, the sample mean to 'evolve' the 100-long pattern was 443 generations, which with a 10-gene population = 4431 mutation events (min 2021, max 12870). Now that is with one mutation per generation, on a simple 2-code-choice model, with perfect selection/copy of the top 50% after each single mutation level. A fair way from reality.
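For clarity, here are the two scoring methods in Python as I understand them (the pair-counting version is my reading of the original Dr J description):

    def score_pairs(gene):
        # original style: count adjacent '01' or '10' pairs in the string
        return sum(1 for a, b in zip(gene, gene[1:]) if a != b)

    def score_positions(gene, target):
        # method settled on here: count every digit/base in its correct position
        return sum(1 for g, t in zip(gene, target) if g == t)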

With just 2 mutations per generation, however, these simple models reveal a real problem: so far none have completed the pattern. As it turns out, a single mutation in a 2-code system has only a 25% chance of being detrimental, similar to Fred Hoyle's analysis of the 'naive' single beneficial mutation. With a second mutation, however, that all changes, as the second becomes predominantly detrimental.

So do two mutations acting on a 100-base length of genome, a 2% mutation rate, imply a massive overstatement of the rate of mutation? Let us pose the question: how many DNA changes generally occur in concert before a selectable trait is produced? Putting it another way, is every single DNA mutation normally selectable? Clearly not. Requiring just 2 mutations to act together before the selection criterion is applied is actually a huge concession to what occurs in the real world. Recall that I am only modelling the algorithm of evolution, and artificially increasing the rate of mutation just facilitates a quicker result, particularly since every beneficial change is selected.

Including realistic selection strengths in models and equations of evolution has proved very difficult. So to keep things simple I am going to assume 100% selection and reproduction of every beneficial change, which may mean a more realistic number of mutations (i.e. 2) or a larger population size. The objective is a rigorous, simple, verifiable test of the evolutionary algorithm, to which end it is imperative that I give every possible concession to evolution theory.

So what's the result..

Sunday, April 24, 2016

Appendix 7.3 Modelling Evolution

The coin-toss simulation is not a model of evolution but a model of the algorithm of evolution, which allows exploration of the strength of natural selection to overcome the normally destructive effects of random mutation. Since natural selection does change the probability of a required outcome, it would be nice to find an equation for that probability. In Appendix 5.1 I discovered a number which is effectively a boundary condition the 2nd Law imposes on the random assembly of any coded string from a finite alphabet. An equation for the probability of evolving a given 'gene' after any number of generations, including the effect of natural selection, would allow me to relate the improbability of the state to the limit imposed by the Second Law.

I have an equation which on preliminary testing shows agreement with some statistical modelling, with Fred Hoyle's results [The Mathematics of Evolution], and with recently published papers like this one:

[http://dx.doi.org/10.1371/journal.pone.0000096], noting:
"Although a great deal is known about the landscape structure near the fitness peaks of native proteins [5][7][9][15], little is known about structures near the bottom, which contain information regarding primordial protein evolution." and..
"Although it was shown to be possible for a single arbitrarily chosen polypeptide to evolve infectivity, the evolution stagnated after the 7th generation, which was probably due to the small mutant library size at each generation."

While such modelling does show some increase in fitness as complexity (sequence length) increases, it effectively stagnates at some limiting value dependent upon the "mutant library size". This is precisely what my Second Law boundary condition predicts.

The equation is proving difficult to verify, and I initially had problems with software handling very large/small numbers (now solved). I started by defining selection success 'Sn' as the probability that positive mutations will, on average, succeed, i.e. not die or get eaten before they can reproduce. Sn varies like this:

Sn = 0  (no selection)   to   Sn = 1  (100% selection)
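One simple way Sn could enter a simulation (an illustration only, not the program behind these results) is as the probability that a score-improving mutation is actually retained at the selection step:

    import random

    def kept_after_selection(old_score, new_score, sn=1.0):
        # Sn is the probability that a positive (score-improving) mutation
        # survives to reproduce; Sn = 1 is perfect selection, Sn = 0 is none.
        if new_score > old_score:
            return random.random() < sn
        return False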

Some program confirmation of stagnation occurred for all values of Sn for large enough gene lengths. However, the extreme sensitivity as Sn dropped even minutely below 1 urges caution, so I will hold that result until it is fully verified.