SystemTrade
2013-07-06 00:49:33
I know I am replying to my own thread, but let me share with you some of my experiences with artificial intelligence.
Even today I cannot say definitively whether the GA or the PBIL algorithm generally leads to better results with my trading systems. My decision to move forward with PBIL rather than GA was basically driven by the fact that QuantShare introduced multi-threading for the PBIL algorithm at the end of 2012. My thinking was that even if GA were quicker than PBIL, that advantage would be more than compensated by multi-threading. The situation has further improved with the introduction of QS version 2.9.9, which now runs as a 64-bit application.
How does PBIL work?
The PBIL algorithm as implemented in QS has the following parameters:
#gen: Number of generations (1 ... 100,000, default 100)
#ps: Population Size (1 ... 100, default 10)
#lr: Learning Rate (0 ... 1, default 0.01)
#bs: Number of best solutions to use in learning (0 ... 100, default 3)
Let's assume the following optimisation scenario:
Optimize("var_1", 1, 5, 1);
Optimize("var_2", 3, 6, 1);
Optimize("var_3", 2, 3, 1);
Rather than storing the individual fitness values of each chromosome, the PBIL algorithm populates a probability matrix for the entire population. I do not know exactly how QS handles this internally, but that does not matter, as I want to focus on the principle here.
The initial probability matrix could look like this (e.g. 0.25 for var_2, since var_2 can take 4 values from 3 to 6, so the equally weighted probability of each possibility is 1/4):
Gene    var_1    var_2    var_3
1       0.2      -        -
2       0.2      -        0.5
3       0.2      0.25     0.5
4       0.2      0.25     -
5       0.2      0.25     -
6       -        0.25     -
Total   1        1        1
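To illustrate the principle (this is not the actual QS implementation, which I do not know), a minimal C# sketch of such a uniform initial matrix, built from the three Optimize() ranges above (5 * 4 * 2 = 40 possible combinations in total), could look like this:
// Possible values for var_1 (1..5), var_2 (3..6) and var_3 (2..3)
int[][] values =
{
    new[] { 1, 2, 3, 4, 5 },
    new[] { 3, 4, 5, 6 },
    new[] { 2, 3 }
};
// One probability row per variable, initialised to a uniform distribution
double[][] prob = new double[values.Length][];
for (int v = 0; v < values.Length; v++)
{
    prob[v] = new double[values[v].Length];
    for (int g = 0; g < prob[v].Length; g++)
        prob[v][g] = 1.0 / values[v].Length;   // e.g. 0.25 for each gene of var_2
}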
The algorithm will now randomly draw #ps combinations of var_1, var_2 and var_3 according to these probabilities and calculate their fitness values.
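As a sketch of this sampling step, continuing the fragment above (Fitness() is a hypothetical placeholder for whatever metric your backtest returns; the usual using directives for System and System.Collections.Generic are assumed):
int ps = 10;                                       // #ps, population size
var rnd = new Random();
var candidates = new List<int[]>();
var fitnessValues = new List<double>();
for (int i = 0; i < ps; i++)
{
    var chromo = new int[prob.Length];
    for (int v = 0; v < prob.Length; v++)
    {
        // Roulette-wheel selection of one gene per variable
        double r = rnd.NextDouble(), cum = 0.0;
        chromo[v] = values[v][values[v].Length - 1];   // fallback for rounding errors
        for (int g = 0; g < prob[v].Length; g++)
        {
            cum += prob[v][g];
            if (r < cum) { chromo[v] = values[v][g]; break; }
        }
    }
    candidates.Add(chromo);
    fitnessValues.Add(Fitness(chromo));            // hypothetical fitness evaluation
}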
The #bs combinations with the highest fitness values will be selected, and the probability matrix will be updated with those; all other combinations will be ignored. For simplification, let's assume that #bs is only 1, #lr is 0.01 and the chromosome with the highest fitness (the concrete fitness level does not matter) is {3, 4, 2}.
How does the probability get updated?
probability of gene 3 in var_1 becomes: 0.2 + 0.01 * (1 - 0.2) = 0.208
probability of gene 4 in var_2 becomes: 0.25 + 0.01 * (1 - 0.25) = 0.2575
probability of gene 2 in var_3 becomes: 0.5 + 0.01 * (1 - 0.5) = 0.505
or, in general terms (for the selected gene): prob_new = prob_old + #lr * (1 - prob_old)
The probabilities of the other genes are reduced accordingly, so the probability matrix after generation 1 will look like this:
Gene    var_1    var_2     var_3
1       0.198    -         -
2       0.198    -         0.505
3       0.208    0.2475    0.495
4       0.198    0.2575    -
5       0.198    0.2475    -
6       -        0.2475    -
Total   1        1         1
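A sketch of this update step for #bs = 1 (again my own illustration, not the QS code): the selected gene is pulled towards 1 and all other genes of the same variable are scaled down by (1 - #lr), which with #lr = 0.01 and the best chromosome {3, 4, 2} reproduces exactly the numbers in the table above.
void UpdateProbabilities(double[][] prob, int[][] values, int[] best, double lr)
{
    for (int v = 0; v < prob.Length; v++)
        for (int g = 0; g < prob[v].Length; g++)
            prob[v][g] = values[v][g] == best[v]
                ? prob[v][g] + lr * (1.0 - prob[v][g])   // selected gene, e.g. 0.25 -> 0.2575
                : prob[v][g] * (1.0 - lr);               // all other genes, e.g. 0.25 -> 0.2475
}

// For the example above:
UpdateProbabilities(prob, values, new[] { 3, 4, 2 }, 0.01);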
Note that in PBIL the probabilities of each variable are calculated independently of the other variables, which is different from the GA algorithm.
This exercise is repeated with the updated probabilities either #gen times or until convergence occurs. Convergence means that the probabilities have moved so far towards 0 or 1 that no new combinations can be created.
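A simple way to express such a convergence check (a sketch; QS's actual stopping condition may differ, and System.Linq is assumed):
bool HasConverged(double[][] prob, double eps)
{
    // Converged once every variable has one gene with probability ~1,
    // so sampling cannot produce any new combinations.
    for (int v = 0; v < prob.Length; v++)
        if (prob[v].Max() < 1.0 - eps)
            return false;
    return true;
}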
How do I parametrise #gen, #ps, #lr, #bs and how many turns should be run?
#gen / #turns
If you want to simulate over, for instance, 1,000 generations, you can either run 1 turn with 1,000 generations (assuming no earlier convergence) or 10 turns with 100 generations. Keep in mind that the probability matrix is reset at the beginning of each turn, so there is no learning from previous turns.
I regularly select 1,000 generations, because I want to see the full effect of learning.
I don't use a fixed number of turns. Normally I run another turn only if the previous one has produced chromosomes with higher fitness.
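In code, the way I run turns could be sketched like this (InitUniformMatrix() and RunPbilTurn() are hypothetical helpers standing for the steps sketched earlier, and gen, ps, bs, lr are assumed to hold the parameter values discussed above):
int[] overallBest = null;
double overallBestFitness = double.NegativeInfinity;
while (true)
{
    double[][] p = InitUniformMatrix(values);          // matrix is reset, no learning carried over
    int[] turnBest = RunPbilTurn(p, gen, ps, bs, lr);  // up to #gen generations or convergence
    double turnFitness = Fitness(turnBest);
    if (turnFitness <= overallBestFitness) break;      // no improvement: stop running turns
    overallBest = turnBest;
    overallBestFitness = turnFitness;
}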
#ps
Increasing this parameter increases the calculation time for each generation and also consumes more memory. A system run over 15 years of S&P 500 members consumed around 12 GB of memory when I used a population size of 100, so 64-bit is quite essential here. However, a larger #ps makes it more likely that good solutions are found within a generation. Normally I keep the default value of 10.
#lr
The higher the learning rate, the greater the impact of a generation's best solutions on the probability matrix.
If you are exploring a system and your optimisable variables span wide ranges, so that the total number of variations is huge, I would suggest a lower learning rate; I use 0.005 in this case. When you fine-tune a system, on the other hand, #lr can be increased; here I would move the default rate of 0.01 up to 0.02.
#bs
Unless you are fine-tuning a system, I would tend to say that 90% of all possible combinations do not lead to profitable systems. So the best solutions of a generation might not be good solutions for you, and you might want to avoid learning from those solutions. To overcome this problem, you can add something like this to your fitness formula:
if (Fitness < YourMinimumFitnessLevel) Fitness = Double.NaN;
From the other solutions that are above the minimum fitness level, I normally take only the best one, so #bs = 1.
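In a sketch (System.Linq assumed, Fitness() again a placeholder), the selection with this filter could look like this:
int bs = 1;                                    // #bs: learn only from the single best solution
var bestSolutions = candidates
    .Select(c => new { Chromosome = c, Fitness = Fitness(c) })
    .Where(x => !double.IsNaN(x.Fitness))      // chromosomes below the minimum level are ignored
    .OrderByDescending(x => x.Fitness)
    .Take(bs)
    .ToList();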
Your comments as well as further discussions are highly appreciated.