SystemTrade
2013-07-06 00:49:33
I know I am replying to my own thread, but let me share with you some of my experiences with artificial intelligence.
Even today I cannot say definitively whether the GA or the PBIL algorithm generally leads to better results with my trading systems. My decision to move forward with PBIL rather than GA was basically driven by the fact that QuantShare introduced multi-threading for the PBIL algorithm at the end of 2012. My thinking was that even if GA were quicker than PBIL, that advantage would be more than compensated by multi-threading. The situation has further improved with the introduction of QS version 2.9.9, which now runs as a 64-bit application.
How does PBIL work?
The PBIL algorithm as implemented in QS has the following parameters:
#gen: Number of generations (1 ... 100,000, default 100)
#ps: Population Size (1 ... 100, default 10)
#lr: Learning Rate (0 ... 1, default 0.01)
#bs: Number of best solutions to use in learning (0 ... 100, default 3)
Let's assume the following optimisation scenario:
Optimize("var_1", 1, 5, 1);
Optimize("var_2", 3, 6, 1);
Optimize("var_3", 2, 3, 1);
Rather than storing the individual fitness values of each chromosome, the PBIL algorithm populates a probability matrix for the entire population. I do not know exactly how QS handles this internally, but that does not matter, as I want to focus on the principle here.
The initial probability matrix could look like this (e.g. 0.25 for var_2, since var_2 can take 4 values from 3 to 6, so the equally weighted probability of each possibility is 1/4):
Gene    var_1    var_2    var_3
1       0.2      -        -
2       0.2      -        0.5
3       0.2      0.25     0.5
4       0.2      0.25     -
5       0.2      0.25     -
6       -        0.25     -
Total   1        1        1
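To illustrate the principle (this is not the actual QS implementation, which I do not know), a minimal C# sketch of such a uniform initial matrix, built from the three Optimize() ranges above (5 * 4 * 2 = 40 possible combinations in total), could look like this:
// Possible values for var_1 (1..5), var_2 (3..6) and var_3 (2..3)
int[][] values =
{
    new[] { 1, 2, 3, 4, 5 },
    new[] { 3, 4, 5, 6 },
    new[] { 2, 3 }
};
// One probability row per variable, initialised to a uniform distribution
double[][] prob = new double[values.Length][];
for (int v = 0; v < values.Length; v++)
{
    prob[v] = new double[values[v].Length];
    for (int g = 0; g < prob[v].Length; g++)
        prob[v][g] = 1.0 / values[v].Length;   // e.g. 0.25 for each gene of var_2
}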
The algorithm will now randomly draw #ps combinations of var_1, var_2 and var_3 according to these probabilities and calculate their fitness values.
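As a sketch of this sampling step, continuing the fragment above (Fitness() is a hypothetical placeholder for whatever metric your backtest returns; the usual using directives for System and System.Collections.Generic are assumed):
int ps = 10;                                       // #ps, population size
var rnd = new Random();
var candidates = new List<int[]>();
var fitnessValues = new List<double>();
for (int i = 0; i < ps; i++)
{
    var chromo = new int[prob.Length];
    for (int v = 0; v < prob.Length; v++)
    {
        // Roulette-wheel selection of one gene per variable
        double r = rnd.NextDouble(), cum = 0.0;
        chromo[v] = values[v][values[v].Length - 1];   // fallback for rounding errors
        for (int g = 0; g < prob[v].Length; g++)
        {
            cum += prob[v][g];
            if (r < cum) { chromo[v] = values[v][g]; break; }
        }
    }
    candidates.Add(chromo);
    fitnessValues.Add(Fitness(chromo));            // hypothetical fitness evaluation
}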
The #bs combinations with the highest fitness values will be selected, and the probability matrix will be updated with those; all other combinations will be ignored. For simplification, let's assume that #bs is only 1, #lr is 0.01 and the chromosome with the highest fitness (the concrete fitness level does not matter) is {3, 4, 2}.
How does the probability get updated?
probability of gene 3 in var_1 becomes: 0.2 + 0.01 * (1 - 0.2) = 0.208
probability of gene 4 in var_2 becomes: 0.25 + 0.01 * (1 - 0.25) = 0.2575
probability of gene 2 in var_3 becomes: 0.5 + 0.01 * (1 - 0.5) = 0.505
or, in general terms (for the selected gene): prob_new = prob_old + #lr * (1 - prob_old)
The probabilities of the other genes are reduced accordingly, so the probability matrix after generation 1 will look like this:
Gene    var_1    var_2     var_3
1       0.198    -         -
2       0.198    -         0.505
3       0.208    0.2475    0.495
4       0.198    0.2575    -
5       0.198    0.2475    -
6       -        0.2475    -
Total   1        1         1
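A sketch of this update step for #bs = 1 (again my own illustration, not the QS code): the selected gene is pulled towards 1 and all other genes of the same variable are scaled down by (1 - #lr), which with #lr = 0.01 and the best chromosome {3, 4, 2} reproduces exactly the numbers in the table above.
void UpdateProbabilities(double[][] prob, int[][] values, int[] best, double lr)
{
    for (int v = 0; v < prob.Length; v++)
        for (int g = 0; g < prob[v].Length; g++)
            prob[v][g] = values[v][g] == best[v]
                ? prob[v][g] + lr * (1.0 - prob[v][g])   // selected gene, e.g. 0.25 -> 0.2575
                : prob[v][g] * (1.0 - lr);               // all other genes, e.g. 0.25 -> 0.2475
}

// For the example above:
UpdateProbabilities(prob, values, new[] { 3, 4, 2 }, 0.01);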
Note that in PBIL the probabilities of each variable are calculated independently of the other variables, which is different from the GA algorithm.
This exercise is repeated with the updated probabilities either #gen times or until convergence occurs. Convergence means that the probabilities have moved so far towards 0 or 1 that no new combinations can be created.
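A simple way to express such a convergence check (a sketch; QS's actual stopping condition may differ, and System.Linq is assumed):
bool HasConverged(double[][] prob, double eps)
{
    // Converged once every variable has one gene with probability ~1,
    // so sampling cannot produce any new combinations.
    for (int v = 0; v < prob.Length; v++)
        if (prob[v].Max() < 1.0 - eps)
            return false;
    return true;
}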
How do I parametrise #gen, #ps, #lr, #bs and how many turns should be run?
#gen / #turns
If you want to simulate over, for instance, 1,000 generations, you can either run 1 turn with 1,000 generations (assuming no earlier convergence) or 10 turns with 100 generations. Keep in mind that the probability matrix is reset at the beginning of each turn, so there is no learning from previous turns.
I regularly select 1,000 generations, because I want to see the full effect of learning.
I don't use a fixed number of turns. Normally I run another turn only if the previous one has produced chromosomes with higher fitness.
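In code, the way I run turns could be sketched like this (InitUniformMatrix() and RunPbilTurn() are hypothetical helpers standing for the steps sketched earlier, and gen, ps, bs, lr are assumed to hold the parameter values discussed above):
int[] overallBest = null;
double overallBestFitness = double.NegativeInfinity;
while (true)
{
    double[][] p = InitUniformMatrix(values);          // matrix is reset, no learning carried over
    int[] turnBest = RunPbilTurn(p, gen, ps, bs, lr);  // up to #gen generations or convergence
    double turnFitness = Fitness(turnBest);
    if (turnFitness <= overallBestFitness) break;      // no improvement: stop running turns
    overallBest = turnBest;
    overallBestFitness = turnFitness;
}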
#ps
Increasing this parameter increases the calculation time for each generation and also consumes more memory. A system run over 15 years of S&P 500 members consumed around 12 GB of memory when I used a population size of 100, so 64-bit is quite essential here. However, a larger #ps makes it more likely that good solutions are found within a generation. Normally I keep the default value of 10.
#lr
The higher the learning rate, the greater the impact of a generation's best solutions on the probability matrix.
If you are exploring a system and your optimisable variables span wide ranges, so that the total number of variations is huge, I would suggest a lower learning rate; I use 0.005 in this case. When you fine-tune a system, on the other hand, #lr can be increased; here I would move the default rate of 0.01 up to 0.02.
#bs
Unless you are fine-tuning a system, I would tend to say that 90% of all possible combinations do not lead to profitable systems. So the best solutions of a generation might not be good solutions for you, and you might want to avoid learning from those solutions. To overcome this problem, you can add something like this to your fitness formula:
if (Fitness < YourMinimumFitnessLevel) Fitness = Double.NaN;
From the other solutions that are above the minimum fitness level, I normally take only the best one, so #bs = 1.
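In a sketch (System.Linq assumed, Fitness() again a placeholder), the selection with this filter could look like this:
int bs = 1;                                    // #bs: learn only from the single best solution
var bestSolutions = candidates
    .Select(c => new { Chromosome = c, Fitness = Fitness(c) })
    .Where(x => !double.IsNaN(x.Fitness))      // chromosomes below the minimum level are ignored
    .OrderByDescending(x => x.Fitness)
    .Take(bs)
    .ToList();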
Your comments as well as further discussions are highly appreciated.