Промышленный лизинг Промышленный лизинг  Методички 

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 [ 153 ] 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222

which were very good. After 80 generations, the package lowered the score to 21-considerably better than he was able to do by hand.

This example gives a good feeling for optimization problems where genetic algorithms are applicable. They differ from most data mining problems because they are more rule-oriented than data-oriented. The key to solving these problems is to incorporate the constraints into a single fitness function to be optimized (either by finding a maximum or a minimum). The resulting fitness function might be highly nonlinear, making it difficult to optimize using other techniques. As we will see, the same techniques adapt to situations featuring larger amounts of data.

Genetic algorithms are a good tool when there are more rules than data in the problem (although they are useful in other areas as well). These types of scheduling problems often involve competition for limited resources subject to complex relationships that describe resources and their users.

Schemata: Why Genetic Algorithms Work

At first sight, there is nothing sacrosanct in the selection, crossover, and mutation operators introduced earlier in this chapter. Why, for instance, does crossover choose only one intermediate point instead of two or more? Why do low mutation rates produce better results? The fact that nature behaves this way is not sufficient justification if multiple crossover points would produce better results more quickly or if a high mutation rate would work better.

For solving problems that yield actionable results, the fact that genetic algorithms have worked well in practice may be sufficient justification for continuing to use them as they are. However, it is still comforting to know that the technique has a theoretical foundation. Prof. Holland developed his theory of schemata processing in the early 1970s to explain why selection, crossover, and mutation work so well in practice. Readers interested in using genetic algorithms for some of their problems are particularly well advised to understand schemata, even if the genetic algorithms are buried inside tools they are using, since this understanding explains both the power and the limits of the technique.

A schema, which comes from the Greek word meaning form or figure, is simply a representation of the patterns present in a genome. Schemata (the plural is formed from the Greek root) are represented as sequences of symbols. The 2 s and 0s (called the fixed positions) of genomes are augmented by an asterisk, *, that matches either a 0 or a 1. The relationship between a schema and a genome is simple. A genome matches a schema when the fixed positions in the



schema match the corresponding positions in the genome. An example should make this quite clear; the following schema:

1 0 * *

matches all of the following four genomes because they all have four symbols, beginning with a 1 followed by a 0:

1 0 0 0 1 0 0 1 1 0 1 1 1 0 1 0

The order of a schema is the number of fixed positions that it contains. For instance, the order of 1*10111 is 6, of ***1010**1 is 5, and of 0************** is 1. The defining length of a schema is the distance between the outermost fixed positions. So, the defining length of 1*10111 is 6 (counting from the left, 7 - 1), of ***1010**1 is 6 (10 - 4) and of 0************** is 0 (1 - 1).

Now, let us look at fitness functions in terms of schemata. If the genome 000 survives from one generation to the next, then the schema 0** has also survived, as have *0*, **0, *00, 0*0,00*, and ***. The fitness of a particular schema, then, is the average fitness of all the genomes that match the schema in a given population. For instance, the fitness of the schema 0** is the average fitness of the genomes 000, 001, 010, and 011 since the schema survives when these genomes survive, at least considering only the selection operator. Consider two schemata from the previous example using the fitness function 31p - p2, 10*** and 00***. One genome in the initial population matches 10***, so its fitness is 176. The two genomes matching 00*** have fitness values of 87 and 58. The first schema is fitter than the second. And, in fact, in the next generation there is only one genome matching 00*** and there are two matching 10***. The fitter schema has survived and proliferated; the less fit is disappearing.

A geometric view of schemata is sometimes helpful for understanding them. Consider the eight possible genomes of length three: 000, 001, 010, 011, 100, 101, 110, and 111. These lie at the corners of a unit cube, as shown in Figure 13.3. Schemata then correspond to the edges and faces of the cube. The edges are the schemata of order 2 and the faces of order 1. As genetic algorithms are processing different genomes; they are also processing schemata, visualized by these features on a cube. The population covers pieces of the cube trying to find the corners with the best fitness, and the schemata provide information about large regions of the possible solutions. This geometric perspective generalizes to higher dimensions, where the selection, crossover, and mutation operators correspond to cuts through hypercubes in some higher-dimension space that is a bit harder to visualize



Schema for this face is **0

Schema for

this face is 1**

Schema for this edge

Figure 13.3 A cube is a useful representation of schemata on 3 bits. The corners represent the genomes, the edges represent the schemata of order 2, the faces, the schemata of order 1, and the entire cube, the schema of order 0.

Consider the schema, 1***1. This is also quite fit in the original population, with a fitness of 150. There is one genome that matches it in the original population and the same one in the next generation. This schema has survived only because the genome containing it did not cross over with another genome. A crossover would likely have destroyed it. Compare this to 10*** that survived a crossover. The shorter the defining length of a schema, the more likely it will be to survive from one generation to another. So, even longer schemata that are very fit are likely to be replaced by shorter, but fit, cousins. Using more complicated crossover techniques, such as making two cuts, changes the behavior entirely. With more complicated techniques, the defining length is no longer useful, and Hollands results on schemata do not necessarily hold.

Holland rigorously proved these two observations and summed them up in the Schema Theorem (also called the Fundamental Theorem of Genetic Algorithms): short, low-order schemata with above-average fitness increase in population from one generation to the next. In other words, short, low-order schemata are the building blocks that genetic algorithms are working on. From one generation to the next, the fittest building blocks survive and mix with each other to produce fitter and fitter genomes.



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 [ 153 ] 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222