Промышленный лизинг Промышленный лизинг  Методички 

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 [ 152 ] 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222


10010

11101

00110

10110

The basis of genetic algorithms is the continual improvement of the fitness of the population by means of selection, crossover, and mutation as genes are passed from one generation to the next. After a certain number of generations- typically several dozen or hundred-the population evolves to a near-optimal solution. Genetic algorithms do not always produce the exact optimal solution, but they do a very good job of getting close to the best solution. In data mining, where exact solutions may not be feasible, being close to the best solution still yields actionable results.

Representing Data /pyf v

The previous example illustrated the basic mechanisms of applying genetic algorithms to the optimization of a simple function, 31p - p2. Since the example was trying to maximize a particular function, the function itself served as the fitness function. The genomes were quite easy to create, because the function had one parameter, a 5-bit integer that varied between 0 and 31. This genome contained a single gene, representing the parameter, and consisting of a sequence of 5 binary bits. The choice of representation using binary sequences is not accidental. As explained later in the section on schemata, genetic algorithms work best on binary representations of data-a highly convenient circumstance, since computers themselves work most efficiently on binary data.

Genetic algorithms are different from other data mining and optimization techniques in that they manipulate the patterns of bits in the genomes and do not care at all about the values represented by the bits-only the fitness function knows what the patterns really mean. One requirement for the fitness function is the ability to transform any genome into a fitness value. This requirement does not seem particularly onerous, because computers are used to working with data in bits. However, some patterns of bits may violate constraints imposed on the problem. When the genome violates such constraints, then the fitness is set to a minimum. That is, testing for constraints in the fitness function incorporates constraints into the solution.

For instance, the previous example had a constraint that the value be between 0 and 31. This was made implicitly true by using 5 bits to represent


Team-Fly®



the genome. What if there were 8 bits? In this case, the fitness function would look like:

31 - p2 when 0 <= p <= 31

0 otherwise

The general rule here is to include a minimum fitness value for any bit patterns that do not make sense or that violate problem constraints. Such patterns may not be in the original population, but they might appear because of crossover and mutation.

The fitness function is defined on the genome, a sequence of bits. It must be able to understand any pattern of 1s and 0s in the bits. When a particular pattern of bits does not make sense, then the fitness function should return a very low value-so the pattern does not get passed on to succeeding generations.

Case Study: Using Genetic Algorithms for Resource Optimization

One area where genetic algorithms have proven quite successful is in problems involving scheduling resources subject to a wide range of constraints. These types of problems involve competition for limited resources, while adhering to a complex set of rules that describe relationships. The key to these problems is defining a fitness function that incorporates all the constraints into a single fitness value. These problems are outside the range of what we have been considering as traditional data mining problems; however, they are interesting and illustrate the power of genetic algorithms.

An example of such a problem is the assignment of 40 medical residents to various duties in an outpatient clinic, as faced by Dr. Ed Ewen at the Medical Center of Delaware. The clinic is open 7 days a week, and the residents are assigned to one particular day of the week through an entire year, regardless of their other duties. The best assignment balances several different goals:

The clinic must have staff at all times.

The clinic should have a balance of first-, second-, and third-year residents.

Third-year residents see eight patients per day, second-year residents see six, and first-year residents see four.



So far, this problem is not so complicated. However, each resident spends 4 weeks on a rotation in a given part of the hospital, such as the intensive care ward, the oncology unit, or a community hospital. These rotations impose some other constraints:

Senior residents do not go to the clinic when they are assigned to the medical intensive care rotation, but all other residents do.

Junior residents do not go to the clinic when they are assigned to the cardiac care rotation, but all other residents do.

No more than two residents from the intensive care rotation can be assigned to the clinic on the same day.

No more than three residents from other rotations can be assigned to the clinic on the same day.

As an example of problems that may arise, consider that during one rotation, five residents are assigned to the clinic on a particular day. During the next rotation, the senior is on the medical intensive care rotation and the two juniors are on the cardiac care rotation. Now there are only two residents left at the clinic-and this is insufficient for clinic operations.

The genetic algorithms approach recognizes that there is probably no perfect solution to this problem, but that some assignments of residents to days of the week are clearly better than others. Dr. Ewen recognized that he could capture the goodness of a schedule using a fitness function. Actually, the function that Dr. Ewen used was an anti-fitness function-the higher the value, the worse the schedule. This function imposed penalties for violating the constraints:

For each day when the clinic has fewer than three residents, an amount is added-a larger amount the bigger the size of the deficit.

For each day when there are no seniors in the clinic, a small amount is added.

For each day when fewer than three residents are left on a rotation, a large amount is added to the fitness function.

And so on.

Setting up a spreadsheet with these functions, Dr. Ewen tried to minimize the functions to get the best assignment. His initial assignments had scores in the range of 130 to 140. After several hours of work, he was able to reduce the score to 72. Pretty good.

However, he had available a genetic algorithms package from the Ward Systems Group (www.wardsystems.com) that plugs into Excel spreadsheets. He started with a population of 100 randomly generated assignments, none of



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 [ 152 ] 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222