

Response cards included in the in-flight magazine

Comment forms on the airline's Web site

Telephone calls to the customer service center

Cards, letters, and email messages

Different comments have different priorities for responses. Compliments, for example, may result in an automated "thank you for being a loyal customer" message. Complaints, on the other hand, all need at least to be acknowledged, and many require follow-up action. The sooner the company responds, the better its chance of keeping a disgruntled but possibly valuable customer.

Airline personnel spend significant amounts of time analyzing customer comments, first sorting them into complaints and other comments, and then routing the complaints to the appropriate group for follow-up. When customers are already upset about lost baggage, canceled flights, rude treatment, or lousy food, a slow or inappropriate response only makes things worse. This particular airline decided to reduce the time it took to respond to a complaint by automating the initial categorization of comments. Its approach was to evolve a solution using software from Genalytics (www.genalytics.com), a software company in Newburyport, MA.

Data

All customer comments end up in a comment database, regardless of the channel they come in by. This database includes both fixed fields describing the comment and the actual text itself. A complete customer comment record has the following fields:

Date

Source (email, comment card, telephone contact, letter, other)

Flight number

Class of service

Departure airport

Destination airport

Mileage account number

Organization receiving comment

Names of involved airline employee(s) if mentioned

Free-text comments
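As a rough illustration of the record layout listed above, the fields could be held in a structure like the following. This is only a sketch: the class name, the attribute names, and the choice of `None` for a blank field are assumptions made here, not details of the airline's database.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class CommentRecord:
    """One row of the comment database; all names here are illustrative."""
    text: str                                     # free-text comments
    received: Optional[date] = None               # date
    source: Optional[str] = None                  # email, comment card, telephone contact, letter, other
    flight_number: Optional[str] = None
    class_of_service: Optional[str] = None
    departure_airport: Optional[str] = None
    destination_airport: Optional[str] = None
    mileage_account: Optional[str] = None
    receiving_organization: Optional[str] = None
    employees_mentioned: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Names of the fixed fields that were left blank (None)."""
        return [name for name, value in vars(self).items() if value is None]


record = CommentRecord(text="My baggage was lost at JFK",
                       source="email", departure_airport="JFK")
print(record.missing_fields())
```

Making every fixed field optional reflects the fact that a record can arrive with only the free text filled in.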

Some records are missing data for some fields. Comments coming in through the call center are usually filled in correctly, because the call-center reps are trained to fill in all the fields. However, left to themselves, customers may not fill in all the fields when sending a comment card or email message.

The first step is preprocessing the text. The company preprocesses the comments to correct certain spelling errors and to create a large number of derived variables about the content (Is the word "food" present? Is the word "meal" present? And so on). Such derived variables are created for every word in the database that occurs more than some threshold number of times across all messages and is not a very common word such as "of" or "the". Some of the new variables convey metadata about the comment, such as its size in bytes and the number of distinct words it contains. Together, these variables form the comment header. The comment text itself is not used for modeling; the derived variables are used instead.
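The derivation of word-presence variables and metadata can be sketched in a few lines of Python. This is a minimal illustration, not the company's actual preprocessing: the stop-word list, the threshold value, the `has_` naming scheme, and the function names are all assumptions, and the spelling-correction step is omitted.

```python
import re
from collections import Counter

# Assumed list of very common words to skip; the real list is not given in the text.
STOP_WORDS = {"of", "the", "a", "an", "and", "to", "was", "at", "i", "my"}


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(comments, threshold=1):
    """Words occurring more than `threshold` times across all comments, minus stop words."""
    counts = Counter(word for text in comments for word in tokenize(text))
    return sorted(w for w, n in counts.items()
                  if n > threshold and w not in STOP_WORDS)


def comment_header(text, vocabulary):
    """One 0/1 indicator per vocabulary word, plus two metadata variables."""
    present = set(tokenize(text))
    header = {f"has_{word}": int(word in present) for word in vocabulary}
    header["size_bytes"] = len(text.encode("utf-8"))
    header["distinct_words"] = len(present)
    return header


comments = [
    "My baggage was lost at JFK",
    "The food was cold and the meal was late",
    "Lost baggage again, and the food was bad",
]
vocabulary = build_vocabulary(comments)
print(vocabulary)
print(comment_header(comments[0], vocabulary))
```

With a real comment database the threshold would be far higher than 1, so that only words frequent enough to be informative get their own variable.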

The Data Mining Task: Evolving a Solution

The data mining task was to come up with a model that takes as input a large number of variables describing each customer comment and combines them to produce a classification. The specific task was to classify comment signatures according to whether or not they are complaints. There are several ways of approaching this, such as using decision trees or clustering. In this case, though, the company evolved a solution.

Solving a problem with genetic algorithms requires genomes and a fitness function. The genomes are based on the preprocessed comments, one genome per comment. First, a few more fields are added for interaction variables, such as whether both "baggage" and "JFK" are mentioned or whether both "food" and "chicken" are mentioned. The header, metadata variables, and interaction variables form the comment signature, as shown in Figure 13.5.
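An interaction variable of this kind is simply the logical AND of two word indicators. The sketch below assumes the header is a dictionary of 0/1 fields named `has_<word>`; that representation, the function name, and the sample values are all hypothetical.

```python
def add_interactions(header, pairs):
    """Return a copy of the header extended with one 0/1 field per word pair."""
    signature = dict(header)
    for first, second in pairs:
        # The interaction is 1 only when both words are present in the comment.
        both = header.get(f"has_{first}", 0) and header.get(f"has_{second}", 0)
        signature[f"has_{first}_and_{second}"] = int(bool(both))
    return signature


# Hypothetical header for a comment mentioning baggage, JFK, and chicken.
header = {"has_baggage": 1, "has_jfk": 1, "has_food": 0, "has_chicken": 1,
          "size_bytes": 52, "distinct_words": 11}
signature = add_interactions(header, [("baggage", "jfk"), ("food", "chicken")])
print(signature["has_baggage_and_jfk"], signature["has_food_and_chicken"])
```

Which pairs deserve an interaction variable is a modeling decision; the text mentions only these two pairs as examples.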


[Figure: an example comment (To: comments@airline.com, From: random customer, "My baggage was lost at JFK when I changed planes ...") mapped to its comment header, indicator variables, and interaction variables.]

Figure 13.5 The comment signature describes the text in the comment.



The comment signature is not the genome, although the two are related. The genome is a set of weights, one for each variable in the signature, along with an additional weight called a bias. Multiplying the weights in the genome by the corresponding fields in the comment signature yields a prediction of whether the comment is a complaint, as shown in Figure 13.6. This is the fitness function for a single comment signature. The full fitness function applies this to all the comment signatures in the training set.
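The weighted combination can be written out as follows. The text does not say how the products are turned into a yes/no prediction or how the per-comment results are aggregated, so this sketch assumes the products are summed with the bias, that a positive total means "complaint", and that overall fitness is the fraction of training comments classified correctly. The function names are hypothetical.

```python
def score(genome, signature):
    """Bias plus the sum of weight * field; the genome's last entry is the bias."""
    *weights, bias = genome
    if len(weights) != len(signature):
        raise ValueError("genome needs one weight per signature field, plus a bias")
    return bias + sum(w * x for w, x in zip(weights, signature))


def predict_complaint(genome, signature, cutoff=0.0):
    """Assumed decision rule: a score above the cutoff means 'complaint'."""
    return score(genome, signature) > cutoff


def fitness(genome, signatures, labels):
    """Assumed aggregate: the fraction of training comments classified correctly."""
    correct = sum(predict_complaint(genome, s) == y
                  for s, y in zip(signatures, labels))
    return correct / len(labels)


genome = [2.0, -1.0, -0.5]              # two weights and a bias
signatures = [[1, 0], [0, 1], [1, 1]]   # toy signatures with two fields each
labels = [True, False, True]            # True marks a complaint
print(fitness(genome, signatures, labels))
```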

The Genalytics system creates a random initial population of genomes. These genomes generally have most of the weights set to low values, and just a few set to high values. That is, the initial population consists of genomes that are specialized for the simplest features in the comment signature. Although the initial population performed very poorly, selection, crossover, and mutation led to better and better solutions. After tens of thousands of generations, the final model was able to classify 85 percent of the records correctly, enough to speed up the airline's complaint processing. The chart in Figure 13.7 shows the improvement in the fitness function in succeeding generations.
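The evolutionary loop can be illustrated with a small generic genetic algorithm. This is not the Genalytics implementation, whose internals the text does not describe. The population size, mutation rate, truncation selection, single-point crossover, and accuracy-based fitness are all assumptions chosen to keep the sketch short, and it runs for far fewer generations than the tens of thousands mentioned above.

```python
import random


def fitness(genome, signatures, labels):
    """Assumed fitness: the fraction of comments classified correctly."""
    *weights, bias = genome
    correct = 0
    for fields, is_complaint in zip(signatures, labels):
        total = bias + sum(w * x for w, x in zip(weights, fields))
        correct += (total > 0) == is_complaint
    return correct / len(labels)


def random_genome(n_fields, rng):
    """Mostly small weights, with one field given a larger weight."""
    genome = [rng.uniform(-0.1, 0.1) for _ in range(n_fields + 1)]
    genome[rng.randrange(n_fields)] = rng.uniform(-1.0, 1.0)
    return genome


def crossover(mother, father, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    cut = rng.randrange(1, len(mother))
    return mother[:cut] + father[cut:]


def mutate(genome, rng, rate=0.05):
    """Perturb each weight with a small probability."""
    return [w + rng.gauss(0, 0.5) if rng.random() < rate else w for w in genome]


def evolve(signatures, labels, pop_size=30, generations=50, seed=0):
    rng = random.Random(seed)
    n_fields = len(signatures[0])
    population = [random_genome(n_fields, rng) for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        ranked = sorted(population, reverse=True,
                        key=lambda g: fitness(g, signatures, labels))
        history.append(fitness(ranked[0], signatures, labels))
        survivors = ranked[: pop_size // 2]  # selection: keep the fitter half
        children = []
        while len(survivors) + len(children) < pop_size:
            mother, father = rng.sample(survivors, 2)
            children.append(mutate(crossover(mother, father, rng), rng))
        population = survivors + children
    best = max(population, key=lambda g: fitness(g, signatures, labels))
    return best, history


signatures = [[1, 0, 0], [0, 1, 0], [1, 1, 1], [0, 0, 0]]  # toy data
labels = [True, False, True, False]
best, history = evolve(signatures, labels)
print(history[0], history[-1])
```

Because the fitter half of each generation survives unchanged, the best fitness recorded in `history` never decreases from one generation to the next.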



Figure 13.6 The genome has a weight for each field in the comment signature, plus an additional weight called a bias.


