Промышленный лизинг Промышленный лизинг  Методички 

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 [ 88 ] 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222

be mapped to -1.0, -0.5, 0.0, +0.5, +1.0, respectively. From the perspective of the network, single and unknown are very far apart, whereas divorced and married are quite close. For some input fields, this implicit ordering might not have much of an effect. In other cases, the values have some relationship to each other and the implicit ordering confuses the network.

WARNING

When working with categorical variables in neural networks, be very careful when mapping the variables to numbers. The mapping introduces an ordering of the variables, which the neural network takes into account, even if the ordering does not make any sense.

The second way of handling categorical features is to break the categories into flags, one for each value. Assume that there are three values for gender (male, female, and unknown). Table 7.3 shows how three flags can be used to code these values using a method called 1 of N Coding. It is possible to reduce the number of flags by eliminated the flag for the unknown gender; this approach is called 1 of N -1 Coding.

Why would we want to do this? We have now multiplied the number of input variables and this is generally a bad thing for a neural network. However, these coding schemes are the only way to eliminate implicit ordering among the values.

The third way is to replace the code itself with numerical data about the code. Instead of including zip codes in a model, for instance, include various census fields, such as the median income or the proportion of households with children. Another possibility is to include historical information summarized at the level of the categorical variable. An example would be including the historical churn rate by zip code for a model that is predicting churn.

DJJ When using categorical variables in a neural network, try to replace them with some numeric variable that describes them, such as the average income in a census block, the proportion of customers in a zip code (penetration), the historical churn rate for a handset, or the base cost of a pricing plan.

Table 7.3 Handling Gender Using 1 of N Coding and 1 of N - 1 Coding

GENDER

GENDER

MALE FLAG

GENDER FEMALE FLAG

GENDER

UNKNOWN FLAG

GENDER

MALE

FLAG

GENDER FEMALE FLAG

Male

+ 1.0

-1.0

-1.0

+1.0

-1.0

Female

-1.0

+1.0

-1.0

-1.0

+1.0

Unknown

-1.0

-1.0

+1.0

-1.0

-1.0



Other Types of Features

Some input features might not fit directly into any of these three categories. For complicated features, it is necessary to extract meaningful information and use one of the above techniques to represent the result. Remember, the input to a neural network consists of inputs whose values should generally fall between -1 and 1.

Dates are a good example of data that you may want to handle in special ways. Any date or time can be represented as the number of days or seconds since a fixed point in time, allowing them to be mapped and fed directly into the network. However, if the date is for a transaction, then the day of the week and month of the year may be more important than the actual date. For instance, the month would be important for detecting seasonal trends in data. You might want to extract this information from the date and feed it into the network instead of, or in addition to, the actual date.

The address field-or any text field-is similarly complicated. Generally, addresses are useless to feed into a network, even if you could figure out a good way to map the entire field into a single value. However, the address may contain a zip code, city name, state, and apartment number. All of these may be useful features, even though the address field taken as a whole is usually useless.

Interpreting the Results

Neural network tools take the work out of interpreting the results. When estimating a continuous value, often the output needs to be scaled back to the correct range. For instance, the network might be used to calculate the value of a house and, in the training set, the output value is set up so that $103,000 maps to -1 and $250,000 maps to 1. If the model is later applied to another house and the output is 0.0, then we can figure out that this corresponds to $176,500- halfway between the minimum and the maximum values. This inverse transformation makes neural networks particularly easy to use for estimating continuous values. Often, though, this step is not necessary, particularly when the output layer is using a linear transfer function.

For binary or categorical output variables, the approach is still to take the inverse of the transformation used for training the network. So, if churn is given a value of 1 and no-churn a value of -1, then values near 1 represent churn, and those near -1 represent no churn. When there are two outcomes, the meaning of the output depends on the training set used to train the network. Because the network learns to minimize the error, the average value produced by the network during training is usually going to be close to the average value in the training set. One way to think of this is that the first



pattern the network finds is the average value. So, if the original training set had 50 percent churn and 50 percent no-churn, then the average value the network will produce for the training set examples is going to be close to 0.0. Values higher than 0.0 are more like churn and those less than 0.0, less like churn. If the original training set had 10 percent churn, then the cutoff would more reasonably be -0.8 rather than 0.0 (-0.8 is 10 percent of the way from -1 to 1). So, the output of the network does look a lot like a probability in this case. However, the probability depends on the distribution of the output variable in the training set.

Yet another approach is to assign a confidence level along with the value. This confidence level would treat the actual output of the network as a propensity to churn, as shown in Table 7.4.

For binary values, it is also possible to create a network that produces two outputs, one for each value. In this case, each output represents the strength of evidence that that category is the correct one. The chosen category would then be the one with the higher value, with confidence based on some function of the strengths of the two outputs. This approach is particularly valuable when the two outcomes are not exclusive.

TIP Because neural networks produce continuous values, the output from a network can be difficult to interpret for categorical results (used in classification). The best way to calibrate the output is to run the network over a validation set, entirely separate from the training set, and to use the results from the validation set to calibrate the output of the network to categories. In many cases, the network can have a separate output for each category; that is, a propensity for that category. Even with separate outputs, the validation set is still needed to calibrate the outputs.

Table 7.4 Categories and Confidence Levels for NN Output

OUTPUT VALUE

CATEGORY

CONFIDENCE

-1.0

100%

-0.6

-0.02

+0.02

+0.6

+1.0

100%

Team-Fly®



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 [ 88 ] 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222