

[Figure 6.1 diagram: a binary decision tree whose splits test variables such as lifetime orders, $ last 24 months, days since last, food, kitchen, tot $ 98Q3, and womens underwear against thresholds such as 19.325 and 19.475, ending in leaves labeled 1 or 0.]

Figure 6.1 A binary decision tree classifies catalog recipients as likely or unlikely to place an order.

Alert readers may notice that some of the splits in the decision tree appear to make no difference. For example, nodes 17 and 18 are differentiated by the number of orders they have made that included items in the food category, but both nodes are labeled as responders. That is because although the probability of response is higher in node 18 than in node 17, in both cases it is above the threshold that has been set for classifying a record as a responder. As a classifier, the model has only two outputs, one and zero. This binary classification throws away useful information, which brings us to the next topic, using decision trees to produce scores and probabilities.



Scoring

Figure 6.2 is a picture of the same tree as in Figure 6.1, but using a different tree viewer and with settings modified so that the tree is now annotated with additional information, namely the percentage of records in class 1 at each node.

It is now clear that the tree describes a dataset containing half responders and half nonresponders, because the root node has a proportion of 50 percent. As described in Chapter 3, this is typical of a training set for a response model with a binary target variable. Any node with more than 50 percent responders is labeled with 1 in Figure 6.1, including nodes 17 and 18. Figure 6.2 clarifies the difference between these nodes. In Node 17, 52.8 percent of records represent responders, while in Node 18, 66.9 percent do. Clearly, a record in Node 18 is more likely to represent a responder than a record in Node 17. The proportion of records in the desired class can be used as a score, which is often more useful than just the classification. For a binary outcome, a classification merely splits records into two groups. A score allows the records to be sorted from most likely to least likely to be members of the desired class.
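To make the distinction concrete, the short sketch below (which assumes scikit-learn and synthetic stand-in data rather than the catalog data set discussed here) shows how a fitted decision tree can return the class 1 proportion of each leaf as a score, in addition to the hard 0/1 classification.

```python
# Illustrative sketch only: synthetic data standing in for the catalog data set.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                         # stand-ins for inputs such as lifetime orders
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)  # synthetic 0/1 response

tree = DecisionTreeClassifier(max_depth=4).fit(X, y)

hard_labels = tree.predict(X)         # binary classification: 1 if the leaf proportion exceeds 0.5
scores = tree.predict_proba(X)[:, 1]  # class 1 proportion of the leaf each record falls into

# Unlike the hard labels, the scores let us rank records from most to least
# likely to respond, for example to pick the top N percent for a mailing.
ranked = np.argsort(-scores)
```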

[Figure 6.2 diagram: the same tree with each node annotated with its class 1 proportion; the root node shows 50.0 percent, and nodes 17 and 18 show 52.8 percent and 66.9 percent respectively.]

Figure 6.2 A decision tree annotated with the proportion of records in class 1 at each node shows the probability of the classification.



For many applications, a score capable of rank-ordering a list is all that is required. This is sufficient to choose the top N percent for a mailing and to calculate lift at various depths in the list. For some applications, however, it is not sufficient to know that A is more likely to respond than B; we want to know the actual likelihood of a response from A. Assuming that the prior probability of a response is known, it can be used to calculate the probability of response from the score generated on the oversampled data used to build the tree. Alternatively, the model can be applied to preclassified data that has a distribution of responses reflecting the true population. This method, called backfitting, creates scores using the class proportions at the tree's leaves to represent the probability that a record drawn from a similar population is a member of the class. These and related issues are discussed in detail in Chapter 3.
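As a sketch of the first approach, a leaf proportion computed on balanced, oversampled training data can be rescaled to a probability for the true population once the real prior response rate is known. The function below uses a standard prior-odds adjustment and is offered only as an illustration; it is not necessarily the exact procedure described in Chapter 3, and the parameter names are our own.

```python
def adjust_score(score, true_prior, train_prior=0.5):
    """Rescale a leaf proportion from the (oversampled) training prior to the true prior.

    A standard prior-odds correction, shown as an illustration; parameter
    names are hypothetical, not taken from the text.
    """
    # Convert the proportion to odds, rescale by the ratio of prior odds, convert back.
    odds = (score / (1.0 - score)) * \
           ((true_prior / (1.0 - true_prior)) / (train_prior / (1.0 - train_prior)))
    return odds / (1.0 + odds)

# Example: a leaf that is 66.9 percent responders on 50/50 training data
# corresponds to roughly a 4 percent response probability if the true
# prior response rate is 2 percent.
print(adjust_score(0.669, true_prior=0.02))  # ~0.04
```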

Estimation

Suppose the important business question is not who will respond but what the size of the customer's next order will be. The decision tree can be used to answer that question too. Assuming that order amount is one of the variables available in the preclassified model set, the average order size in each leaf can be used as the estimated order size for any unclassified record that meets the criteria for that leaf. It is even possible to use a numeric target variable to build the tree; such a tree is called a regression tree. Instead of increasing the purity of a categorical variable, each split in the tree is chosen to decrease the variance in the values of the target variable within each child node.
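A minimal regression-tree sketch, assuming scikit-learn and synthetic data (the feature and target names are illustrative, not from the book's data set), shows the idea: splits are chosen to reduce the squared error of the numeric target, and each leaf predicts the mean order amount of the training records that reach it.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))                                    # customer behavior variables (synthetic)
order_amount = 50 + 20 * X[:, 0] + rng.normal(scale=5, size=1000)

# The default splitting criterion minimizes squared error, i.e. within-node variance.
reg_tree = DecisionTreeRegressor(max_depth=3).fit(X, order_amount)

# Each leaf predicts the mean order amount of its training records, so the
# tree can only produce as many distinct estimates as it has leaves.
estimates = reg_tree.predict(X)
print(len(np.unique(estimates)), "distinct estimated order sizes")
```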

The fact that trees can be (and sometimes are) used to estimate continuous values does not make it a good idea. A decision tree estimator can only generate as many discrete values as there are leaves in the tree. To estimate a continuous variable, it is preferable to use a continuous function. Regression models and neural network models are generally more appropriate for estimation.

Trees Grow in Many Forms

The tree in Figure 6.1 is a binary tree of nonuniform depth; that is, each nonleaf node has two children and leaves are not all at the same distance from the root. In this case, each node represents a yes-or-no question, whose answer determines by which of two paths a record proceeds to the next level of the tree. Since any multiway split can be expressed as a series of binary splits, there is no real need for trees with higher branching factors. Nevertheless, many data mining tools are capable of producing trees with more than two branches. For example, some decision tree algorithms split on categorical variables by creating a branch for each class, leading to trees with differing numbers of branches at different nodes. Figure 6.3 illustrates a tree that uses both three-way and two-way splits for the same classification problem as the tree in Figures 6.1 and 6.2.
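As a small illustration of why higher branching factors add no expressive power (the category names below are borrowed from the figure purely for illustration), a three-way split on a categorical variable can always be rewritten as a pair of nested binary splits that reach the same leaves:

```python
def three_way_split(category):
    # One node with three branches, one per class of a categorical variable.
    if category == "food":
        return "leaf A"
    elif category == "kitchen":
        return "leaf B"
    else:
        return "leaf C"

def nested_binary_splits(category):
    # The same partition expressed as a series of yes/no questions.
    if category == "food":
        return "leaf A"
    if category == "kitchen":
        return "leaf B"
    return "leaf C"

# Both functions assign every record to the same leaf.
assert all(three_way_split(c) == nested_binary_splits(c)
           for c in ["food", "kitchen", "womens underwear"])
```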


