Промышленный лизинг Промышленный лизинг  Методички 

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 [ 40 ] 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222

Actually, the first level of targeting does not require data mining, only data. In the United States, and to a lesser extent in many other countries, there is quite a bit of data available about a large proportion of the population. In many countries, there are companies that compile and sell household-level data on all sorts of things including income, number of children, education level, and even hobbies. Some of this data is collected from public records. Home purchases, marriages, births, and deaths are matters of public record that can be gathered from county courthouses and registries of deeds. Other data is gathered from product registration forms. Some is imputed using models. The rules governing the use of this data for marketing purposes vary from country to country. In some, data can be sold by address, but not by name. In others data may be used only for certain approved purposes. In some countries, data may be used with few restrictions, but only a limited number of households are covered. In the United States, some data, such as medical records, is completely off limits. Some data, such as credit history, can only be used for certain approved purposes. Much of the rest is unrestricted.

ИУ7Т1к11кМ The United States is unusual in both the extent of commercially available household data and the relatively few restrictions on its use. Although household data is available in many countries, the rules governing its use differ. There are especially strict rules governing transborder transfers of personal data. Before planning to use houshold data for marketing, look into its availability in your market and the legal restrictions on making use of it.

Household-level data can be used directly for a first rough cut at segmentation based on such things as income, car ownership, or presence of children. The problem is that even after the obvious filters have been applied, the remaining pool can be very large relative to the number of prospects likely to respond. Thus, a principal application of data mining to prospects is targeting-finding the prospects most likely to actually respond to an offer.

Response Modeling

Direct marketing campaigns typically have response rates measured in the single digits. Response models are used to improve response rates by identifying prospects who are more likely to respond to a direct solicitation. The most useful response models provide an actual estimate of the likelihood of response, but this is not a strict requirement. Any model that allows prospects to be ranked by likelihood of response is sufficient. Given a ranked list, direct marketers can increase the percentage of responders reached by campaigns by mailing or calling people near the top of the list.

The following sections describe several ways that model scores can be used to improve direct marketing. This discussion is independent of the data



mining techniques used to generate the scores. It is worth noting, however, that many of the data mining techniques in this book can and have been applied to response modeling.

According to the Direct Marketing Association, an industry group, a typical mailing of 100,000 pieces costs about $100,000 dollars, although the price can vary considerably depending on the complexity of the mailing. Of that, some of the costs, such as developing the creative content, preparing the artwork, and initial setup for printing, are independent of the size of the mailing. The rest of the cost varies directly with the number of pieces mailed. Mailing lists of known mail order responders or active magazine subscribers can be purchased on a price per thousand names basis. Mail shop production costs and postage are charged on a similar basis. The larger the mailing, the less important the fixed costs become. For ease of calculation, the examples in this book assume that it costs one dollar to reach one person with a direct mail campaign. This is not an unreasonable estimate, although simple mailings cost less and very fancy mailings cost more.

Optimizing Response for a Fixed Budget

The simplest way to make use of model scores is to use them to assign ranks. Once prospects have been ranked by a propensity-to-respond score, the prospect list can be sorted so that those most likely to respond are at the top of the list and those least likely to respond are at the bottom. Many modeling techniques can be used to generate response scores including regression models, decision trees, and neural networks.

Sorting a list makes sense whenever there is neither time nor budget to reach all prospects. If some people must be left out, it makes sense to leave out the ones who are least likely to respond. Not all businesses feel the need to leave out prospects. A local cable company may consider every household in its town to be a prospect and it may have the capacity to write or call every one of those households several times a year. When the marketing plan calls for making identical offers to every prospect, there is not much need for response modeling! However, data mining may still be useful for selecting the proper messages and to predict how prospects are likely to behave as customers.

A more likely scenario is that the marketing budget does not allow the same level of engagement with every prospect. Consider a company with 1 million names on its prospect list and $300,000 to spend on a marketing campaign that has a cost of one dollar per contact. This company, which we call the Simplifying Assumptions Corporation (or SAC for short), can maximize the number of responses it gets for its $300,000 expenditure by scoring the prospect list with a response model and sending its offer to the prospects with the top 300,000 scores. The effect of this action is illustrated in Figure 4.2.



ROC CURVES

Models are used to produce scores. When a cutoff score is used to decide which customers to include in a campaign, the customers are, in effect, being classified into two groups-those likely to respond, and those not likely to respond. One way of evaluating a classification rule is to examine its error rates. In a binary classification task, the overall misclassification rate has two components, the false positive rate, and the false negative rate. Changing the cutoff score changes the proportion of the two types of error. For a response model where a higher score indicates a higher liklihood to respond, choosing a high score as the cutoff means fewer false positive (people labled as responders who do not respond) and more false negatives (people labled as nonresponders who would respond).

An ROC curve is used to represent the relationship of the false-positive rate to the false-negative rate of a test as the cutoff score varies. The letters ROC stand for Receiver Operating Characteristics a name that goes back to the curves origins in World War II when it was developed to assess the ability of radar operators to identify correctly a blip on the radar screen , whether the blip was an enemy ship or something harmless. Today, ROC curves are more likely to used by medical researchers to evaluate medical tests. The false positive rate is plotted on the X-axis and one minus the false negative rate is plotted on the Y-axis. The ROC curve in the following figure





1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 [ 40 ] 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222