Промышленный лизинг Промышленный лизинг  Методички 

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 [ 27 ] 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222

Classification

Estimation

Prediction

Affinity Grouping

Clustering

Description and Profiling

These are the tasks that can be accomplished with the data mining techniques described in this book (though no single data mining tool or technique is equally applicable to all tasks).

The first three tasks, classification, estimation, and prediction are examples of directed data mining. Affinity grouping and clustering are examples of undirected data mining. Profiling may be either directed or undirected. In directed data mining there is always a target variable-something to be classified, estimated, or predicted. The process of building a classifier starts with a predefined set of classes and examples of records that have already been correctly classified. Similarly, the process of building an estimator starts with historical data where the values of the target variable are already known. The modeling task is to find rules that explain the known values of the target variable.

In undirected data mining, there is no target variable. The data mining task is to find overall patterns that are not tied to any one variable. The most common form of undirected data mining is clustering, which finds groups of similar records without any instructions about which variables should be considered as most important. Undirected data mining is descriptive by nature, so undirected data mining techniques are often used for profiling, but directed techniques such as decision trees are also very useful for building profiles. In the machine learning literature, directed data mining is called supervised learning and undirected data mining is called unsupervised learning.

How Will the Results Be Used?

This is one of the most important questions to ask when deciding how best to translate a business problem into a data mining problem. Surprisingly often, the initial answer is were not sure. An answer is important because, as the cautionary tale in the sidebar illustrates, different intended uses dictate different solutions.

For example, many of our data mining engagements are designed to improve customer retention. The results of such a study could be used in any of the following ways:

Proactively contact high risk/high value customers with an offer that rewards them for staying.



Change the mix of acquisition channels to favor those that bring in the most loyal customers.

Forecast customer population in future months.

Alter the product to address defects that are causing customers to defect.

Each of these goals has implications for the data mining process. Contacting existing customers through an outbound telemarketing or direct mail campaign implies that in addition to identifying customers at risk, there is an understanding of why they are at risk so an attractive offer can be constructed, and when they are at risk so the call is not made too early or too late. Forecasting implies that in addition to identifying which current customers are likely to leave, it is possible to determine how many new customers will be added and how long they are likely to stay. This latter problem of forecasting new customer starts is typically embedded in business goals and budgets, and is not usually a predictive modeling problem.

How Will the Results Be Delivered?

A data mining project may result in several very different types of deliverables. When the primary goal of the project is to gain insight, the deliverable is often a report or presentation filled with charts and graphs. When the project is a one-time proof-of-concept or pilot project, the deliverable may consist of lists of customers who will receive different treatments in a marketing experiment. When the data mining project is part of an ongoing analytic customer relationship management effort, the deliverable is likely to be a computer program or set of programs that can be run on a regular basis to score a defined subset of the customer population along with additional software to manage models and scores over time. The form of the deliverable can affect the data mining results. Producing a list of customers for a marketing test is not sufficient if the goal is to dazzle marketing managers.

The Role of Business Users and Information Technology

As described in Chapter 2, the only way to get good answers to the questions posed above is to involve the owners of the business problem in figuring out how data mining results will be used and IT staff and database administrators in figuring out how the results should be delivered. It is often useful to get input from a broad spectrum within the organization and, where appropriate, outside it as well. We suggest getting representatives from the various constituencies within the enterprise together in one place, rather than interviewing them separately. That way, people with different areas of knowledge and expertise have a chance to react to each others ideas. The goal of all this consultation is a clear statement of the business problem to be addressed. The final



MISUNDERSTANDING THE BUSINESS PROBLEM: A CAUTIONARY TALE

Data Miners, the consultancy started by the authors, was once called upon to analyze supermarket loyalty card data on behalf of a large consumer packaged goods manufacturer. To put this story in context, it helps to know a little bit about the supermarket business. In general, a supermarket does not care whether a customer buys Coke or Pepsi (unless one brand happens to be on a special deal that temporarily gives it a better margin), so long as the customer purchases soft drinks. Product manufacturers, who care very much which brands are sold, vie for the opportunity to manage whole categories in the stores. As category managers, they have some control over how their own products and those of their competitors are merchandised. Our client wanted to demonstrate its ability to utilize loyalty card data to improve category management. The category picked for the demonstration was yogurt because by supermarket standards, yogurt is a fairly high-margin product.

As we understood it, the business goal was to identify yogurt lovers. To create a target variable, we divided loyalty card customers into groups of high, medium, and low yogurt affinity based on their total yogurt purchases over the course of a year and into groups of high, medium, and low users based on the proportion of their shopping dollars spent on yogurt. People who were in the high category by both measures were labeled as yogurt lovers.

The transaction data had to undergo many transformations to be turned into a customer signature. Input variables included the proportion of trips and of dollars spent at various times of day and in various categories, shopping frequency, average order size, and other behavioral variables.

Using this data, we built a model that gave all customers a yogurt lover score. Armed with such a score, it would be possible to print coupons for yogurt when likely yogurt lovers checked out, even if they did not purchase any yogurt on that trip. The model might even identify good prospects who had not yet gotten in touch with their inner yogurt lover, but might if prompted with a coupon.

The model got good lift, and we were pleased with it. The client, however, was disappointed. But, who is the yogurt lover? asked the client. Someone who gets a high score from the yogurt lover model was not considered a good answer. The client was looking for something like The yogurt lover is a woman between the ages of X and Y living in a zip code where the median home price is between M and N. A description like that could be used for deciding where to buy advertising and how to shape the creative content of ads. Ours, based on shopping behavior rather than demographics, could not.

statement of the business problem should be as specific as possible. Identify the 10,000 gold-level customers most likely to defect within the next 60 days is better than provide a churn score for all customers.

The role of the data miner in these discussions is to ensure that the final statement of the business problem is one that can be translated into a data mining problem. Otherwise, the best data mining efforts in the world may be addressing the wrong business problem.



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 [ 27 ] 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222