Промышленный лизинг Промышленный лизинг  Методички 

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 [ 22 ] 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222

smaller group of likely buyers. The new approach proved much more effective than the first.

Completing the Cycle

The university-based data mining project showed that even with only a limited number of broad-brush variables to work with and fairly primitive data mining tools, data mining could improve the effectiveness of a direct marketing campaign for a big-ticket item like an automobile. The next step is to gather more data, build better models, and try again!

Lessons Learned

This chapter started by recalling the drivers of the industrial revolution and the creation of large mills in England and New England. These mills are now abandoned, torn down, or converted to other uses. Water is no longer the driving force of business. It has been replaced by data.

The virtuous cycle of data mining is about harnessing the power of data and transforming it into actionable business results. Just as water once turned the wheels that drove machines throughout a mill, data needs to be gathered and disseminated throughout an organization to provide value. If data is water in this analogy, then data mining is the wheel, and the virtuous cycle spreads the power of the data to all the business processes.

The virtuous cycle of data mining is a learning process based on customer data. It starts by identifying the right business opportunities for data mining. The best business opportunities are those that will be acted upon. Without action, there is little or no value to be gained from learning about customers.

Also very important is measuring the results of the action. This completes the loop of the virtuous cycle, and often suggests further data mining opportunities.


Team-Fly®




Data Mining Methodology and Best Practices

The preceding chapter introduced the virtuous cycle of data mining as a business process. That discussion divided the data mining process into four stages:

1. Identifying the problem

2. Transforming data into information

3. Taking action

4. Measuring the outcome

Now it is time to start looking at data mining as a technical process. The high-level outline remains the same, but the emphasis shifts. Instead of identifying a business problem, we now turn our attention to translating business problems into data mining problems. The topic of transforming data into information is expanded into several topics including hypothesis testing, profiling, and predictive modeling. In this chapter, taking action refers to technical actions such as model deployment and scoring. Measurement refers to the testing that must be done to assess a models stability and effectiveness before it is used to guide marketing actions.

Because the entire book is based on this methodology, the best practices introduced here are elaborated upon elsewhere. The purpose of this chapter is to bring them together in one place and to organize them into a methodology.

The best way to avoid breaking the virtuous cycle of data mining is to understand the ways it is likely to fail and take preventative steps. Over the



years, the authors have encountered many ways for data mining projects to go wrong. In response, we have developed a useful collection of habits-things we do to smooth the path from the initial statement of a business problem to a stable model that produces actionable and measurable results. This chapter presents this collection of best practices as the orderly steps of a data mining methodology. Dont be fooled-data mining is a naturally iterative process. Some steps need to be repeated several times, but none should be skipped entirely.

The need for a rigorous approach to data mining increases with the complexity of the data mining approach. After establishing the need for a methodology by describing various ways that data mining efforts can fail in the absence of one, the chapter starts with the simplest approach to data mining- using ad hoc queries to test hypotheses-and works up to more sophisticated activities such as building formal profiles that can be used as scoring models and building true predictive models. Finally, the four steps of the virtuous cycle are translated into an 11-step data mining methodology.

Why Have a Methodology?

Data mining is a way of learning from the past so as to make better decisions in the future. The best practices described in this chapter are designed to avoid two undesirable outcomes of the learning process:

Learning things that arent true

Learning things that are true, but not useful

These pitfalls are like the rocks of Scylla and the whirlpool of Charybdis that protect the narrow straits between Sicily and the Italian mainland. Like the ancient sailors who learned to avoid these threats, data miners need to know how to avoid common dangers.

Learning Things That Arent True

Learning things that arent true is more dangerous than learning things that are useless because important business decisions may be made based on incorrect information. Data mining results often seem reliable because they are based on actual data in a seemingly scientific manner. This appearance of reliability can be deceiving. The data itself may be incorrect or not relevant to the question at hand. The patterns discovered may reflect past business decisions or nothing at all. Data transformations such as summarization may have destroyed or hidden important information. The following sections discuss some of the more common problems that can lead to false conclusions.



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 [ 22 ] 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222