

Building a Data Mining Group in the Business Units

The alternative to putting the data mining group where the data and computers are is to put it close to the problems being addressed. That generally means the marketing group, the customer relationship management group (where such a thing exists), or the finance group. Sometimes there are several small data mining groups, one in each of several business units: a group in finance building credit risk models and collections models, one in marketing building response models, and one in CRM building cross-sell models and voluntary churn models.

The advantages and disadvantages of this approach are the inverse of those for putting data mining in IT. The business units have a great understanding of their own business problems, but may still have to rely on IT for data and computing resources. Although either approach can be successful, on balance we prefer to see data mining centered in the business units.

What to Look for in Data Mining Staff

The best data mining groups are often eclectic mixes of people. Because data mining has not existed very long as a separately named activity, there are few people who can claim to be trained data miners. There are data miners who used to be physicists, data miners who used to be geologists, data miners who used to be computer scientists, data miners who used to be marketing managers, data miners who used to be linguists, and data miners who are still statisticians.

This makes lunchtime conversation in a data mining group fairly interesting, but it doesn't offer much guidance for hiring managers. The things that make good data miners better than mediocre ones are hard to teach and impossible to automate: good intuition, a feel for how to coax information out of data, and a natural curiosity.

No one individual is likely to have all the skills required for completing a data mining project. Among them, the team members should cover the following:

Database skills (SQL, if the data is stored in relational databases)

Data transformation and programming skills (SAS, SPSS, S-Plus, PERL, other programming languages, ETL tools)

Statistics

Machine learning skills

Knowledge of the relevant industry

Data visualization skills

Interviewing and requirements-gathering skills

Presentation, writing, and communication skills



A new data mining group should include someone who has done commercial data mining before, preferably in the same industry. If necessary, this expertise can be provided by outside consultants.

Data Mining Infrastructure

In companies where data mining is merely an exploratory activity, useful data mining can be accomplished with little infrastructure. A desktop workstation with some data mining software and access to the corporate databases is likely to be sufficient. However, when data mining is central to the business, the data mining infrastructure must be considerably more robust. In these companies, updating customer profiles with new model scores, either on a regular schedule such as once a month or, in some cases, with each new transaction, is part of the regular production process of the data warehouse. The data mining infrastructure must provide a bridge between the exploratory world where models are developed and the production world where models are scored and marketing campaigns run.

A production-ready data mining environment must be able to support the following:

The ability to access data from many sources and bring the data together as customer signatures in a data mining model set.

The ability to score customers using already created models from the model library on demand.

The ability to manage hundreds of model scores over time.

The ability to manage dozens or hundreds of models developed over time.

The ability to reconstruct a customer signature for any point in a customer's tenure, such as immediately before a purchase or other interesting event.

The ability to track changes in model scores over time.

The ability to publish scores, rules, and other data mining results back to the data warehouse and to other applications that need them.
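One of the requirements above, reconstructing a customer signature for any point in a customer's tenure, can be illustrated with a minimal sketch. It assumes transactions are kept with their dates so that a signature can be rebuilt from only the history that existed before a chosen cutoff. The names Transaction and signature_as_of, the signature fields, and the sample data are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Transaction:
    # Hypothetical minimal transaction record.
    customer_id: str
    when: date
    amount: float


def signature_as_of(transactions, customer_id, cutoff):
    """Summarize one customer's history using only transactions
    dated strictly before the cutoff date."""
    history = [
        t for t in transactions
        if t.customer_id == customer_id and t.when < cutoff
    ]
    if not history:
        return {
            "customer_id": customer_id,
            "as_of": cutoff,
            "n_purchases": 0,
            "total_spend": 0.0,
            "days_since_last": None,
        }
    last_purchase = max(t.when for t in history)
    return {
        "customer_id": customer_id,
        "as_of": cutoff,
        "n_purchases": len(history),
        "total_spend": sum(t.amount for t in history),
        "days_since_last": (cutoff - last_purchase).days,
    }


# Made-up sample data.
transactions = [
    Transaction("c1", date(2003, 1, 10), 40.0),
    Transaction("c1", date(2003, 3, 5), 60.0),
    Transaction("c1", date(2003, 6, 20), 25.0),
    Transaction("c2", date(2003, 2, 1), 10.0),
]

# Signature for c1 immediately before the purchase on 2003-06-20:
# that purchase itself is excluded because it is not before the cutoff.
sig = signature_as_of(transactions, "c1", date(2003, 6, 20))
```

Excluding everything on or after the cutoff is the design choice that matters here: a signature built to model an event must not include information from the event itself or from anything after it.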

The data mining infrastructure is logically (and often physically) split into two pieces supporting two quite different activities: mining and scoring. Each task presents a different set of requirements.



The Mining Platform

The mining platform supports software for data manipulation along with data mining software embodying the data mining techniques described in this book, visualization and presentation software, and software to enable models to be published to the scoring environment.

Although we have already touched on a few integration issues, others to consider include:

Where in the client/server hierarchy is the software to be installed?

Will the data mining software require its own hardware platform? If so, will this introduce a new operating system into the mix?

What software will have to be installed on users' desktops in order to communicate with the package?

What additional networking, SQL gateways, and middleware will be required?

Does the data mining software provide good interfaces to reporting and graphics packages?

The purpose of the mining platform is to support exploration of the data, mining, and modeling. The system should be devised with these activities in mind, including the fact that such work requires much processing and computing power. The data mining software vendor should be able to provide specifications for a data mining platform adequate for the anticipated dataset sizes and expected usage patterns.

The Scoring Platform

The scoring platform is where models developed on the mining platform are applied to customer records to create scores used to determine future treatments. Often, the scoring platform is the customer database itself, which is likely to be a relational database running on a parallel hardware platform.

In order to score a record, the record must contain, or the scoring platform must be able to calculate, the same features that went into the model. These features used by the model are rarely in the raw form in which they occur in the data. Often, new features have been created by combining existing variables in various ways, such as taking the ratio of one to another and performing transformations such as binning, summing, and averaging. Whatever was done to calculate the features used when the model was created must now be done for every record to be scored. Since there may be hundreds of millions of transactional records, it matters how this is done. When the volume of data is large, so is the data processing challenge.
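The requirement that scoring repeat the feature calculations used in model building can be sketched as follows. This is an illustration under stated assumptions, not a description of any particular product: the field names (calls, minutes, tenure_months), the bin boundaries, the logistic form of the model, and the weights are all invented for the example. The point is only that a single derive_features function is shared by the model-building and scoring code, so the two cannot drift apart.

```python
import math


def derive_features(record):
    """Turn a raw customer record into the derived features the model uses.
    The same function would be called when building the model set and
    when scoring, so both see identical transformations."""
    calls = record["calls"]
    minutes = record["minutes"]
    # Ratio of one variable to another, guarding against division by zero.
    avg_call_minutes = minutes / calls if calls else 0.0

    # Binning a continuous variable (hypothetical bin boundaries).
    tenure = record["tenure_months"]
    if tenure < 12:
        tenure_bin = "0-11"
    elif tenure < 36:
        tenure_bin = "12-35"
    else:
        tenure_bin = "36+"

    return {"avg_call_minutes": avg_call_minutes, "tenure_bin": tenure_bin}


def score(record, weights, bias):
    """Apply a hypothetical logistic model to one raw record."""
    features = derive_features(record)
    z = bias
    z += weights["avg_call_minutes"] * features["avg_call_minutes"]
    # Bins without an explicit weight contribute nothing.
    z += weights.get("tenure=" + features["tenure_bin"], 0.0)
    return 1.0 / (1.0 + math.exp(-z))


# Made-up model parameters and a made-up customer record.
weights = {"avg_call_minutes": -0.4, "tenure=0-11": 0.8, "tenure=12-35": 0.2}
record = {"calls": 20, "minutes": 50.0, "tenure_months": 14}
customer_score = score(record, weights, bias=0.0)
```

When the scoring platform is a relational database, the same transformations would typically have to be expressed in SQL or in the database's own procedural language rather than in Python, which is where keeping them in step with the mining platform becomes difficult.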


