

or item to be studied. This means transformations that flatten product hierarchies so that, for example, the same transaction might generate one flag indicating that the customer bought French wine, another that he or she bought a wine from the Burgundy region, and a third indicating that the wine was from the Beaujolais district in Burgundy. Other data must be rolled up from order files, billing files, and session logs that contain multiple transactions per customer. Typical values derived this way include total spending by category, average order amount, the difference between this customer's average order and the mean average order, and the number of days since the customer last made a purchase.
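The flattening and roll-up described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the `ORDER_LINES` sample data, and the `build_signatures` function are hypothetical and do not come from any particular product.

```python
from collections import defaultdict
from datetime import date

# Hypothetical order lines: (customer, order id, order date, product hierarchy path, amount)
ORDER_LINES = [
    ("c1", "o1", date(2024, 1, 5), ("Wine", "France", "Burgundy", "Beaujolais"), 30.0),
    ("c1", "o2", date(2024, 2, 1), ("Cheese", "France"), 10.0),
    ("c2", "o3", date(2024, 1, 20), ("Wine", "Italy", "Tuscany"), 60.0),
]


def build_signatures(lines, as_of):
    """Roll order lines up to one summary record per customer."""
    order_totals = defaultdict(lambda: defaultdict(float))  # customer -> order -> total
    signatures = {}
    for customer, order_id, day, path, amount in lines:
        sig = signatures.setdefault(
            customer, {"flags": set(), "spend_by_category": {}, "last_purchase": day}
        )
        # Flatten the hierarchy: one flag for every level of the product path.
        for depth in range(1, len(path) + 1):
            sig["flags"].add("/".join(path[:depth]))
        top = path[0]
        sig["spend_by_category"][top] = sig["spend_by_category"].get(top, 0.0) + amount
        sig["last_purchase"] = max(sig["last_purchase"], day)
        order_totals[customer][order_id] += amount

    for customer, sig in signatures.items():
        totals = list(order_totals[customer].values())
        sig["avg_order"] = sum(totals) / len(totals)
        sig["days_since_last"] = (as_of - sig["last_purchase"]).days

    # Compare each customer's average order with the mean of all customers' averages.
    mean_avg = sum(s["avg_order"] for s in signatures.values()) / len(signatures)
    for sig in signatures.values():
        sig["avg_order_vs_mean"] = sig["avg_order"] - mean_avg
    return signatures


signatures = build_signatures(ORDER_LINES, as_of=date(2024, 3, 1))
```

Each customer ends up with a single record holding the hierarchy flags and the rolled-up values, which is the shape most modeling tools expect as input.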

Reporting is done from a multidimensional database that allows retrospective queries at various levels. Data mining and OLAP are both part of the analysis module, although they answer different kinds of questions. OLAP queries are used to answer questions such as these:

What are the top-selling products?

What are the worst-selling products?

What are the top pages viewed?

What are conversion rates by brand name?

What are the top referring sites by visit count?

What are the top referring sites by dollar sales?

How many customers abandoned market baskets?
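Questions like these reduce to counting and summing along a chosen dimension. The sketch below assumes a hypothetical table of session facts (`SESSIONS`) and hypothetical helper names; a real OLAP tool would answer the same questions from a multidimensional database rather than from Python lists.

```python
from collections import Counter, defaultdict

# Hypothetical session facts: (referring site, brand viewed, product bought or None, sale amount)
SESSIONS = [
    ("search.example", "Acme", "widget", 20.0),
    ("search.example", "Acme", None, 0.0),
    ("news.example", "Bolt", "gadget", 50.0),
    ("search.example", "Bolt", None, 0.0),
    ("news.example", "Acme", "widget", 20.0),
]


def top_products(sessions, n=3):
    """Products ranked by number of sales."""
    counts = Counter(product for _, _, product, _ in sessions if product is not None)
    return counts.most_common(n)


def conversion_rate_by_brand(sessions):
    """Share of sessions for each brand that ended in a purchase."""
    visits, purchases = Counter(), Counter()
    for _, brand, product, _ in sessions:
        visits[brand] += 1
        if product is not None:
            purchases[brand] += 1
    return {brand: purchases[brand] / visits[brand] for brand in visits}


def referrers_by(sessions, measure):
    """Referring sites ranked by 'visits' or by 'sales' (dollar amount)."""
    totals = defaultdict(float)
    for referrer, _, _, amount in sessions:
        totals[referrer] += 1 if measure == "visits" else amount
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

In this sample, `referrers_by(SESSIONS, "visits")` and `referrers_by(SESSIONS, "sales")` rank the two sites in opposite orders, which is why the two referring-site questions are listed separately.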

Data mining is used to answer more complicated questions such as these:

What are the characteristics of heavy spenders? Does this user fit the profile?

What promotion should be offered to this customer?

What is the likelihood that this customer will return within 1 month?

Which customers should we worry about because they haven't visited the site recently?

Which products are associated with customers who spend the most money?

Which products are driving sales of which other products?
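As one hedged illustration of the last question, the sketch below computes lift for pairs of products that appear in the same market basket. The `BASKETS` data and the `pair_lift` function are hypothetical. Lift measures association only: a value above 1 means two products appear together more often than their individual popularity would predict, but it does not by itself show which product drives the sale of the other.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets, one set of products per order.
BASKETS = [
    {"wine", "cheese"},
    {"wine", "cheese", "bread"},
    {"bread", "milk"},
    {"wine", "bread"},
]


def pair_lift(baskets):
    """Return lift for every product pair that occurs together at least once."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    lifts = {}
    for (a, b), together in pair_counts.items():
        # Expected co-occurrence rate if the two products were independent.
        expected = (item_counts[a] / n) * (item_counts[b] / n)
        lifts[(a, b)] = (together / n) / expected
    return lifts


lifts = pair_lift(BASKETS)
```

Here wine and cheese have a lift above 1, while wine and bread have a lift below 1 even though they are bought together just as often, because both are popular on their own.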

In Figure 16.2, the arrow labeled "build data warehouse" connects the customer interaction module to the analysis module and represents all the transformations that must occur before either data mining or reporting can be done properly. Two more arrows, labeled "deploy results," show the output of the analysis module being shipped back to the business data definition and customer interaction modules. Yet another arrow, labeled "stage data," shows how the business rules embedded in the business data definition module feed into the customer interaction module.



What is appealing about this architecture is the way that it facilitates the virtuous cycle of data mining by allowing new knowledge discovered through data mining to be fed directly to the systems that interact with customers.

Data Mining Software

One of the ways that the data mining world has changed most since the first edition of this book came out is the maturity of data mining software products. Robustness, usability, and scalability have all improved significantly. The one thing that may have decreased is the number of data mining software vendors, as tiny boutique software firms have been pushed aside by larger, more established companies. As stated in the first edition, it is not reasonable to compare the merits of particular products in a book intended to remain useful beyond the shelf life of the current versions of these products. Although the products are changing, and hopefully improving, over time, the criteria for evaluating them have not changed: price, availability, scalability, support, vendor relationships, compatibility, and ease of integration all factor into the selection process.

Range of Techniques

As must be clear by now, there is no single data mining technique that is applicable in all situations. Neural networks, decision trees, market basket analysis, statistics, survival analysis, genetic algorithms, memory-based reasoning, link analysis, and automatic cluster detection all have a place. As shown in the case studies, it is not uncommon for two or more of these techniques to be applied in combination to achieve results beyond the reach of any single method.

Be sure that the software selected is powerful enough to support the organization's data and goals. It is a good idea to have software a bit more advanced than the analysts' abilities, so people can try out new things that they might not otherwise think of trying. Having multiple techniques available in a single set of tools is useful, because it makes it easier to combine and compare different techniques. At the same time, having several different products makes sense for a larger group, since different products have different strengths, even when they support the same underlying functionality. Some are better at presenting results; some are better at developing scores; some are more intuitive for novice users.

Assess the range of data mining tasks to be addressed and decide which data mining techniques will be most valuable. If you have a single application in mind, or a family of closely related applications, then it is likely that you will be able to select a single technique and stick with it. If you are setting up a data mining lab environment to handle a wide range of data mining applications, you will want to look for a coordinated suite of tools.

QUESTIONS TO ASK WHEN SELECTING DATA MINING SOFTWARE

The following list of questions is designed to help select the right data mining software for your company. We present the questions as an unordered list. The first thing you should do is order the list according to your own priorities. These priorities will necessarily be different from case to case, which is why we have not attempted to rank them for you. In some environments, for example, there is an established standard hardware supplier and platform independence is not an issue, while in other environments it is of paramount concern, either so that different divisions can use the package or in anticipation of a future change in hardware.

♦ What is the range of data mining techniques offered by the vendor?

♦ How scalable is the product in terms of the size of the data, the number of users, the number of fields in the data, and its use of the hardware?

♦ Does the product provide transparent access to databases and files?

♦ Does the product provide multiple levels of user interfaces?

♦ Does the product generate comprehensible explanations of the models it generates?

♦ Does the product support graphics, visualization, and reporting tools?

♦ Does the product interact well with other software in the environment, such as reporting packages, databases, and so on?

♦ Can the product handle diverse data types?

♦ Is the product well documented and easy to use?

♦ What is the availability of support, training, and consulting?

♦ How well will the product fit into the existing computing environment?

♦ Does the vendor have credible references?

Once you have determined which of these questions are most important to your organization, use them to assess candidate software packages by interviewing the software vendors or by enlisting the aid of an independent data mining consultant.
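One way to put the sidebar's advice into practice is to give each question a weight that reflects your own priorities and then rate each candidate package against it. The sketch below is only one possible approach, not a method prescribed by the text. The criterion names, weights, package names, ratings, and the `rank_packages` function are all hypothetical.

```python
def rank_packages(weights, ratings):
    """Rank packages by weighted average rating; criteria a package lacks count as 0."""
    total_weight = sum(weights.values())
    scores = {}
    for package, by_criterion in ratings.items():
        weighted = sum(weights[c] * by_criterion.get(c, 0) for c in weights)
        scores[package] = weighted / total_weight
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical priorities: a higher weight means the question matters more to this organization.
WEIGHTS = {"range of techniques": 3, "scalability": 5, "support and training": 2}

# Hypothetical ratings on a 1-5 scale, gathered from vendor interviews or a consultant.
RATINGS = {
    "Package A": {"range of techniques": 4, "scalability": 2, "support and training": 5},
    "Package B": {"range of techniques": 3, "scalability": 5, "support and training": 3},
}

ranking = rank_packages(WEIGHTS, RATINGS)
```

With these weights, Package B ranks first because scalability carries the most weight. A different organization, ordering the questions differently, could reach the opposite result from the same ratings.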

Scalability

Data mining provides the greatest benefit when the data to be mined is large and complex. But data mining software is likely to be demonstrated on small sample datasets. Be sure that the data mining software being considered can handle the anticipated data volume, and then perhaps a bit more to take into


