
graph. Neural networks can train on millions of records at a time. And, even though the algorithms often work on summaries of the detailed transactions (especially at the customer level), what gets summarized can change from one run to the next. Prebuilding the summaries and discarding the transaction data locks you into only one view of the business. Often the first result from using such summaries is a request for some variation on them.

Consistent, Clean Data

Data mining algorithms are often applied to gigabytes of data combined from several different sources. Much of the work in looking for actionable information actually takes place when bringing the data together; often 80 percent or more of the time allocated to a data mining project is spent on this step, especially when a data warehouse is not available. Subsequent problems, such as matching account numbers, interpreting codes, and householding, further delay the analysis. Finding interesting patterns is often an iterative process that requires going back to the data to get additional data elements. Finally, when interesting patterns are found, it is often necessary to repeat the process on the most recent data available.

A well-designed and well-built data warehouse can help solve these problems. Data is cleaned once, when it is loaded into the data warehouse. The meaning of fields is well defined and available through the metadata. Incorporating new data into analyses is as easy as finding out what data is available through the metadata and retrieving it from the warehouse. A particular analysis can be reapplied on more recent data, since the warehouse is kept up to date. The end result is that the data is cleaner and more available, and that the analysts can spend more time applying powerful tools and insights instead of moving data and pushing bytes.

Hypothesis Testing and Measurement

The data warehouse facilitates two other areas of data mining. Hypothesis testing is the verification of educated guesses about patterns in the data. Do tropical colors really sell better in Florida than elsewhere? Do people tend to make long-distance calls after dinner? Are the users of credit cards at restaurants really high-end customers? All of these questions can be expressed rather easily as queries on the appropriate relational database. Having the data available makes it possible to ask questions and find out quickly what the answers are.

TIP The ability to test hypotheses and ideas is a very important aspect of data mining. By bringing the data together in one place, data warehouses enable answering in-depth, complicated questions. One caveat is that such queries can be expensive to run, falling into the killer query category.
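As a minimal sketch of how such a hypothesis can be phrased as a query, the following Python example uses the standard-library sqlite3 module to compare the share of tropical-color sales in Florida with the share elsewhere. The table name `sales`, its columns, and the sample rows are assumptions made up for illustration; a real warehouse would have its own schema and far more data.

```python
import sqlite3

# Hypothetical sample rows: (state, color_family, units sold).
SAMPLE_SALES = [
    ("FL", "tropical", 60),
    ("FL", "neutral", 40),
    ("NY", "tropical", 20),
    ("NY", "neutral", 80),
    ("TX", "tropical", 30),
    ("TX", "neutral", 70),
]


def tropical_share_by_region(rows):
    """Return the fraction of units sold in tropical colors,
    for Florida and for all other states combined."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (state TEXT, color_family TEXT, units INTEGER)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    query = """
        SELECT CASE WHEN state = 'FL' THEN 'Florida' ELSE 'Elsewhere' END
                   AS region,
               1.0 * SUM(CASE WHEN color_family = 'tropical'
                              THEN units ELSE 0 END) / SUM(units)
                   AS tropical_share
        FROM sales
        GROUP BY region
    """
    shares = dict(conn.execute(query).fetchall())
    conn.close()
    return shares


shares = tropical_share_by_region(SAMPLE_SALES)
print(shares)
```

With the sample rows, the tropical share is higher in Florida than elsewhere. On real data, the same comparison would still need a check that the difference is larger than chance variation would explain.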



Measurement is the other area where data warehouses have proven to be very valuable. Often when marketing efforts, product improvements, and so forth take place, there is limited feedback on the degree of success achieved. A data warehouse makes it possible to see the results and to find related effects. Did sales of other products improve? Did customer attrition increase? Did calls to customer service decrease? And so on. Having the data available makes it possible to understand the effects of an action, whether the action was spurred by data mining results or by something else.

Of particular value in terms of measurement is the effect of various marketing actions on the longer-term customer relationship. Often, marketing campaigns are measured in terms of response. While response is clearly a dimension of interest, it is only one. The longer-term behavior of customers is also of interest. Did an acquisition campaign bring in good customers, or did the newly acquired customers leave before they even paid? Did an upsell campaign stick, or did customers return to their previous products? Measurement enables an organization to learn from its mistakes and to build on its successes.
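The difference between measuring response and measuring longer-term behavior can be illustrated with a small Python sketch. The record layout (`Acquired`), the campaign names, and the figures below are hypothetical; the point is only that a campaign with the better response rate can still bring in customers who leave before paying.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Acquired:
    """One customer acquired by a campaign (hypothetical record layout)."""
    campaign: str
    first_payment: Optional[date]  # None if the customer never paid
    stop: Optional[date]           # None if the customer is still active


def left_before_paying(c: Acquired) -> bool:
    """True when the customer stopped without ever making a payment."""
    return c.stop is not None and c.first_payment is None


def measure_campaigns(
    contacted: Dict[str, int], acquired: List[Acquired]
) -> Dict[str, Dict[str, float]]:
    """For each campaign, report the response rate and the share of
    acquired customers who left before paying."""
    results = {}
    for campaign, n_contacted in contacted.items():
        members = [c for c in acquired if c.campaign == campaign]
        never_paid = sum(left_before_paying(c) for c in members)
        results[campaign] = {
            "response_rate": len(members) / n_contacted,
            "left_before_paying": never_paid / len(members) if members else 0.0,
        }
    return results


# Made-up figures: 1,000 prospects contacted per campaign.
SAMPLE_CONTACTED = {"mail": 1000, "web": 1000}
SAMPLE_ACQUIRED = [
    Acquired("mail", None, date(2020, 2, 1)),
    Acquired("mail", None, date(2020, 2, 15)),
    Acquired("mail", None, date(2020, 3, 1)),
    Acquired("mail", date(2020, 2, 1), None),
    Acquired("web", date(2020, 2, 1), None),
    Acquired("web", date(2020, 2, 10), date(2020, 9, 1)),
]

report = measure_campaigns(SAMPLE_CONTACTED, SAMPLE_ACQUIRED)
print(report)
```

In this made-up data the mail campaign has twice the response rate of the web campaign, yet three of its four new customers left without paying, which response alone would not reveal.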

Scalable Hardware and RDBMS Support

The final synergy between data mining and data warehousing is on the systems level. The same scalable hardware and software that make it possible to store and query large databases provide a good system for analyzing data. Chapter 17 talks about building the customer signature. Often, the best place to build the signature is in the central repository or, failing that, in a data mart with similar amounts of data.

There is also the question of running data mining algorithms in parallel, taking further advantage of the powerful machines. This is often not necessary, because actually building models represents a small part of the time devoted to data mining; preparing the data and understanding the results are much more important. Databases such as Oracle and Microsoft SQL Server are increasingly providing support for data mining algorithms, which enables such algorithms to run in parallel.

Lessons Learned

Data warehousing is not a system but a process that can greatly benefit data mining and data analysis efforts. From the perspective of data mining, the most important functionality is the ability to recreate accurate snapshots of history. Another very important facet is support for ad hoc reporting. In order to learn from data, you need to know what really happened.



A typical data warehousing system contains the following components:

The source systems provide the input into the data warehouse.

The extraction, transformation, and load tools clean the data and apply business rules so that new data is compatible with historical data.

The central repository is a relational database specifically designed to be a decision-support system of record.

The data marts provide the interface to different varieties of users with different needs.

The metadata repository informs users and developers about what is inside the data warehouse.

One of the challenges in data warehousing is the massive amount of data that must be stored, particularly if the goal is to keep all customer interactions. Fortunately, computers are sufficiently powerful that the question is more about budget than possibility. Relational databases can also take advantage of the most powerful hardware: parallel computers.

Online Analytic Processing (OLAP) is a powerful part of data warehousing. OLAP tools are very good at handling summarized data, allowing users to summarize information along one or several dimensions at one time. Because these systems are optimized for user reporting, they often have interactive response times of less than 5 seconds.

Any well-designed OLAP system has time as a dimension, making it very useful for seeing trends over time. Trying to accomplish the same thing on a normalized data warehouse requires very complicated queries that are prone to error. To be most useful, OLAP systems should allow users to drill down to detail data for all reports. This capability ensures that all data is making it into the cubes, as well as giving users the ability to spot important patterns that may not appear in the dimensions.
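The two ideas above, summarizing along chosen dimensions and drilling down to the detail behind a cell, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fact rows, the dimension names (`month`, `region`, `product`), and the helper functions `summarize` and `drill_down` are hypothetical, and a real OLAP system precomputes and indexes its cubes rather than scanning the detail rows on every request.

```python
from collections import defaultdict

# Hypothetical detail rows (one per sale), with time as one dimension.
FACTS = [
    {"month": "2004-01", "region": "East", "product": "A", "sales": 100.0},
    {"month": "2004-01", "region": "West", "product": "A", "sales": 50.0},
    {"month": "2004-02", "region": "East", "product": "B", "sales": 70.0},
    {"month": "2004-02", "region": "East", "product": "A", "sales": 30.0},
]


def summarize(facts, dims, measure="sales"):
    """Total the measure for every combination of the given dimensions."""
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[d] for d in dims)
        totals[key] += row[measure]
    return dict(totals)


def drill_down(facts, **cell):
    """Return the detail rows behind one cell, e.g. month=..., region=..."""
    return [row for row in facts if all(row[d] == v for d, v in cell.items())]


# Trend over time: one dimension.
by_month = summarize(FACTS, ["month"])

# Two dimensions at once.
by_month_region = summarize(FACTS, ["month", "region"])

# Detail rows behind a single cell of the two-dimensional summary.
detail = drill_down(FACTS, month="2004-02", region="East")
print(by_month, by_month_region, detail, sep="\n")
```

Comparing the total of the drilled-down rows with the value in the corresponding cell is one simple way to check that all detail data made it into the summary.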

As we have pointed out throughout this chapter, OLAP complements data mining. It is not a substitute for it. It provides better understanding of data, and the dimensions developed for OLAP can make data mining results more actionable. However, OLAP does not automatically find patterns in data.

OLAP is a powerful way to distribute information to many end users for advanced reporting needs. It provides the ability to let many more users base their decisions on data, instead of on hunches, educated guesses, and personal experience. OLAP complements undirected data mining techniques such as clustering. OLAP can provide the insight needed to find the business value in the identified clusters. It also provides a good visualization tool to use with other methods, such as decision trees and memory-based reasoning.

Data warehousing and data mining are not the same thing; however, they do complement each other, and data mining applications are often part of the data warehouse solution.
