Промышленный лизинг Промышленный лизинг  Методички 

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 [ 167 ] 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222

amounts to close to 3,000 searches per second!)-with peaks several times higher.

Because of the large data volumes, there is often a reluctance to store transaction-level data in a data warehouse. From the perspective of data mining, this is a shame, since the transactions best describe customer behavior.

Operational Summary Data

Operational summaries play the same role as transactions; the difference being that operational summaries are derived from transactions. The most common examples are billing systems, which summarize transactions, usually into monthly or four-week bill cycles. These summaries are customer-facing and often result in other transactions, such as bill payments. In some cases, operational summaries may include fields that are summarized to enhance the companys understanding of its customers rather than for operational purposes. For instance, Chapter 4 described how AT&T used call detail records to calculate a bizocity score, indicating how businesslike a telephone numbers calling pattern appears. The records of each call are discarded, but the score is kept up to date.

There is a distinction between operational summary data and transaction data, because summaries are for a period of time and transactions represent events. Consider the amount paid by a subscription customer. In a billing system, amount paid is a summary for the billing period. A payment history table instead provides detail on every payment transaction. For most customers, the monthly summary and payment transactions are very similar. However, two payments might arrive during the same billing period. The more detailed payment information might be useful for insight into customer payment patterns.

Decision-Support Summary Data

Decision-support summary data is the data used for making decisions about the business. The financial data used to run a company provides an example of decision-support summary data; this is often considered to be the cleanest data for decision making. Another example is the data warehouses and data marts whose purpose is to provide a decision-support system of record at the customer level. Maintaining decision-support summary data is the purpose of the data warehouse.

Generally, it is a bad idea to use the same system for analytic and operational purposes, since operational purposes need to take precedence, resulting in a system that is optimized for operations and not decision support. Financial systems are not generally designed for understanding customers, because they are designed for accounting purposes. Making customer summaries balance exactly to the general ledger is highly complex and usually not worth the



effort. One of the goals of data warehousing is to provide consistent definitions and layouts so similar reports produce similar results, no matter which business user is producing them or when they are produced. This chapter is mostly concerned with this level of abstraction.

In one sense, summaries seem to destroy information as they aggregate things. For this reason, different summaries are useful for different purposes. Point-of-sale transactions may capture every can of sardines that goes over the scanner, but only summaries begin to describe the shoppers behavior in terms of her habitual time of day to shop and the proportion of her dollars spent in the canned food department. In this case, the customer summary seems to be creating information.

MvflTililliM Do not expect customer-level data warehouse information to balance exactly against financial systems (although the two systems should be close). Although theoretically possible, such balancing can prove very difficult and distract from the purpose of the data warehouse.

Database Schema

So far, the discussion has been on data. The structure of data is also important- what data is stored, where it is stored, what is not stored, and so on. The sidebar What is a relational database? explains the key ideas behind relational databases, the most common systems for storing large amounts of data.

No matter how the data is stored, it is important to distinguish between two ways of describing the storage. The physical schema describes the layout in the technical detail needed by the underlying software. An example is the CREATE TABLE statement in SQL. A logical schema, on the other hand, describes the data in a way more accessible to end users. The two are not necessarily the same, nor even similar, as shown in Figure 15.2.

MvflTililliM The existence of fields in a database does not mean that the data is actually present. It is important to understand every field used for data mining, and not to assume that a field populated correctly just because it exists. Skepticism is your ally.

An analogy might help to understand the utility of the physical and logical schemas. The logical schema describes things in a way that is familiar to business users. This would be analogous to saying that a house is ranch style, with four bedrooms, three baths, and a two-car garage. The physical schema goes



into more detail about how it is laid out. The foundation is reinforced concrete, 4 feet deep; the slab is 1,500 square feet; the walls are concrete block; and so on. The details of construction, although useful and complete, may not help a family find the right house.

Logical Model

COMPLAINT ACCT ID

COMPLAINT CODE REFUND AMOUNT

COMMENT ACCT ID

COMMENT CODE COMMENT TEXT

This logical model has four entities, three for customer-generated events and one for accounts.

The logical model is intended to be understood by business users.


PRODUCT CHANGE ACCT ID OLD PROD NEW PROD

K>-

This symbol means a product change has exactly one account

This symbol means an account might have 0 or more product

changes

Physical Model

TABLE: CONTACT ACCT ID CONTACT TYPE CONTACT DATE COMPLAINT CODE REFUND AMOUNT OLD PROD NEW PROD COMMENT TYPE

COMMENT TEXT

Information from all four entities in the logical model is found in the contact table.

The different types of contact are differentiated using the CONTACT TYPE field.

The physical model also specifies exact types, partitioning, indexes, storage characteristics, degrees of parallelism, constraints on values, and may other things not of interest to the business user.

Figure 15.2 The physical and logical schema may not be related to each other.



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 [ 167 ] 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222