Промышленный лизинг Промышленный лизинг  Методички 

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 [ 175 ] 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222

Facts

Facts are the measures in each subcube. The most useful facts are additive, so they can be combined together across many different subcubes to provide responses to queries at arbitrary levels of summarization. Additive facts make it possible to summarize data along any dimension or along several dimensions at one time-which is exactly the purpose of the cube. Examples of additive facts are:

Counts

Counts of variables with a particular value

Total duration of time (such as spent on a web site)

Total monetary values

The total amount of money spent on a particular product on a particular day is the sum of the amount spent on that product in each store. This is a good example of an additive fact. However, not all facts are additive. Examples include:

Averages

Unique counts

Counts of things shared across different cubes, such as transactions

Averages are not a very interesting example of a nonadditive fact, because an average is a total divided by a count. Since each of these is additive, the average can be derived after combining these facts.

The other examples are more interesting. One interesting question is how many unique customers did some particular action. Although this number can be stored in a subcube, it is not additive. Consider a retail cube with the date, store, and product dimensions. A single customer may purchase items in more than one store, or purchase more than one item in a store, or make purchases on different days. A field containing the number of unique customers has information about one customer in more than one subcube, violating the cardinal rule of OLAP, so the cube is not going to be able to report on unique customers.

A similar thing happens when trying to count numbers of transactions. Since the information about the transaction may be stored in several different subcubes (since a single transaction may involve more than one product), counts of transactions also violate the cardinal rule. This type of information cannot be gathered at the summary level.

Another note about facts is that not all numeric data is appropriate as a fact in a cube. For instance, age in years is numeric, but it might be better treated as a dimension rather than a fact. Another example is customer value. Discrete



ranges of customer value are useful as dimensions, and in many circumstances more useful than trying to include customer value as a fact.

When designing cubes, there is a temptation to mix facts and dimensions by creating a count or total for a group of related values. For instance:

Count of active customers of less than 1-year tenure, between 1 and 2 years, and greater than 2 years

Amount credited on weekdays; amount credited on weekends

Total for each day of the week

Each of these suggests another dimension for the cube. The first should have a customer tenure dimensions that takes at least three values. The second appeared in a cube where the time dimension was by month. These facts suggest a need for daily summaries, or at least for separating weekdays and weekends along a dimension. The third suggests a need for a date dimension at the granularity of days.

Dimensions and Their Hierarchies

Sometimes, a single column seems appropriate for multiple dimensions. For instance, OLAP is a good tool for visualizing trends over time, such as for sales or financial data. A specific date in this case potentially represents information along several dimensions, as shown in Figure 15.7:

Day of the week

Month

Quarter

Calendar year

One approach is to represent each of these as a different dimension. In other words, there would be four dimensions, one for the day of the week, one for the month, one for the quarter, and one for the calendar year. The data for January 2004, then would be the subcube where the January dimension intersects the 2004 dimension.

This is not a good approach. Multidimensional modeling recognizes that time is an important dimension, and that time can have many different attributes. In addition to the attributes described above, there is also the week of the year, whether the date is a holiday, whether the date is a work day, and so on. Such attributes are stored in reference tables, called dimension tables. Dimension tables make it possible to change the attributes of the dimension without changing the underlying data.


Team-Fly®



Date (7 March 1997)

Month (Mar)

Day of the Week (Friday)

Day of the Month (7)

Day of the

Year (67)

Year (1997)

Figure 15.7 There are multiple hierarchies for dates.

WvflTlilliM Do not take shortcuts when designing the dimensions for an OLAP system. These are the skeleton of the data mart, and a weak skeleton will not last very long.

Dimension tables contain many different attributes describing each value of the dimension. For instance, a detailed geography dimension might be built from zip codes and include dozens of summary variables about the zip codes. These attributes can be used for filtering ( How many customers are in high-income areas? ). These values are stored in the dimension table rather than the fact table, because they cannot be aggregated correctly. If there are three stores in a zip code, a zip code population fact would get added up three times- multiplying the population by three.

Usually, dimension tables are kept up to date with the most recent values for the dimension. So, a store dimension might include the current set of stores with information about the stores, such as layout, square footage, address, and manager name. However, all of these may change over time. Such dimensions are called slowly changing dimensions, and are of particular interest to data mining because data mining wants to reconstruct accurate histories. Slowly changing dimensions are outside the scope of this book. Interested readers should review Ralph Kimballs books.



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 [ 175 ] 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222