data (continued)
  missing data
    data correction, 73-74
    NULL values, 590
    splits, decision trees, 174-175
  operational feedback, 485, 492
  patterns
    meaningful discoveries, 56
    prediction, 45
    untruthful learning sources, 45-46
  point-of-sale
    association rules, 288
    scanners, 3
    as useful data source, 60
  preparation
    automatic cluster detection, 363-365
    categorical values, neural networks, 239-240
    continuous values, neural networks, 235-237
  quality, association rules, 308
  representation, genetic algorithms, 432-433
  scarce, 62
  source systems, 484, 486-487
  SQL, time series analysis, 572-573
  terabytes, 5
  truncated, 162
  useful data sources, 60-61
  visualization tools, 65
  wrong level of detail, untruthful learning sources, 47
data mining
  architecture, 528-532
  as creative process, 33
  directed
    classification, 57
    discussed, 7
    estimation, 57
    prediction, 57
  documentation, 536-537
  goals of, 7
  insourcing, 524-525
  outsourcing, 522-524
  platforms, 527
  scalability, 533-534
  scoring platforms, 527-528
  staffing, 525-526
  typical operational systems versus, 33
  undirected
    affinity grouping, 57
    clustering, 57
    discussed, 7
Data Preparation for Data Mining (Dorian Pyle), 75
The Data Warehouse Toolkit (Ralph Kimball), 474
data warehousing
  customer patterns, 5
  for decision support, 13
  discussed, 4
database administrators (DBAs), 488
databases
  call detail, 37
  demographic, 37
  KDD (knowledge discovery in databases), 8
  server platforms, affordability, 13
datasets, balanced, model sets, 68
dates and times, interval variables, 551
DBAs (database administrators), 488
deaths, household-level data, 96
debt, nonrepayment of, credit risks, 114
decision support
  data warehousing for, 13
  hypothesis testing, 50-51
  summary data, OLAP, 477-478
decision trees
  alphas, 188
  alternate representations for, 199-202
  applying to sequential events, 205
  branching nodes, 176
  building models, 8
  case study, 206, 208
  for catalog response models, 175
  classification, 9, 166-168
  cost considerations, 195
  effectiveness of, measuring, 176
  estimation, 170
  as exploration tool, 203-204
  fields, multiple, 195-197
  neural networks, 199
  profiling tasks, 12
  projective visualization, 207-208
  pruning
    C5 algorithm, 190-191
    CART algorithm, 185, 188-189
    discussed, 184
    minimum support pruning, 312
    stability-based, 191-192
  rectangular regions, 197
  regression trees, 170
  rules, extracting, 193-194
  SAS Enterprise Miner Tree Viewer tool, 167-168
  scoring, 169-170
  splits
    on categorical input variables, 174
    chi-square testing, 180-183
    discussed, 170
    diversity measures, 177-178
    entropy, 179
    finding, 172
    Gini splitting criterion, 178
    information gain ratio, 178, 180
    intrinsic information of, 180
    missing values, 174-175
    multiway, 171
    on numeric input variables, 173
    population diversity, 178
    purity measures, 177-178
    reduction in variance, 183
    surrogate, 175
  subtrees, selecting, 189
  uses for, 166
declining usage, behavior-based variables, 577-579
deep intimacy, customer relationships, 449, 451
default classes, records, 194
default risks, proof-of-concept projects, 599
degrees of freedom values, chi-square tests, 152-153
democracy approach, memory-based reasoning, 279-281
demographic databases, 37
demographic profiles, customers, 31
density, data selection, 62-63
density function, statistics, 133
deploying models, 84-85
derived variables, column data, 542
descriptions
  comparing values with, 65
  data transformation, 57
descriptive models, assessing, 78
descriptive profiling, 52
deviation. See standard deviation
difference of proportion
  chi-square tests versus, 153-154
  statistical analysis, 143-144
differential response analysis, marketing campaigns, 107-108
differentiation, market basket analysis, 289
dimension, automatic cluster detection, 352
dimension tables, OLAP, 502-503
directed clustering, automatic cluster detection, 372
directed data mining
  classification, 57
  discussed, 7
  estimation, 57
  prediction, 57
directed graphs, 330
directed models, assessing, 78-79
directed profiling, 52
dirty data, 592-593
discrete outcomes, classification, 9
discrete values, statistics, 127-131
discrimination measures, ROC curves, 99
dissociation rules, 317
distance and similarity, automatic cluster detection, 359-363
distance function
  defined, 271-272
  discussed, 258, 265
  hidden distance fields, 278
  identity distance, 271
  numeric fields, 275
  triangle inequality, 272
  zip codes, 276-277
distribution
  data exploration, 65
  one-tailed, 134
  probability and, 135
  statistics, 130-132
  two-tailed, 134
diverse data types, 536
diversity measures, splitting criteria, decision trees, 177-178
divisive clustering, automatic cluster detection, 371-372
documentation
  data mining, 536-537
  historical data as, 61
dumping data, flat files, 594
EBCF (existing base churn forecast), 469
economic data, useful data sources, 61
edges, graphs, 322
education level, household-level data, 96
e-mail
  as communication channel, 89
  free text resources, 556-557
encoding, inconsistent, data correction, 74
enterprise-wide data, 33
entropy, information gain, 178-180
equal-height binning, 551
equal-width binning, 551
erroneous conclusions, 74
errors
  countervailing, 81-82
  error rates
    adjusted, 185
    establishing, 79
  measurement, 159
  operational, 159
  predicting, 191
  standard error of proportion, statistical analysis, 139-141
established customers, customer relationships, 457
estimation
  accuracy, 79-81
  averages, 81
  business goals, formulating, 605
  classification tasks, 9
  collaborative filtering, 284-285
  data transformation, 57
  decision trees, 170
  directed data mining, 57
  estimation task examples, 10
  examples of, 10
  neural networks, 10, 215
  regression models, 10
  revenue, behavior-based variables, 581-583
  standard deviation, 81
  valued outcomes, 9
ETL (extraction, transformation, and load) tools, 487, 595
evaluation, automatic cluster detection, 372-373
event-based relationships, customer relationships, 458-459
existing base churn forecast (EBCF), 469
expectations
  comparing to results, 31
  expected values, chi-square tests, 150-151
  proof-of-concept projects, 599