  sorting customers by, 8
  z-scores, 551
search programs, link analysis, 331
searchable criteria, relevance feedback, 268
sectional center facility (SCF), 553
selection step, genetic algorithms, 429
self-organizing map (SOM), 249-251, 372

sensitivity analysis, neural networks, 247-248

sequential analysis, association rules, 318-319

sequential events, applying decision trees to, 205
sequential patterns, identifying, 24
server platforms, affordability, 13
service business sectors, customer relationships, 13-14
shared labels, fax machines, 341
short form, census data, 94
short-term trends, 75
sigmoid action functions, neural networks, 225
signatures, customers
  assembling, 68
  business versus residential customers, 561
  columns, pivoting, 563
  computational issues, 594-596
  considerations, 564
  customer identification, 560-562
  data for, cataloging, 559-560
  discussed, 540-541
  model set creation, 68
  snapshots, 562
  time frames, identifying, 562
similarity and distance, automatic cluster detection, 359-363
similarity matrix, 368
similarity measurements, MBR, 271-272

Simplifying Assumptions Corporation (SAC), 97, 100

simulated annealing, 230
single linkage, automatic cluster detection, 369
single response rates, 141
single views, customers, 517-518
sites. See Web sites
skewed distributions, data correction, 73
SKUs (stock-keeping units), 305
small-business relationships, customer relationship management, 2
SMP (symmetric multiprocessor), 485
snapshots, customer signatures, 562
social information filtering, 282
soft clustering, automatic cluster detection, 367
SOI (sphere of influence), 38
sole proprietors, 3

solicitation, marketing campaigns, 96
SOM (self-organizing map), 249-251, 372
source systems, 484, 486-487, 594
special-purpose code, 595
sphere of influence (SOI), 38
spiders, web crawlers, 331
splits, decision trees
  on categorical input variables, 174
  chi-square testing, 180-183
  discussed, 170
  diversity measures, 177-178
  entropy, 179
  finding, 172
  Gini splitting criterion, 178
  information gain ratio, 178, 180
  intrinsic information of, 180
  missing values, 174-175
  multiway, 171
  on numeric input variables, 173
  population diversity, 178
  purity measures, 177-178
  reduction in variance, 183
  surrogate, 175
spreadsheets, results, assessing, 85

SQL data, time series analysis, 572-573

stability-based pruning, decision trees, 191-192
staffing, data mining, 525-526
standard deviation
  estimation, 81
  statistics, 132, 138
  variance and, 138
standard error of proportion, statistical analysis, 139-141
standardization, numeric values, 551
standardized values, statistics, 129-133

star schema structure, relational databases, 505
statistical analysis
  business data versus scientific data, 159
  censored data, 161
  Central Limit Theorem, 129-130
  chi-square tests
    case study, 155-158
    degrees of freedom values, 152-153
    difference of proportions versus, 153-154
    discussed, 149
    expected values, calculating, 150-151
  continuous variables, 137-138
  correlation ranges, 139
  cross-tabulations, 136
  density function, 133
  as disciplinary technique, 123
  discrete values, 127-131
  experimentation, 160-161
  field values, 128
  histograms and, 127
  marketing campaign approaches
    acuity of testing, 147-148
    confidence intervals, 146
    proportion, standard error of, 139-141
    sample sizes, 145
  mean values, 137
  median values, 137
  mode values, 137
  multiple comparisons, 148-149
  normal distribution, 130-132
  null hypothesis and, 125-126
  probabilities, 133-135
  p-values, 126
  q-values, 126
  range values, 137
  regression ranges, 139
  sample variation, 129
  standard deviation, 132, 138
  standardized values, 129-133
  sum of values, 137-138
  time series analysis, 128-129
  truncated data, 162
  variance, 138
  z-values, 131, 138
statistical regression techniques, genetic algorithms, 423
status codes, as categorical value, 239
stemming, link analysis, 333
stock-keeping units (SKUs), 305
store comparisons, association rules for, 315-316
stratification
  customer relationships and, 469
  hazards, 410
strings, fixed-length characters, 552-554
subgroups
  automatic cluster detection
    agglomerative clustering, 368-370
    case study, 374-378
    categorical variables, 359
    centroid distance, 369
    complete linkage, 369
    data preparation, 363-365
    dimension, 352
    directed clustering, 372
    discussed, 12, 91, 351
    distance and similarity, 359-363
    divisive clustering, 371-372
    evaluation, 372-373
    Gaussian mixture model, 366-367
    geometric distance, 360-361
    hard clustering, 367
    Hertzsprung-Russell diagram, 352-354
    luminosity, 351
    scaling, 363-364
    single linkage, 369
    soft clustering, 367
    SOM (self-organizing map), 372
    vectors, angles between, 361-362
    weighting, 363-365
    zone boundaries, adjusting, 380
  business goals, formulating, 605
  customer attributes, 11
  data transformation, 57
  overview, 11
  profiling tasks, 12
  undirected data mining, 57
subscription-based relationships, customer relationships, 459-460
subtrees, decision trees, 189
sum of values, statistics, 137-138
summarization, data transformation, 44
summation function, 272
supermarket chains, as information brokers, 15-16
supervised learning, 57
support, market basket analysis, 301
surrogate splits, decision trees, 175
survey responses
  customer classification, 91
  inconclusive, 46
  profiling, 53

survey-based market research, 113
  useful data sources, 61
survival analysis
  attrition, handling different types of, 412-413
  customer relationships, 413-415
  estimation tasks, 10
  forecasting, 415-416
symmetric multiprocessor (SMP), 489-490

tables, lookup, auxiliary information, 570-571
tainted results, 72
tangent function, 223
target columns, 547
target fields, input variables, 37
target market versus control group response, 38
targeted acquisition campaigns, 31
targeting
  good prospects, identifying, 88-89
  prospecting, 88
taxonomy, products, 305
telecommunications customers, market basket analysis, 288
telephone switches, transaction processing systems, 3
terabytes, 5

Teradata, relational database management software, 13
termination of services, 114
testing
  acuity of, statistical analysis, 147-148
  chi-square tests
    case study, 155-158
    CHIDIST function, 152
    degrees of freedom values, 152-153
    difference of proportions versus, 153-154
    discussed, 149
    expected values, calculating, 150-151
    splits, decision trees, 180-183
  F tests, 183-184
  hypothesis testing
    confidence levels, 148
    considerations, 51
    decision-making process, 50-51
    generating, 51
    market basket analysis, 51
    null hypothesis, statistics and, 125-126


