

Different Kinds of Churn Model 119

Predicting Who Will Leave 119

Predicting How Long Customers Will Stay 119

Lessons Learned 120

Chapter 5 The Lure of Statistics: Data Mining Using Familiar Tools 123

Occam's Razor 124

The Null Hypothesis 125

P-Values 126

A Look at Data 126

Looking at Discrete Values 127

Histograms 127

Time Series 128

Standardized Values 129

From Standardized Values to Probabilities 133

Cross-Tabulations 136

Looking at Continuous Variables 136

Statistical Measures for Continuous Variables 137

Variance and Standard Deviation 138

A Couple More Statistical Ideas 139

Measuring Response 139

Standard Error of a Proportion 139

Comparing Results Using Confidence Bounds 141

Comparing Results Using Difference of Proportions 143

Size of Sample 145

What the Confidence Interval Really Means 146

Size of Test and Control for an Experiment 147

Multiple Comparisons 148

The Confidence Level with Multiple Comparisons 148

Bonferroni's Correction 149

Chi-Square Test 149

Expected Values 150

Chi-Square Value 151

Comparison of Chi-Square to Difference of Proportions 153

An Example: Chi-Square for Regions and Starts 155

Data Mining and Statistics 158

No Measurement Error in Basic Data 159

There Is a Lot of Data 160

Time Dependency Pops Up Everywhere 160

Experimentation Is Hard 160

Data Is Censored and Truncated 161

Lessons Learned 162

Chapter 6 Decision Trees 165

What Is a Decision Tree? 166

Classification 166

Scoring 169

Estimation 170

Trees Grow in Many Forms 170



How a Decision Tree Is Grown 171

Finding the Splits 172

Splitting on a Numeric Input Variable 173

Splitting on a Categorical Input Variable 174

Splitting in the Presence of Missing Values 174

Growing the Full Tree 175

Measuring the Effectiveness of a Decision Tree 176

Tests for Choosing the Best Split 176

Purity and Diversity 177

Gini or Population Diversity 178

Entropy Reduction or Information Gain 179

Information Gain Ratio 180

Chi-Square Test 180

Reduction in Variance 183

F Test 183

Pruning 184

The CART Pruning Algorithm 185

Creating the Candidate Subtrees 185

Picking the Best Subtree 189

Using the Test Set to Evaluate the Final Tree 189

The C5 Pruning Algorithm 190

Pessimistic Pruning 191

Stability-Based Pruning 191

Extracting Rules from Trees 193

Taking Cost into Account 195

Further Refinements to the Decision Tree Method 195

Using More Than One Field at a Time 195

Tilting the Hyperplane 197

Neural Trees 199

Piecewise Regression Using Trees 199

Alternate Representations for Decision Trees 199

Box Diagrams 199

Tree Ring Diagrams 201

Decision Trees in Practice 203

Decision Trees as a Data Exploration Tool 203

Applying Decision-Tree Methods to Sequential Events 205

Simulating the Future 206

Case Study: Process Control in a Coffee-Roasting Plant 206

Lessons Learned 209

Chapter 7 Artificial Neural Networks 211

A Bit of History 212

Real Estate Appraisal 213

Neural Networks for Directed Data Mining 219

What Is a Neural Net? 220

What Is the Unit of a Neural Network? 222

Feed-Forward Neural Networks 226




How Does a Neural Network Learn Using Back Propagation? 228

Heuristics for Using Feed-Forward, Back Propagation Networks 231

Choosing the Training Set 232

Coverage of Values for All Features 232

Number of Features 233

Size of Training Set 234

Number of Outputs 234

Preparing the Data 235

Features with Continuous Values 235

Features with Ordered, Discrete (Integer) Values 238

Features with Categorical Values 239

Other Types of Features 241

Interpreting the Results 241

Neural Networks for Time Series 244

How to Know What Is Going on Inside a Neural Network 247

Self-Organizing Maps 249

What Is a Self-Organizing Map? 249

Example: Finding Clusters 252

Lessons Learned 254

Chapter 8 Nearest Neighbor Approaches: Memory-Based Reasoning and Collaborative Filtering 257

Memory-Based Reasoning 258

Example: Using MBR to Estimate Rents in Tuxedo, New York 259

Challenges of MBR 262

Choosing a Balanced Set of Historical Records 262

Representing the Training Data 263

Determining the Distance Function, Combination Function, and Number of Neighbors 265

Case Study: Classifying News Stories 265

What Are the Codes? 266

Applying MBR 267

Choosing the Training Set 267

Choosing the Distance Function 267

Choosing the Combination Function 267

Choosing the Number of Neighbors 270

The Results 270

Measuring Distance 271

What Is a Distance Function? 271

Building a Distance Function One Field at a Time 274

Distance Functions for Other Data Types 277

When a Distance Metric Already Exists 278

The Combination Function: Asking the Neighbors for the Answer 279

The Basic Approach: Democracy 279

Weighted Voting 281


