  Different Kinds of Churn Model
  Predicting Who Will Leave
  Predicting How Long Customers Will Stay
  Lessons Learned

Chapter 5 The Lure of Statistics: Data Mining Using Familiar Tools
  Occam's Razor
  The Null Hypothesis
  P-Values
  A Look at Data
  Looking at Discrete Values
  Histograms
  Time Series
  Standardized Values
  From Standardized Values to Probabilities
  Cross-Tabulations
  Looking at Continuous Variables
  Statistical Measures for Continuous Variables
  Variance and Standard Deviation
  A Couple More Statistical Ideas
  Measuring Response
  Standard Error of a Proportion
  Comparing Results Using Confidence Bounds
  Comparing Results Using Difference of Proportions
  Size of Sample
  What the Confidence Interval Really Means
  Size of Test and Control for an Experiment
  Multiple Comparisons
  The Confidence Level with Multiple Comparisons
  Bonferroni's Correction
  Chi-Square Test
  Expected Values
  Chi-Square Value
  Comparison of Chi-Square to Difference of Proportions
  An Example: Chi-Square for Regions and Starts
  Data Mining and Statistics
  No Measurement Error in Basic Data
  There Is a Lot of Data
  Time Dependency Pops Up Everywhere
  Experimentation Is Hard
  Data Is Censored and Truncated
  Lessons Learned

Chapter 6 Decision Trees
  What Is a Decision Tree?
  Classification
  Scoring
  Estimation
  Trees Grow in Many Forms
  How a Decision Tree Is Grown
  Finding the Splits
  Splitting on a Numeric Input Variable
  Splitting on a Categorical Input Variable
  Splitting in the Presence of Missing Values
  Growing the Full Tree
  Measuring the Effectiveness of a Decision Tree
  Tests for Choosing the Best Split
  Purity and Diversity
  Gini or Population Diversity
  Entropy Reduction or Information Gain
  Information Gain Ratio
  Chi-Square Test
  Reduction in Variance
  F Test
  Pruning
  The CART Pruning Algorithm
  Creating the Candidate Subtrees
  Picking the Best Subtree
  Using the Test Set to Evaluate the Final Tree
  The C5 Pruning Algorithm
  Pessimistic Pruning
  Stability-Based Pruning
  Extracting Rules from Trees
  Taking Cost into Account
  Further Refinements to the Decision Tree Method
  Using More Than One Field at a Time
  Tilting the Hyperplane
  Neural Trees
  Piecewise Regression Using Trees
  Alternate Representations for Decision Trees
  Box Diagrams
  Tree Ring Diagrams
  Decision Trees in Practice
  Decision Trees as a Data Exploration Tool
  Applying Decision-Tree Methods to Sequential Events
  Simulating the Future
  Case Study: Process Control in a Coffee-Roasting Plant
  Lessons Learned

Chapter 7 Artificial Neural Networks
  A Bit of History
  Real Estate Appraisal
  Neural Networks for Directed Data Mining
  What Is a Neural Net?
  What Is the Unit of a Neural Network?
  Feed-Forward Neural Networks
  How Does a Neural Network Learn Using Back Propagation?
  Heuristics for Using Feed-Forward, Back Propagation Networks
  Choosing the Training Set
  Coverage of Values for All Features
  Number of Features
  Size of Training Set
  Number of Outputs
  Preparing the Data
  Features with Continuous Values
  Features with Ordered, Discrete (Integer) Values
  Features with Categorical Values
  Other Types of Features
  Interpreting the Results
  Neural Networks for Time Series
  How to Know What Is Going on Inside a Neural Network
  Self-Organizing Maps
  What Is a Self-Organizing Map?
  Example: Finding Clusters
  Lessons Learned

Chapter 8 Nearest Neighbor Approaches: Memory-Based Reasoning and Collaborative Filtering
  Memory-Based Reasoning
  Example: Using MBR to Estimate Rents in Tuxedo, New York
  Challenges of MBR
  Choosing a Balanced Set of Historical Records
  Representing the Training Data
  Determining the Distance Function, Combination Function, and Number of Neighbors
  Case Study: Classifying News Stories
  What Are the Codes?
  Applying MBR
  Choosing the Training Set
  Choosing the Distance Function
  Choosing the Combination Function
  Choosing the Number of Neighbors
  The Results
  Measuring Distance
  What Is a Distance Function?
  Building a Distance Function One Field at a Time
  Distance Functions for Other Data Types
  When a Distance Metric Already Exists
  The Combination Function: Asking the Neighbors for the Answer
  The Basic Approach: Democracy
  Weighted Voting