
is good for guidance, but it is an oversimplification. There are actually several types of expected fax machine usage for residential customers:

Dedicated fax. Some fax machines are on dedicated lines, and the line is used only for fax communication.

Shared. Some fax machines share their line with voice calls.

Data. Some fax machines are on lines dedicated to data use, either via fax or via computer modem.

Characterizing expected behavior is a good way to start any directed data mining problem. The better the problem is understood, the better the results are likely to be.

The presumption that fax machines call other fax machines is generally true for machines on dedicated lines, although wrong numbers provide exceptions even to this rule. To distinguish shared lines from dedicated or data lines, we assumed that any number that calls information (411 or 555-1212, the directory assistance services) is used for voice communications, and is therefore a voice line or a shared fax line. For instance, call #4 in the example data is a call to 555-1212, signifying that the calling number is likely to be a shared line or just a voice line. When a shared line calls another number, there is no way to know whether the call is voice or data, so we cannot identify fax machines based on calls to and from such a node in the call graph. On the other hand, these shared lines do represent a marketing opportunity to sell additional lines.
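The directory-assistance rule can be illustrated with a minimal sketch. The record layout (origin, destination, duration in seconds), the function name, and the sample records are assumptions made for this example; they are not the example data referred to in the text.

```python
# Directory assistance numbers named in the text.
DIRECTORY_ASSISTANCE = {"411", "555-1212"}


def voice_capable_lines(calls):
    """Return the originating numbers that placed a call to directory assistance.

    `calls` is an iterable of (origin, destination, duration_seconds) tuples.
    This layout is hypothetical, chosen only for the illustration.
    """
    return {origin for origin, dest, _ in calls if dest in DIRECTORY_ASSISTANCE}


# Made-up sample records.
sample_calls = [
    ("A", "B", 120),
    ("C", "555-1212", 35),
    ("B", "A", 90),
    ("D", "411", 20),
]

# C and D called directory assistance, so they are treated as voice or shared lines.
voice = voice_capable_lines(sample_calls)
```

A line flagged this way is only known to carry voice traffic; whether it also carries fax traffic has to come from its other calls.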

The process used to find fax machines consisted of the following steps:

1. Start with a set of known fax machines (gathered from the Yellow Pages).

2. Determine all the numbers that make calls to, or receive calls from, any number in this set where the call lasted longer than 10 seconds. These numbers are candidates.

If the candidate number has called 411, 555-1212, or a number identified as a shared fax number, then it is included in the set of shared voice/fax numbers.

Otherwise, it is included in the set of known fax machines.

3. Repeat Steps 1 and 2 until no more numbers are identified.
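The steps above might be sketched as follows. This is an illustration under stated assumptions, not the special-purpose code used in the project: the function name, the tuple layout of the call records, and the sample data are hypothetical. The sketch also assumes that the directory-assistance test looks at all of a candidate's calls regardless of duration, and that candidates are examined in sorted order; the text does not specify either point.

```python
DIRECTORY_ASSISTANCE = {"411", "555-1212"}


def expand_fax_set(calls, seeds, min_duration=10):
    """Grow a seed set of fax numbers by following calls longer than min_duration.

    `calls` is an iterable of (origin, destination, duration_seconds) tuples
    (a hypothetical layout). Returns (fax, shared) as two sets of numbers.
    """
    fax = set(seeds)
    shared = set()
    neighbors = {}  # undirected links from calls longer than min_duration
    called = {}     # every destination each number has called
    for origin, dest, duration in calls:
        called.setdefault(origin, set()).add(dest)
        if duration > min_duration:
            neighbors.setdefault(origin, set()).add(dest)
            neighbors.setdefault(dest, set()).add(origin)

    changed = True
    while changed:
        changed = False
        # Step 2: candidates are numbers linked to any known fax machine.
        candidates = set()
        for number in fax:
            candidates |= neighbors.get(number, set())
        candidates -= fax | shared | DIRECTORY_ASSISTANCE
        for number in sorted(candidates):
            dests = called.get(number, set())
            if dests & DIRECTORY_ASSISTANCE or dests & shared:
                shared.add(number)  # shows voice use: shared voice/fax line
            else:
                fax.add(number)     # otherwise treated as a fax machine
            changed = True          # Step 3: repeat while new numbers appear
    return fax, shared


# Made-up records: F1 is the only seed.
records = [
    ("F1", "A", 60), ("B", "F1", 45), ("B", "411", 30),
    ("C", "F1", 5), ("A", "D", 120), ("E", "B", 200),
]
fax_numbers, shared_numbers = expand_fax_set(records, {"F1"})
```

In this sample, C is ignored because its call was too short, and E is never reached because its only link is to a shared line, which is not expanded.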

One of the challenges was identifying wrong numbers. In particular, an incoming call to a fax machine may sometimes be a wrong number and give no information about the originating number (actually, if it is a wrong number, then the originating line is probably a voice line). We assumed that such incoming wrong-number calls would last only a very short time, as is the case with Call #3. In a larger-scale analysis of fax machines, it would be useful to eliminate other anomalies, such as outgoing wrong numbers and modem/fax usage.



The process starts with an initial set of fax numbers. Since this was a demonstration project, several fax numbers were gathered manually from the Yellow Pages, based on the annotation "fax" next to the number. For a larger-scale project, all fax numbers could be retrieved from the database used to generate the Yellow Pages. These numbers are only the beginning, the seeds, of the list of fax machine telephone numbers. Although it is common for businesses to advertise their fax numbers, this is not so common for fax machines at home.

Some Results

The sample of telephone records consisted of 3,011,819 telephone calls made over one month by 19,674 households. In the world of telephony, this is a very small sample of data, but it was sufficient to demonstrate the power of link analysis. The analysis was performed using special-purpose C++ code that stored the call detail and allowed us to expand a list of fax machines efficiently.

Finding the fax machines is an example of a graph-coloring algorithm. This type of algorithm walks through the graph and labels nodes with different colors. In this case, the colors are fax, shared, voice, and unknown instead of red, green, yellow, and blue. Initially, all the nodes are unknown except for the few labeled fax from the starting set. As the algorithm proceeds, more and more nodes with the unknown label are given more informative labels.

Figure 10.9 shows a call graph with 15 numbers and 19 calls. The weights on the edges are the duration of each call in seconds. Nothing is really known about the specific numbers.


Figure 10.9 A call graph for 15 numbers and 19 calls.
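A call graph of this kind can be held as a directed graph whose edges carry the call duration. The sketch below is one possible representation; the function names and sample calls are hypothetical and do not reproduce the figure's data.

```python
from collections import defaultdict


def build_call_graph(calls):
    """Map each originating number to a list of (destination, duration) edges.

    `calls` is an iterable of (origin, destination, duration_seconds) tuples,
    a layout assumed for this example.
    """
    graph = defaultdict(list)
    for origin, dest, duration in calls:
        graph[origin].append((dest, duration))
    return graph


def graph_size(graph):
    """Return (number of distinct phone numbers, number of calls)."""
    nodes = set(graph)
    for edges in graph.values():
        nodes.update(dest for dest, _ in edges)
    n_calls = sum(len(edges) for edges in graph.values())
    return len(nodes), n_calls


# Made-up calls: four numbers, four calls.
sample = [("A", "B", 120), ("B", "A", 90), ("C", "411", 35), ("A", "C", 4)]
call_graph = build_call_graph(sample)
size = graph_size(call_graph)
```

Keeping one edge per call, rather than one per pair of numbers, preserves each call's duration, so short calls can be filtered out later.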



Figure 10.10 shows how the algorithm proceeds. First, the numbers that are known to be fax machines are labeled F, and the numbers for directory assistance are labeled I. Any edge for a call that lasted less than 10 seconds has been dropped. The algorithm colors the graph by assigning labels to each node using an iterative procedure:

Any voice node connected to a fax node is labeled shared.

Any unknown node connected mostly to fax nodes is labeled fax.

This procedure continues until all nodes connected to fax nodes have a fax or shared label.
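The labeling procedure can be sketched as below. Several choices here are assumptions rather than details from the text: "connected mostly to fax nodes" is read as a strict majority of neighbors, a node is first labeled voice when it is connected to directory assistance, edges are treated as undirected, and the function name and sample edges are hypothetical.

```python
def color_call_graph(edges, fax_seeds, information, min_duration=10):
    """Label each node as information, fax, voice, shared, or unknown.

    `edges` is an iterable of (number_a, number_b, duration_seconds) tuples.
    Calls shorter than min_duration are dropped before labeling.
    """
    information = set(information)
    adjacency = {}
    for a, b, duration in edges:
        if duration < min_duration:
            continue  # drop short calls, which are likely wrong numbers
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    # Initial coloring: seeds are fax, directory assistance is information,
    # nodes linked to directory assistance are voice, the rest unknown.
    labels = {}
    for node, neighbors in adjacency.items():
        if node in information:
            labels[node] = "information"
        elif node in fax_seeds:
            labels[node] = "fax"
        elif neighbors & information:
            labels[node] = "voice"
        else:
            labels[node] = "unknown"

    # Iterate until a full pass changes nothing.
    changed = True
    while changed:
        changed = False
        for node in sorted(adjacency):
            neighbors = adjacency[node]
            fax_count = sum(1 for n in neighbors if labels[n] == "fax")
            if labels[node] == "voice" and fax_count > 0:
                labels[node] = "shared"
                changed = True
            elif labels[node] == "unknown" and 2 * fax_count > len(neighbors):
                labels[node] = "fax"  # strict majority of neighbors are fax
                changed = True
    return labels


# Made-up edges: F1 and F2 are seeds, 411 is directory assistance.
sample_edges = [
    ("F1", "A", 60), ("F2", "A", 45), ("A", "B", 30), ("V", "411", 25),
    ("V", "F1", 40), ("C", "F1", 5), ("W", "411", 50), ("B", "W", 20),
]
result = color_call_graph(sample_edges, {"F1", "F2"}, {"411"})
```

Here A becomes fax because two of its three neighbors are fax, V becomes shared, W stays voice, and B stays unknown because only one of its two neighbors is fax. C does not appear at all, since its only call was shorter than 10 seconds.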

This is the initial call graph with short calls removed and with nodes labeled as fax, unknown, and information.

Nodes connected to the initial fax machines are assigned the fax label.

Those connected to information are assigned the voice label.

Those connected to both are shared.

The rest are unknown.

Figure 10.10 Applying the graph-coloring algorithm to the call graph shows which numbers are fax numbers and which are shared.




