

WHAT IS A RELATIONAL DATABASE?

One of the most common ways to store data is in a relational database management system (RDBMS). Relational databases are based on research by E. F. Codd in the early 1970s on the properties of a special type of set composed of tuples, which we would call rows in tables. From this research, he derived a relational algebra whose operations are depicted in the following figure:

[Figure: Relational databases have four major querying operations.]

Filter: Filtering removes rows based on the values in one or more columns. Each input row either is or is not in the output table.

Select: Selecting chooses the columns for the output. Each column in the output is in the input or is a function of some of the input columns.

Aggregation (or Group by): Aggregation groups rows together based on a common key. All the rows with the same key are summarized into a single output row.

Join: Join matches rows in two tables. For every pair of rows whose keys match in the inputs, a new row is created in the output.




These operations are in addition to set operations, such as union and intersection. In nonscientific terminology, these relational operations are:

Filter a given set of rows based on the values in the rows.

Select a given set of columns and perform basic operations on them.

Group rows together and aggregate values in the columns.

Join two tables together based on the values in the columns.

Interestingly, the relational operations do not include sorting (except for output purposes). These operations specify what can be done with tuples, not how it gets done. In fact, relational databases often use sorting for grouping and joining operations; however, there are non-sort-based algorithms for these operations as well.
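To make the "what, not how" point concrete, here is a minimal Python sketch of the four operations over tables represented as lists of dicts. The sample data and the function names (`filter_rows`, `select_columns`, `group_by_hash`, `group_by_sort`, `join_rows`) are hypothetical, chosen only for illustration. Aggregation is written twice, once with a hash table and once by sorting, to show that two different algorithms give the same answer.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

# Hypothetical sample tables: each table is a list of rows, each row a dict.
transactions = [
    {"txn_id": 1, "account_id": "A1", "amount": 20.0},
    {"txn_id": 2, "account_id": "A2", "amount": 75.0},
    {"txn_id": 3, "account_id": "A1", "amount": 5.0},
    {"txn_id": 4, "account_id": "A3", "amount": 40.0},
]
accounts = [
    {"account_id": "A1", "account_type": "gold"},
    {"account_id": "A2", "account_type": "standard"},
    {"account_id": "A3", "account_type": "standard"},
]


def filter_rows(table, predicate):
    """Filter: keep exactly those input rows for which predicate(row) is true."""
    return [row for row in table if predicate(row)]


def select_columns(table, **exprs):
    """Select: each output column is an input column or a function of input columns."""
    return [{name: f(row) for name, f in exprs.items()} for row in table]


def group_by_hash(table, key, value):
    """Aggregate (sum) by key using a hash table; no sorting involved."""
    totals = defaultdict(float)
    for row in table:
        totals[row[key]] += row[value]
    return dict(totals)


def group_by_sort(table, key, value):
    """Aggregate (sum) by key: sort on the key, then sum each run of equal keys."""
    ordered = sorted(table, key=itemgetter(key))
    return {k: sum(r[value] for r in run) for k, run in groupby(ordered, key=itemgetter(key))}


def join_rows(left, right, key):
    """Join: one output row for every pair of rows whose key values match (a hash join)."""
    index = defaultdict(list)
    for right_row in right:
        index[right_row[key]].append(right_row)
    return [{**left_row, **right_row}
            for left_row in left
            for right_row in index.get(left_row[key], [])]


if __name__ == "__main__":
    large = filter_rows(transactions, lambda r: r["amount"] > 10)
    print([r["txn_id"] for r in large])  # [1, 2, 4]
    print(select_columns(large, txn_id=itemgetter("txn_id"), cents=lambda r: round(r["amount"] * 100)))
    joined = join_rows(transactions, accounts, "account_id")
    print(group_by_hash(joined, "account_type", "amount"))  # {'gold': 25.0, 'standard': 115.0}
    print(group_by_sort(joined, "account_type", "amount") == group_by_hash(joined, "account_type", "amount"))  # True
```

The caller only states which rows, columns, groups, and matches it wants; whether grouping happens by hashing or by sorting is an implementation detail, which is the freedom a database engine has too.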

SQL, developed at IBM in the 1970s and standardized in the 1980s, has become the standard language for accessing relational databases and implements these basic operations. Because SQL supports subqueries (that is, using the results of one query as a table in another query), it is possible to express some very complex data manipulations.
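As a minimal sketch of subqueries, the following uses Python's built-in `sqlite3` module with an in-memory database. The table `transactions` and its columns are hypothetical. The inner query computes one total per customer; the outer query treats that result as a table and keeps only the customers whose total exceeds the average customer total, which is itself computed by another nested query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (txn_id INTEGER PRIMARY KEY, customer_id TEXT, amount REAL);
    INSERT INTO transactions VALUES
        (1, 'C1', 20.0), (2, 'C1', 30.0), (3, 'C2', 100.0), (4, 'C3', 10.0);
""")

# per_customer is the result of one query used as a table in another query.
ABOVE_AVERAGE = """
    SELECT customer_id, total
    FROM (SELECT customer_id, SUM(amount) AS total
          FROM transactions
          GROUP BY customer_id) AS per_customer
    WHERE total > (SELECT AVG(total)
                   FROM (SELECT SUM(amount) AS total
                         FROM transactions
                         GROUP BY customer_id))
    ORDER BY customer_id
"""

rows = conn.execute(ABOVE_AVERAGE).fetchall()
print(rows)  # [('C2', 100.0)]
```

The per-customer totals are 50, 100, and 10, so the average is about 53.3 and only C2 qualifies.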

A common way of representing the database structure is to use an entity-relationship (E-R) diagram. The following figure is a simple E-R diagram with five entities and four relationships among them. In this case, each entity corresponds to a separate table with columns corresponding to the attributes of the entity. In addition, some columns represent the relationships between tables in the database; such columns are called keys (either foreign or primary keys). Explicitly storing keys in the database tables using a consistent naming convention makes it easier to find one's way around the database.
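A minimal sketch of primary and foreign keys, using Python's `sqlite3`. The tables and columns (`household_id`, `customer_id`, `account_id`, `zip_code`, and so on) are hypothetical, loosely modeled on the credit card example in this section. Giving a key the same name in every table that uses it lets the joins be written with `USING (...)`. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
    CREATE TABLE household (
        household_id INTEGER PRIMARY KEY,
        zip_code     TEXT
    );
    CREATE TABLE customer (
        customer_id   INTEGER PRIMARY KEY,
        household_id  INTEGER REFERENCES household (household_id),
        customer_name TEXT
    );
    CREATE TABLE account (
        account_id   INTEGER PRIMARY KEY,
        customer_id  INTEGER NOT NULL REFERENCES customer (customer_id),
        account_type TEXT
    );
    INSERT INTO household VALUES (1, '10001');
    INSERT INTO customer  VALUES (10, 1, 'Alice');
    INSERT INTO account   VALUES (100, 10, 'gold');
""")


def insert_account(account_id, customer_id, account_type):
    """Return True if the row was accepted, False if a key constraint rejected it."""
    try:
        conn.execute("INSERT INTO account VALUES (?, ?, ?)",
                     (account_id, customer_id, account_type))
        return True
    except sqlite3.IntegrityError:
        return False


print(insert_account(101, 10, "standard"))  # True: customer 10 exists
print(insert_account(102, 99, "standard"))  # False: there is no customer 99

# Consistently named keys make the join path easy to read.
rows = conn.execute("""
    SELECT account_id, customer_name, zip_code
    FROM account
    JOIN customer  USING (customer_id)
    JOIN household USING (household_id)
    ORDER BY account_id
""").fetchall()
print(rows)  # [(100, 'Alice', '10001'), (101, 'Alice', '10001')]
```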

One nice feature of relational databases is the ability to design a database so that any given data item appears in exactly one place, with no duplication. Such a database is called a normalized database. Knowing exactly where each data item is located is highly efficient in theory, since updating any field requires modifying only one row in one table. When a normalized database is well designed and implemented, there is no redundant data, out-of-date data, or invalid data.

An important idea behind normalization is creating reference tables. Each reference table logically corresponds to an entity, and each has a key used for looking up information about the entity. In a normalized database, the join operation is used to look up values in reference tables.
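The following minimal sketch, using Python's `sqlite3` with hypothetical table names, column names, and vendor data, shows a reference table in a normalized design: each vendor's type is stored once, transactions carry only the vendor key, and a join looks up the type. Reclassifying a vendor is a single-row update, and every transaction that refers to that vendor sees the new value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reference table: one row per vendor; vendor_type is stored only here.
    CREATE TABLE vendor (vendor_id INTEGER PRIMARY KEY, vendor_name TEXT, vendor_type TEXT);
    -- Transactions refer to the vendor by key instead of copying its attributes.
    CREATE TABLE transactions (
        txn_id    INTEGER PRIMARY KEY,
        vendor_id INTEGER REFERENCES vendor (vendor_id),
        amount    REAL
    );
    INSERT INTO vendor VALUES (1, 'Corner Grocery', 'grocery'), (2, 'Gas-n-Go', 'fuel');
    INSERT INTO transactions VALUES (1, 1, 42.50), (2, 2, 30.00), (3, 1, 12.25);
""")

LOOKUP = """
    SELECT t.txn_id, v.vendor_name, v.vendor_type
    FROM transactions AS t
    JOIN vendor AS v ON t.vendor_id = v.vendor_id
    ORDER BY t.txn_id
"""

before = conn.execute(LOOKUP).fetchall()
# One UPDATE of one row in one table reclassifies the vendor everywhere.
conn.execute("UPDATE vendor SET vendor_type = 'supermarket' WHERE vendor_id = 1")
after = conn.execute(LOOKUP).fetchall()

print(before)  # [(1, 'Corner Grocery', 'grocery'), (2, 'Gas-n-Go', 'fuel'), (3, 'Corner Grocery', 'grocery')]
print(after)   # [(1, 'Corner Grocery', 'supermarket'), (2, 'Gas-n-Go', 'fuel'), (3, 'Corner Grocery', 'supermarket')]
```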

Relational databases are a powerful way of storing and accessing data. However, much of their design is focused on updating the data and handling large numbers of transactions. Data mining, by contrast, is interested in combining data to spot higher-level patterns. Typically, data mining uses many queries, each of which requires several joins, several aggregations, and subqueries: a veritable army of killer queries.




[Figure: E-R diagram with five tables: TRANSACTION, ACCOUNT, VENDOR, CUSTOMER, and HOUSEHOLD. Fields shown include Transaction ID, Authorization Code, Date, Time, Amount, Account ID, Account Type, Interest Rate, Credit Limit, Minimum Payment, Amount Due, Last Payment Amt, Vendor ID, Vendor Name, Vendor Type, Customer ID, Customer Name, Household ID, Number of Children, and ZIP Code.]

One account has multiple transactions, but each transaction is associated with exactly one account.

A single transaction occurs at exactly one vendor, but each vendor may have multiple transactions.

A customer may have one or more accounts, but each account belongs to exactly one customer. Likewise, one or more customers may be in a household.

An E-R diagram can be used to show the tables and fields in a relational database. Each box shows a single table and its columns. The lines between them show relationships, such as one-to-many, one-to-one, and many-to-many. Because each table corresponds to an entity, this is called a physical design.

Sometimes, the physical design of a database is very complicated. For instance, the TRANSACTION TABLE might actually be split into a separate table for each month of transactions. In this case, the above E-R diagram is still useful; it represents the logical structure of the data, as business users would understand it.
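One common way to present such a split physical design as the single logical table in the diagram is a view that stacks the monthly tables with `UNION ALL`. The sketch below uses Python's `sqlite3`; the monthly table names, their columns, and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Physical design: one table per month of transactions.
    CREATE TABLE transactions_2024_01 (txn_id INTEGER, account_id INTEGER, amount REAL);
    CREATE TABLE transactions_2024_02 (txn_id INTEGER, account_id INTEGER, amount REAL);
    INSERT INTO transactions_2024_01 VALUES (1, 100, 25.0), (2, 101, 60.0);
    INSERT INTO transactions_2024_02 VALUES (3, 100, 15.0);

    -- Logical design: one transaction table, as business users understand it.
    CREATE VIEW transactions AS
        SELECT '2024-01' AS month, txn_id, account_id, amount FROM transactions_2024_01
        UNION ALL
        SELECT '2024-02' AS month, txn_id, account_id, amount FROM transactions_2024_02;
""")

# Queries against the view never mention the monthly tables.
totals = conn.execute("""
    SELECT account_id, COUNT(*) AS n_txns, SUM(amount) AS total
    FROM transactions
    GROUP BY account_id
    ORDER BY account_id
""").fetchall()
print(totals)  # [(100, 2, 40.0), (101, 1, 60.0)]
```

`UNION ALL` is used rather than `UNION` because the monthly tables do not overlap, so `UNION`'s duplicate elimination would only add work.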

An entity-relationship diagram describes the layout of data for a simple credit card database.

With respect to data mining, relational databases (and SQL) have some limitations. First, they provide little support for time series. This makes it hard to figure out from transaction data such things as the second product purchased, the last three promos a customer responded to, or the ordering of events; these can require very complicated SQL. Another problem is that two operations often eliminate rows inadvertently. When a field contains a missing value (NULL), it automatically fails any comparison, even "not equals".
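Both limitations can be seen in a small `sqlite3` sketch; the tables, columns, and data are hypothetical. First, a row whose `account_type` is NULL is returned by neither `= 'gold'` nor `<> 'gold'`, because a comparison with NULL evaluates to unknown rather than true; an explicit `IS NULL` test is needed to keep it. Second, finding each customer's second purchase without window functions takes a correlated subquery that counts earlier purchases. This version assumes no customer makes two purchases on the same date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (account_id INTEGER PRIMARY KEY, account_type TEXT);
    INSERT INTO account VALUES (1, 'gold'), (2, 'standard'), (3, NULL);

    CREATE TABLE purchase (customer_id TEXT, purchase_date TEXT, product TEXT);
    INSERT INTO purchase VALUES
        ('C1', '2024-01-05', 'checking'),
        ('C1', '2024-02-10', 'credit card'),
        ('C1', '2024-03-01', 'mortgage'),
        ('C2', '2024-01-20', 'savings'),
        ('C2', '2024-04-02', 'auto loan'),
        ('C3', '2024-02-14', 'checking');
""")


def ids(sql):
    """Return the first column of every result row."""
    return [row[0] for row in conn.execute(sql)]


# NULL fails both the comparison and its negation, so account 3 silently disappears.
equal = ids("SELECT account_id FROM account WHERE account_type = 'gold' ORDER BY account_id")
not_equal = ids("SELECT account_id FROM account WHERE account_type <> 'gold' ORDER BY account_id")
kept = ids("""SELECT account_id FROM account
              WHERE account_type <> 'gold' OR account_type IS NULL
              ORDER BY account_id""")
print(equal, not_equal, kept)  # [1] [2] [2, 3]

# Second purchase per customer: the row with exactly one earlier purchase.
# ISO-format date strings compare correctly as text.
second = conn.execute("""
    SELECT p.customer_id, p.product
    FROM purchase AS p
    WHERE (SELECT COUNT(*)
           FROM purchase AS q
           WHERE q.customer_id = p.customer_id
             AND q.purchase_date < p.purchase_date) = 1
    ORDER BY p.customer_id
""").fetchall()
print(second)  # [('C1', 'credit card'), ('C2', 'auto loan')]
```

Customer C3 has only one purchase and so has no second purchase; asking for the "last three" of anything would need a similar, but longer, subquery.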



