the money market
4. Follow the link from the money market 300-101 record to its list of transactions.
One interesting feature of network database systems is demonstrated by the set of product records on the far right of Figure 1-
2. Notice that each productrecord (Checking, Savings, etc.) points to a list of account records that are of that product type. Account records, therefore, can be accessed from multiple places (both customer records and productrecords), allowing a network database to act as a multiparent hierarchy. Both hierarchical and network database systems are alive and well today, although generally in the mainframe world. Additionally, hierarchical database systems have enjoyed a rebirth in the directory services realm, such as Microsoft’s Active Directory and Netscape’s Directory Server, as well as with Extensible Markup Language (XML). Beginning in the 1970s, however, a new way of representing data began to take root, one that was more rigorous yet easy to understand and implement.
1.1.2. The Relational Model
In 1970 Dr. E. F. Codd of IBM’s research laboratory published a paper entitled “A Relational Model of Data for Large Shared Data Banks” that suggested data be represented as sets of tables. Rather than using pointers to navigate between related entities, redundant data is used to link records in different tables. Figure 1-3 shows how George’s and Sue’s account information would appear in this context.
Figure 1-3. Relational
view of account data
There are four tables in Figure 1-3 representing the four entities discussed so far: customer, product, account, and TRansaction. Looking across the customer table in Figure 1-3, you can see three columns: cust_id (which contains the customer’s ID number), fname (which contains the customer’s first name), and lname (which contains the customer’s last name).
Looking down the customer table, you can see two rows, one containing George Blake’s data and the other containing Sue Smith’s data. The number of columns that a table may contain differs from server to server, but it is generally large enough not to be an issue (Microsoft SQL Server, for example, allows up to 1,024 columns per table). The number of rows that a table may contain is more a matter of physical limits (i.e., how much disk drive space is available) than of database server limitations.
Each table in a relational database includes information that uniquely identifies a row in that table (known as the primary key), along with additional information needed to describe the entity completely. Looking again at the customer table, the cust_id column holds a different number for each customer; George Blake, for example, can be uniquely identified by customer ID #1. No other customer will ever be assigned that identifier, and no other information is needed to locate George Blake’s data in the customer table. While I might have chosen to use the combination of the fname and lname columns as the primary key (a primary key consisting of two or more columns is known as a compound key), there could easily be two or more people with the same first and last names that have accounts at the bank. Therefore, I chose to include the cust_id column in the customer table specifically for use as a primary-key column.
Some of the tables also include information used to navigate to another table. For example, the account table includes a column called cust_id, which contains the unique identifier of the customer who opened the account, along with a column called product_cd, which contains the unique identifier of the product to which the account will conform. These columns are known as foreign keys, and they serve the same purpose as the lines that connect the entities in the hierarchical and network versions of the account information. However, unlike the rigid structure of the hierarchical/network models, relational tables can be used in various ways (including some not envisioned by the people who originally designed the database).
It might seem wasteful to store the same data many times, but the relational model is quite clear on what redundant data may be stored. For example, it is proper for the account table to include a column for the unique identifier of the customer who opened the account, but it is not proper to include the customer’s first and last names as well. If a customer were to change her name, for example, you want to make sure that there is only one place in the database that holds the customer’s name; otherwise, the data might be changed in one place but not another, causing the data in the database to be unreliable. The proper place for this data is the customer table, and only the cust_id data should be included in other tables. It is also not proper for a single column to contain multiple pieces of information, such as a name column that contains both a person’s first and last names, or an address column that contains street, city, state, and zip code information. The process of refining a database design to ensure that each independent piece of information is in only one place (except for foreign keys) is known as normalization.
Getting back to the four tables in Figure 1-3, you may wonder how you would use these tables to find George Blake’s transactions against his checking account. First, you would find George Blake’s unique identifier in the customer table. Then, you would find the row in the account table whose cust_id column contains George’s unique identifier and whose product_cd column matches the row in the product table whose name column equals “checking.” Finally, you would locate the rows in the TRansaction table whose account_id column matches the unique identifier from the account table. This might sound complicated, but it can be done in a single command using the SQL language as you will see shortly.
I’ve introduced some new terminology in the previous sections, so maybe it’s time for some formal definitions. Table 1-1 shows the terms we will use for the remainder of the book along with their definitions.
Table 1-1. Terms and definitions
Something of interest to the database user community. Examples include customers, parts, geographic
Column An individual piece of data stored in a table.
Row A set of columns that together completely describe an entity or some action on an entity. Also called a record.
Table A set of rows, held either in memory (nonpersistent) or on permanent storage (persistent).
Result Another name for a nonpersistent table,
generally the result of an SQL query.
One or more columns that can be used as a unique identifier for each row in a table.
One or more columns that can be used together to identify a single row in another table.
1.2. What Is SQL?
Along with Codd’s definition of the relational model, he proposed a language called DSL/Alpha for manipulating the data in relational tables. Shortly after Codd’s paper was released, IBM commissioned a group to build a prototype based on Codd’s ideas. This group created a simplified version of DSL/Alpha that they called SQUARE. Refinements to SQUARE led to a language called SEQUEL, which was, finally, renamed SQL.
SQL is now entering its fourth decade, and it has undergone a great deal of change along the way. In the mid 1980s, the American National Standards Institute (ANSI) began working on the first standard for the SQL language, which was published in 1986.
Subsequent refinements led to new releases of the SQL standard in 1989, 1992, 1999, and 2003. Along with refinements to the core language, new features have been added to the SQL language to incorporate object-oriented functionality, among other things.
SQL goes hand-in-hand with the relational model because the result of an SQL query is a table (also called, in this context, a result set). Thus, a new permanent table can be created in a relational database simply by storing the result set of a query. Similarly, a query can use both permanent tables and the result sets from other queries as inputs (this will be explored in detail in Chapter 9).
One final note: SQL is not an acronym for anything (although many people will insist it stands for “Structured Query Language”). When referring to the language, it is equally acceptable to say the letters individually (i.e.,
S. Q. L.) or to use the word “sequel.”
1.2.1. SQL Statement Classes
The SQL language is broken into several distinct parts: the parts that will be explored in this book include SQL schema statements, which are used to define the data structures stored in the database; SQL data statements, which are used to manipulate the data structures previously defined using SQL schema statements; and SQL transaction statements, which are used to begin, end, and rollback transactions (covered in Chapter 12). For example, to create a new table in your database, you would use the SQL schema statement create table, whereas the process of populating your new table with data would require the SQL data statement insert.
To give you a taste of what these statements look like, here’s an SQL schema statement that creates a table called corporation:
CREATE TABLE corpora
This statement creates a table with two columns, corp_id and name, with the corp_id column identified as the primary key for the table. The finer details of this statement, such as the different data types available with MySQL, will be probed in the following chapter. Next, here’s a SQL data statement that inserts a row into the corporationtable for Acme Paper Corporation:
INSERT INTO corporat VALUES (27, ‘Acme Pa
This statement adds a row to the corporationtable with a value of 27 for the corp_id column and a value of Acme Paper Corporation for the name column.
Finally, here’s a simple select statement to retrieve the data that was just created:
mysql< SELECT name -> FROM corporat -> WHERE corp_id
+——————-| name +——————-| Acme Paper Corpora+——————-
All database elements created via SQL schema statements are stored in a special set of tables called the data dictionary. This “data about the database” is known collectively as metadata. Just like tables that you create yourself, data dictionary tables can be queried via a select statement, thereby allowing you to discover the current data structures deployed in the database at runtime. For example, if you are asked to write a report showing the new accounts created last month, you could either hard-code the names of the columns in the account table that were known to you when you wrote the report, or you could query the data dictionary to determine the current set of columns and dynamically generate the report each time it is executed.
Most of this book will be concerned with the data portion of the SQL language, which consists of the select, update, insert, and delete commands. SQL schema statements will be demonstrated in Chapter 2, where the sample database used throughout this book will be generated. In general, SQL schema statements do not require much discussion apart from their syntax, whereas SQL data statements, while few in number, offer numerous opportunities for detailed study. Most chapters in this book will, therefore, concentrate on the SQL data statements.
1.2.2. SQL: A Nonprocedural
If you have worked with programming languages in the past, you are used to defining variables and data structures, using conditional logic (i.e., if-then-else) and looping constructs (i.e., do while … end), and breaking your code into small, reusable pieces (i.e., objects, functions, procedures). Your code is handed to a compiler, and the resulting executable does exactly (well, not always exactly) what you programmed it to do. Whether you work with Java, C#, C, Visual Basic, or some other procedural language, you are in complete control of what the program does. With SQL, however, you will need to give up some of the control you are used to, because SQL statements define the necessary inputs and outputs, but the manner in which a statement is executed is left up to a component of your database engine known as the optimizer. The optimizer’s job is to look at your SQL statements and, taking into account how your tables are configured and what indexes are available, decide the most efficient execution path (well, not always the most efficient). Most database engines will allow you to influence the optimizer’s decisions by specifying optimizer hints, such as suggesting that a particular index be used; most SQL users, however, will never get to this level of sophistication and will leave such tweaking to their database administrator or performance expert.
With SQL, therefore, you will not be able to write complete applications. Unless you are writing a simple script to manipulate certain data, you will need to integrate SQL with your favorite programming language. Some database vendors have done this for you, such as Oracle with their PL/SQL language or Microsoft with their TransactSQL language. With these languages, the SQL data statements are part of the language’s grammar, allowing you to seamlessly integrate database queries with procedural commands. If you are using a non-database-specific language such as Java, however, you will need to use a toolkit to execute SQL statements from your code. Some of these toolkits are provided by your database vendor, whereas others are created by third-party vendors or by open-source providers. Table 1-2 shows some of the available options for integrating SQL into a specific language.
Table 1-2. SQL integration toolkits
JDBC (Java Database
RogueWave SourcePro DB (third-party tool to connect to Oracle, SQL Server, MySQL, Informix, DB2, Sybase, and PostgreSQL databases)
Pro*C (Oracle) MySQL C API (open source) DB2 Call Level Interface (IBM)
If you only need to execute SQL commands interactively, every database vendor provides at least a simple tool for submitting SQL commands to the database engine and inspecting the results. Most vendors provide a graphical tool as well that includes one window showing your SQL commands and another window showing the results from your SQL commands. Since the examples in this book are executed against a MySQL database, I will be using the mysql command-line tool to run the examples and format the results.
1.2.3. SQL Examples
Earlier in this chapter, I promised to show you an SQL statement that would return all of the transactions against George Blake’s checking account. Without further ado, here it is:
SELECT t.txn_id, t.t
FROM customer c INNE INNER JOIN product INNER JOIN transac
WHERE c.fname = ‘Geo AND p.name = ‘chec
Without going into too much detail at this point, this query identifies the row in the account table for George Blake and the row in the product table for the “checking” product, finds the row in the account table for this customer/product combination, and returns four columns from the transaction table for all transactions posted to this account. I will cover all of the concepts in this query (plus a lot more) in the following chapters, but I wanted to at least show what the query would look like.
The previous query contains three different clauses: select, from, and where. Almost every query that you encounter will include at least these three clauses, although there are several more that can be used for more specialized purposes. The role of each of these three clauses is demonstrated by the following:
SELECT /* one or mor
FROM /* one or more
WHERE /* one or more
When constructing your query, your first task is generally to determine which table or tables will be needed and then add them to your from clause. Next, you will need to filter out the data from these tables that doesn’t help answer your query and add these conditions to your where clause. Finally, you will decide which columns from the different tables need to be retrieved and add them to your select clause. Here’s a simple example that shows how you would find all customers with the last name “Smith”:
SELECT cust_id, fnam
FROM customer WHERE lname = ‘Smith
This query searches the customer table for all rows whose lname column matches the string “Smith” and returns the cust_id and fname columns from those rows.
Along with querying your database, you will most likely be involved with populating and modifying the data in your database.
Here’s a simple example of how you would insert a new row into the product table:
INSERT INTO product VALUES (‘CD’, ‘Certi
Whoops, look like you misspelled “Deposit.” No problem. You can clean that up with an update statement:
UPDATE product SET name = ‘Certific
WHERE product_cd = ‘
Notice that the updatestatement also contains a where clause, just like the select statement. This is because an updatestatement must isolate the rows to be modified; in this case, you are specifying that only those rows whose product_cd column matches the string “CD” should be modified. Since the product_cd column is the primary key for the producttable, you should expect your update statement to modify exactly one row (or zero, if the value doesn’t exist in the table). Whenever you execute an SQL data statement, you will receive feedback from the database engine as to how many rows were affected by your statement. If you are using an interactive tool such as the mysql command-line tool mentioned earlier, then you will receive feedback
concerning how many rows were either:
Returned by your select statement
Created by your insert statement
Modified by your update statement
Removed by your delete statement
If you are using a procedural language with one of the toolkits mentioned earlier, the toolkit will include a call to ask for this information after your SQL data statement has executed. In general, it’s a good idea to check this info to make sure your statement didn’t do something unexpected (like when you forget to put a where clause on your delete statement and delete every row in the table!).
1.3. What Is MySQL?
Relational databases have been available 300-101 route pdf commercially for over two decades. Some of the most mature and popular products include:
Oracle Database from Oracle Corporation
SQL Server from Microsoft