Week 6
Updated on 28 Dec 2018
Table Design II
This unit will concentrate mainly on appropriate field types, primary keys and indexes.
Student_id | Student_Name | Unit | Final_Grade | Age | Grade_Description |
---|---|---|---|---|---|
S0001234 | James Jones | COMP306 | HD | 12 | High Distinction |
S0001235 | Sarah Smith | COMP306 | PA | 10 | Pass |
S0002345 | Bobby Brown | COMP306 | PA | 10 | Pass |
S0002234 | Tracy Taylor | COMP306 | CR | 11 | Credit |
S0001244 | Luke Lang | COMP306 | WW | 12 | Fail |
The data shown above might have been extracted from a MySQL database. If that were the case, then the table structure might look something like this:
Field | Type | Attributes | NULL |
---|---|---|---|
Student_ID | varchar(8) | No | |
Student_Name | varchar(30) | No | |
Unit | varchar(7) | No | |
Final_Grade | varchar(2) | ||
Age | tinyint(2) | unsigned | |
Grade_Description | varchar(30) |
Defining the column structures is only part of designing a proper database table. We still need to look at defining a primary key and indexes if appropriate.
KEYS
Keys are pieces of data that help to identify a row of information in a table (a row is also called a record). There are two types of keys, primary and foreign.
Questions:
- What is a primary key? What is a foreign key?
Primary Keys & Indexes
Primary keys are very important for database tables because they are used to help identify a row of information in a table. A primary key is a unique identifier that follows these rules:
- Always have a value
- Have a value that remains the same (never changes)
- Have a unique value for each record in the table.
An example of a primary key used by the government would be your Tax File Number, ABN etc. Invoice numbers from the University, Energy and communication companies could also be used as primary keys.
On the database table, the primary key is automatically an index. An index on a database table is very similar to an index of a book. In a book index, special keywords are indexed to specific page numbers. This allows you to find the topic you’re looking for very quickly. Without an index you’d have to search every page of the book until you found the topic you were searching for…
Question:
- Do you understand how a book index works?
Indexes
A database table is very similar to a book index. Without an index, the database would need to search every record until it found the records that you wanted. On a table that held very few records this won’t pose a problem, however as your table grows (records are added) performance issues will arise.
The index of a book contains specific keywords. Common language use words are not in the index. Words like THE, THAT, THERE, IS, A etc are not found in the index because they are used so prolifically. In fact not only are you likely to find these words on each page, chances are those words would be used in the construction of each sentence on every page of the book.
This is also applicable to database tables. Remember that the primary key must be unique, and it is also good practice if the fields where you’ve defined indexes are near to unique also. (Unlike the primary key, you can define an index on a column that won’t be unique).
A field that stored a person’s gender would be inappropriate to be indexed because like the book example, 1 in every 2 records are likely to match your search. It would be like indexing THE in a book!
If we know that a field is appropriate to index, how do we know if it needs to be indexed? The best approach is to look at the likely SQL statements that are going to be executed. Fields that appear in the ORDER BY or WHERE clauses are likely candidates for indexing.
In these examples, likely candidates for indexing would be the columns Student_Name
and Age
, although I’m not sure Age
would be appropriate to index anyway.
SELECT Student_id, Student_Name
FROM Student_table
ORDER BY Student_Name
SELECT Student_Name, Final_Grade
FROM Student_table
WHERE Age = 10
Normalization
Normalization is a set of rules that define the organization of data within the database. In database design text books you’ll come across terms such as 1st Normal Form, 2nd Normal Form and 3rd Normal Form. Normal Forms are the process that is followed for normalizing a database.
Questions:
- Have you heard of the term Normalization?
- What does a Normalized database look like?
1st Normal Form
Student_id | Student_Name | Unit | Final_Grade | Age | Grade_Description |
---|---|---|---|---|---|
S0001234 | James Jones | COMP306 | HD | 12 | High Distinction |
S0001235 | Sarah Smith | COMP306 | PA | 10 | Pass |
S0002345 | Bobby Brown | COMP306 | PA | 10 | Pass |
S0002234 | Tracy Taylor | COMP306 | CR | 11 | Credit |
S0001244 | Luke Lang | COMP306 | WW | 12 | Fail |
The data shown above might have been extracted from a MySQL database. If that were the case, then the table structure might look something like this:
Field | Type | Attributes | NULL |
---|---|---|---|
Student_ID | varchar(8) | No | |
Student_Name | varchar(30) | No | |
Unit | varchar(7) | No | |
Final_Grade | varchar(2) | ||
Age | tinyint(2) | unsigned | |
Grade_Description | varchar(30) |
For a database to be in First Normal Form, each column must contain only ONE value. This database is not in 1NF because we are storing two values in the Student_Name column. First and Last Name!
The Grade Description stores only one value, even though some descriptions may contain more than one word. Therefore for this database to be 1NF compliant, the table structure would need to look like this:
Field | Type | Attributes | NULL |
---|---|---|---|
Student_ID | varchar(8) | No | |
Last_Name | varchar(30) | No | |
First_Name | varchar(30) | No | |
Unit | varchar(7) | No | |
Final_Grade | varchar(2) | ||
Age | tinyint(2) | unsigned | |
Grade_Description | varchar(30) |
2nd Normal Form
The database must first be in 1NF before you can proceed to the 2NF. In the 2NF, every column in a table that is not a key MUST relate to the primary key. This is a rule that encompasses “The concept of having a table is to allow data to be broken down into logical, smaller, more manageable units.”.
“…address details on the addresses table … registration information (the units you’re enrolled in) on the registration table.” (lecture notes last week).
The above quotes is what 2NF is all about. So if we have a look at our table with the Student_Id
column as the primary key we should be able to see that the database is not 2NF compliant.
Questions:
- What fields are DIRECTLY related to the Student_ID?
- What fields could be ‘pushed’ off onto their own table?
- Looking at repeated sections of data is another clue for creating another table. Can you see where this might be applicable for our 1NF table?
Applying the rules for the 2nd normal form, we could have a database design like this (NB none of the tables are linked yet):
STUDENT_TABLE
Field | Type | Attributes | NULL |
---|---|---|---|
Student_ID | varchar(8) | Primary Key | No |
Last_Name | varchar(30) | No | |
First_Name | varchar(30) | No | |
Age | tinyint(2) | unsigned |
UNIT_TABLE
Field | Type | Attributes | NULL |
---|---|---|---|
Unit_ID | smallint(3) | Primary Key | No |
Unit_Code | varchar(7) | No | |
Unit_Description | varchar(45) |
GRADE_TABLE
Field | Type | Attributes | NULL |
---|---|---|---|
Grade_ID | tinyint(2) | Primary Key | No |
Grade_Code | varchar(2) | No | |
Grade_Description | varchar(45) |
Questions:
- Do the table designs for the 2NF make sense?
- How would you link the tables? (E.g. Do we add a Unit_ID to the Student_Table or add a Student_ID to the Unit_Table?)
Trying to link the tables together as they are doesn’t make much sense. Whether we add a Student_ID
column to the Unit_Table or add a Unit_ID
to the Student_Table; neither option ‘feels’ right. It doesn’t feel right because the relationship doesn’t exist, and we are violating the 2NF principles.
In the case where something doesn’t feel right then we should look at creating a new table that links the 2 tables together.
Registration_TABLE
Field | Type | Attributes | NULL |
---|---|---|---|
Registration_ID | integer | auto_increment (Primary Key) | No |
Student_ID | varchar(8) | Foreign Key | No |
Unit_ID | smallint(3) | Foreign Key | No |
Grade_ID | tinyint(2) | Foreign Key | No |
Primary Keys, as with indexes can be defined for single or multiple columns.
Questions:
- What business rule can be deduced from the definition of the Primary Key on the Registration Table?
- What else might need to be considered for this table?
- What business rule could be deduced if we had no Registration_ID field, and the Student_ID and Unit_ID fields formed the primary key?
SQL Functions
The SQL language also defines a set of functions that can be used when querying the data. The types of functions available to MySQL include:
- Text Functions
- Concatenation and Aliases
- Numeric Functions
- Date and Time Functions (formatting)
- Encryption
- Grouping
The use of functions is generally used with SELECT statements as shown in the examples below.
SELECT Student_ID, Last_Name, MIN(age)
FROM Student_Table
GROUP BY Student_ID
SELECT COUNT(Student_ID)
FROM Student_Table
SELECT age, COUNT(age)
FROM Student_Table
GROUP BY age
Questions:
- Do any of the queries not make sense?
- What sort of data do the above queries return?
Table Joins
When we extract data from a database we will most likely be extracting fields from more than one table. In this situation we will have to JOIN the tables. Tables are usually linked with a Foreign Key on one table with a Primary Key on another.
If we wanted to SELECT the Last_Name
, First_name
(from the Student table) and Unit_Code
(from the Units table) we would need to write an SQL statement that looks like this:
SELECT CONCAT_WS(' ', ST.first_name, ST.last_name) as full_name,
UT.unit_code
FROM Student_Table ST,
Unit_Table UT,
Registration_Table RT
WHERE ST.Student_ID = RT.Student_ID
AND RT.Unit_ID = UT.Unit_ID
In this query we are matching the Student_ID
from the Student_Table to the Student_ID
on the Registration_Table AND matching the Unit_ID
from the Registration_Table to the Unit_ID
on the Unit_Table.
Questions:
- What would happen if we didn’t include AND RT.UNIT_ID = …
- What is a column Alias? What is a table alias, and why would one be necessary?
DML Statements
Not only will we be retrieving information from a database, but we will also be INSERT(ing), UPDATE(ing) or DELETE(ing) records from a database. In general, the same sort of queries that were written for single table SELECT statements can be modified for INSERT, UPDATE or DELETE.
Here are some DML examples.
UPDATE student_table
SET age = 12
WHERE student_first_name = 'Brent';
DELETE FROM registration_table
WHERE unit_id = 2;
INSERT INTO grade_table
VALUES (0, 'VHD', 'Very High Distinction');
Questions:
- What do each of the DML statements do?
- How would we write an INSERT statement that didn’t need to insert data into every field?
- Why is the WHERE clause so important in the UPDATE and DELETE statements?
- What preventative measure could you perform before executing an UPDATE or DELETE statement?
PHP and MySQL
When we combine variables from POST or GET with our queries, then we can start to build some really useful web pages. Have a look at this modified version of the previous SQL statements.
if(isset($_POST['submit']))
{
if(!empty($_POST['last_name']))
$last_name = $_POST['last_name'];
else
$last_name = FALSE;
if(!empty($_POST['age']) && is_numeric($_POST['age']))
$age = $_POST['age'];
else
$age = FALSE;
if($last_name && $age)
{
$query = "UPDATE student_table
SET age = $age
WHERE student_last_name = '$last_name'";
$result = mysqli_query($dbc, $query);
if($result)
echo '<p>Information Updated Successfully</p>';
else
echo '<p>ERROR: Could not update</p>';
}
else //-- if($last_name && $age)
echo '<p>The last name and age field must be ...</p>';
} //-- if(isset($_POST['submit']))
Questions:
- What assumptions have been made with regards to the POST variables?
- Why is
$last_name
in quotes, but not$age
in the query?