summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorKjetil Orbekk <kjetil.orbekk@gmail.com>2017-10-28 08:50:27 -0400
committerKjetil Orbekk <kjetil.orbekk@gmail.com>2017-10-28 08:50:27 -0400
commit5ac895c31a1070d8b6ebef4e094d45bfe0cceaf7 (patch)
tree4e3497f5d0e85d788f8aa18e1e0eb1a9a1e8c60c
parentdabf4a2600206307e69d2ff8525b11dea600ad90 (diff)
DB normal form example
-rw-r--r--db.Rmd80
1 files changed, 80 insertions, 0 deletions
diff --git a/db.Rmd b/db.Rmd
new file mode 100644
index 0000000..5df6a06
--- /dev/null
+++ b/db.Rmd
@@ -0,0 +1,80 @@
+# First normal form
+
+A relation is in first normal form if and only if the domain of each attribute
+contains only atomic (indivisible) values, and the value of each attribute
+contains only a single value from that domain.
+
+```{r result='asis'}
+library(data.table)
+library(knitr)
+
+tasks <- data.table(
+ task_id = 1:3,
+ description = c('Buy milk', 'Clean bedroom', 'Make coffee'),
+ owner = c('Alice,Bob', 'Bob', 'Alice')
+)
+setkey(tasks, task_id)
+
+kable(tasks)
+```
+
+The `tasks` table is not in 1NF because there can be multiple owners
+of a task.
+
+```{r result='asis'}
+owners <- tasks[, .(owner=unlist(strsplit(owner, ','))), by=.(task_id)]
+setkey(owners, task_id, owner)
+
+kable(owners)
+```
+
+# Second normal form
+
+A relation is in 2NF if it is in 1NF and no non-prime attribute is dependent on
+any proper subset of any candidate key of the relation.
+
+```{r result='asis'}
+task_dependencies <- data.table(
+ task_id = 1:2,
+ depends_on = 3,
+ dependent_target_date = as.POSIXct(c('2017-10-29', '2017-11-30'))
+)
+setkey(task_dependencies, task_id, depends_on)
+kable(task_dependencies)
+```
+
+This is not 2NF because `dependent_target_date` is dependent on a subset of the
+key (`depends_on`). Fix by moving target_date to the tasks table.
+
+```{r result='asis'}
+target_dates <- as.POSIXct(c('2017-10-29', '2017-11-30', '2017-10-28'))
+kable(tasks[1:3, target_date := target_dates][])
+kable(task_dependencies[, dependent_target_date := NULL][])
+```
+
+We can still get the target dates by joining:
+
+```{r result='asis'}
+kable(tasks[task_dependencies, on=.(task_id = depends_on)])
+```
+
+# Third normal form
+
+(1) the entity is in second normal form, and (2) all the attributes in a table
+are determined only by the candidate keys of that relation and not by any
+non-prime attributes
+
+```{r result='asis'}
+tasks[1:3, `:=`(category = c('shopping', 'house work', 'cooking'),
+ category_location = c('in a store', 'home', 'home'))]
+kable(tasks)
+```
+
+Not 3NF because category_location is determined by category, which is
+not the primary key.
+
+```{r result='asis'}
+categories <- tasks[, .(category, location = category_location)]
+kable(categories)
+kable(tasks[, `:=`(category = NULL)][])
+``` \ No newline at end of file