1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
|
# First normal form
A relation is in first normal form if and only if the domain of each attribute
contains only atomic (indivisible) values, and the value of each attribute
contains only a single value from that domain.
```{r result='asis'}
library(data.table)
library(knitr)
tasks <- data.table(
task_id = 1:3,
description = c('Buy milk', 'Clean bedroom', 'Make coffee'),
owner = c('Alice,Bob', 'Bob', 'Alice')
)
setkey(tasks, task_id)
kable(tasks)
```
The `tasks` table is not in 1NF because there can be multiple owners
of a task.
```{r result='asis'}
owners <- tasks[, .(owner=unlist(strsplit(owner, ','))), by=.(task_id)]
setkey(owners, task_id, owner)
kable(owners)
kable(tasks[, owner := NULL])
```
# Second normal form
A relation is in 2NF if it is in 1NF and no non-prime attribute is dependent on
any proper subset of any candidate key of the relation.
```{r result='asis'}
task_dependencies <- data.table(
task_id = 1:2,
depends_on = 3,
dependent_target_date = as.POSIXct(c('2017-10-29', '2017-11-30'))
)
setkey(task_dependencies, task_id, depends_on)
kable(task_dependencies)
```
This is not 2NF because `dependent_target_date` is dependent on a subset of the
key (`depends_on`). Fix by moving target_date to the tasks table.
```{r result='asis'}
target_dates <- as.POSIXct(c('2017-10-29', '2017-11-30', '2017-10-28'))
kable(tasks[1:3, target_date := target_dates][])
kable(task_dependencies[, dependent_target_date := NULL][])
```
We can still get the target dates by joining:
```{r result='asis'}
kable(tasks[task_dependencies, on=.(task_id = depends_on),
.(task_id, depends_on, target_date)])
```
# Third normal form
(1) the entity is in second normal form, and (2) all the attributes in a table
are determined only by the candidate keys of that relation and not by any
non-prime attributes
```{r result='asis'}
tasks[1:3, `:=`(category = c('shopping', 'house work', 'cooking'),
category_location = c('in a store', 'home', 'home'))]
kable(tasks)
```
Not 3NF because category_location is determined by category, which is
not the primary key.
```{r result='asis'}
categories <- tasks[, .(category, location = category_location)]
kable(categories)
kable(tasks[, `:=`(category_location = NULL)][])
```
|