An Overview of Database Concepts
DB for Dummies Linguists
John Remmers
Computer Science Dept., EMU (emeritus)
Relational database concepts
- Entity/relationship data modeling
- Tables
- Keys
- SQL (Structured Query Language)
LINGUIST List data
Let's look at a typical LINGUIST message:
LINGUIST List 16.2104, Msg #1.
Attributes of a message
- Issue data: volume #, issue #, date, topic, subject, message #
- Editor data: name, email, webpage, affiliation
- Author data: name, email
- Message data: subject, date, body
- Linguistic field data
Typical queries
- List all Conf messages posted by EMU editors in February 2004.
- Display a table of the number of messages on each topic
in LINGUIST volume 10.
- List all Summaries messages on psycholinguistics.
- List authors of all messages whose bodies contain the term
context-free grammar.
Tables
- Data in a relational database is stored in tables.
- Tables have a row-column structure.
- Rows are called records,
columns are called fields.
- Each record contains data for an entity (e.g. a LINGUIST message)
- Each field corresponds to an attribute (e.g. a message subject)
A BAD design for LINGUIST
Have one table. Each record represents one message.
The fields correspond to the attributes identified previously.
Why is this a bad idea?
- Duplication of data.
- Wastes space.
- Difficult to modify.
Need to decompose into separate tables -
message table, editor table, etc.
Decomposition
- Need to decompose our monolithic table into multiple
tables. (e.g. message table, editor table, etc.)
- How do this without losing information?
- KEYS enable linking of information in different tables.
Keys
- A primary key is a field that uniquely identifies
a record.
- A foreign key is a field that references the primary
key field of another table.
- Examples of primary keys:
- Issues: Concatenate volume and issue number, e.g. for issue 10.2014
a primary key would be 01002014
- Editors: Use their linguistlist.org login id, e.g. susan, maria,
kevin, etc.
- See this example.
LINGUIST List Entities
- MESSAGE
- ISSUE
- EDITOR
- AUTHOR
- LINGFIELD
- TOPIC
Relationships
- ISSUE contains MESSAGES (one-to-many)
- EDITOR edits ISSUES (one-to-many)
- AUTHOR submits MESSAGES (one-to-many)
- TOPIC labels ISSUE (one-to-many)
- LINGFIELDS pertain to MESSAGES (many-to-many)
From E/R to Tables
Construction rules:
- Create a table for each entity.
- one-to-many: Use a foreign key in the "many" entity
that refers to the "one" entity
- many-to-many: Create new table with foreign keys for
both entities
Database Schema for LINGUIST Message Data
- TOPIC: (topic_id, topic_descrip)
- (TOC, "Journal tables of contents")
- EDITOR: (ed_id, ed_name)
- AUTHOR: (auth_email, auth_name)
- (aristar@linguistlist.org,
"Anthony Aristar")
- LINGFIELD: (lf_id, lf_descrip)
- (ETlinguistics,
"The study of extraterrestrial languages")
- ISSUE: (issue_id, issue_date,
issue_subject,
ed_id,
topic_id)
- (01602005, 07092005,
"Applied Ling: Asst/Assoc Prof, Portland State U",
vanessa,
Jobs)
- MESSAGE: (msg_id, msg_date, msg_subject,
msg_body,
issue_id,
auth_id)
- (01602005001, 06092005,
"Applied Linguistics: Asst/Assoc Prof, Portland State University,
OR, USA", [message body],
01602005,
tittelk@pdx.edu)
- MSG_PERT_LF: (msg_id,
lf_id)
- (01602584001,
Sociolinguistics)
Notes and Acknowledgements
This slide show can be viewed on the web at
jremmers.org.
It was constructed using Eric Meyer's web-based
S5 Slide
Show system S5 is free, open source, and in the public domain.