30

I used to label columns in my databases like this:

user_id
user_name
user_password_hash

To avoid conflicts when joining two tables, but then I learnt some more on how to alias tables, and I stopped doing this.

What is an effective way of labeling columns in a database? Why?

Brian Ballsun-Stanton
  • 4,683
  • 2
  • 27
  • 36
Thomas O
  • 501
  • 5
  • 8
  • Which database? How I label in Oracle is different from most other databases due to it's feature of automatically selecting columns to base joins on if the names match. – Joe Jan 03 '11 at 21:12
  • @Joe, Well, I've always used MySQL and SQLite3, but it should apply to most other databases. – Thomas O Jan 03 '11 at 21:28
  • @joe never noticed that Oracle is different. Can you give a link? – bernd_k Jan 04 '11 at 06:41
  • @bernd_k : I've added some links to [my answer](http://dba.stackexchange.com/questions/3/how-do-you-label-columns/45#45), below – Joe Jan 04 '11 at 14:44

10 Answers10

33

In your case, the prefix user is redundant. We (the devs in charge) know that this is the table user, so why add user_ prefix in front of every field?

What I would suggest to you is to do it with a more natural approach.

What are the characteristics of a Person: Last Name, First Name, Birthdate, Nationality, etc...

What are the characteristics of a Car: Model, Year, Color, Energy, etc...

Your column should be named as natural as possible, it would make the schema more clear for everybody, for you and the ones coming after you. This is also called the Maintenance phase, and anything you can do to make the maintenance easier is usually worth the effort.

jcolebrand
  • 6,168
  • 4
  • 40
  • 65
Spredzy
  • 2,248
  • 1
  • 20
  • 25
  • 1
    Yes, it infuriates me when people do that. Also when they call all their table tbl_whatever. – Gaius Jan 10 '11 at 13:31
  • This is also relevant to the concept of "Class Words", and there seems to be some debate in the community when Class Words are and aren't appropriate. (a class word is a tool to: Identify a distinct category or classification of data, Delineate the type of data being described by the data name, and Describe the major classification of data associated with a data element.) – Jon Schoning Apr 29 '11 at 17:50
17

In addition to Spredzy's comment, label your primary keys the same ( ID ) so that when you're writing queries on the fly, you can easily recall (u.ID=c.ID) instead of having to look up "Was it countryID, country_ID, countries_ID, countriesID, ?"

David Hall
  • 2,187
  • 16
  • 11
9

I couldn't agree more with David Hall's addendum to Spredzy's excellent answer. Simple and natural is the way to go. Table confusion should not be an issue if you name tables naturally too.

No sense having users.user_id and cars.car_id when you could have users.id and cars.id

bsoist
  • 391
  • 1
  • 5
7

I would argue that in a database schema, every column should have a unique name, across tables. There are several reasons for that:

  • From a modeling point of view: You start with a soup of attributes and you normalize it down into tables. Over time, you might denormalize or normalize further or introduce views or materialized views, or introduce new tables. This is never a problem if all column names are unique.

  • You can use this join syntax: a JOIN b USING (a_id) JOIN c USING (a_id). Very convenient and also helps with the following point.

  • If you run queries with lots of joins or create materialized views with SELECT *, you will never (well, maybe rarely) a have a conflict. Think about joining person.name, product.name, country.name, etc. Urgh.

  • In general, if you have big queries, it's hard to keep track of what id means everywhere.

Peter Eisentraut
  • 9,473
  • 1
  • 29
  • 33
7

Let's see, with your example it will look something like this:

USERS
----
id
username,
password
registration_date

I use the table name in uppercase. This lets me identify the table easily. The columns I just named is each for what it represents. I try not to use numbers or include any prefix or suffix with it. This will makes the queries dead simple and pretty straightforward.

BTW, I think you should find some style you like and stick with it. If you change it often, then you will have a messier DB schema.

RolandoMySQLDBA
  • 171,728
  • 30
  • 293
  • 486
eiefai
  • 1,834
  • 2
  • 21
  • 18
  • +1 for "find some style you like and stick with it." Consistency is better than exactly complying with any particular standard (though if you have not yet chosen a standard, some are better than others). – Jon of All Trades Oct 04 '11 at 21:10
5

I'm working in an environment where each column name starts with a prefix derived from the table name, it is not my invention, but I'm quite happy with it.

Ideally column names are unique over all tables in the database.

Some observations:

  • we only need table aliases, when tables are joined multiple times in a select statement
  • it prevent some faults when copying code snippets, because column names must be adapted to the table name
  • it helps to show to which table a foreign key column points

General ideas: Most important is the consistency of each naming conventions: - singular vs. plural (ok that applies to tables and not columns) - identify primary and foreign keys (they build the structure vs the content of the database) - be consistent when you store strings and short variant of the same string - be consistent with flags, status etc.

bernd_k
  • 11,741
  • 23
  • 73
  • 108
5

Like the others, I recommend that you do not include the table name as part of the column. Unless you've got hundreds of tables all with mostly similar column names: if you have multiple dozens of tables all with a column titled ID, then by all means prefix them with the table name.

I recently left a company where one of the developers preferred to prefix primary key and foreign key columns with pk and fk. This lead to some abominations where columns started with pkfk (usually a composite primary key based on 2 columns, of which one column was a foreign key to another table).

Tangurena
  • 1,666
  • 1
  • 14
  • 17
3

I agree with Spredzy's answer but would add that as a matter of preference I would use camelCase instead of under_score.

firstName, lastName etc.

Toby
  • 1,128
  • 2
  • 12
  • 11
  • 2
    -1 because CamelCase doesn't work in all database systems and you didn't specify a database system. For instance, its bad news to use CamelCase in Oracle (it would require using double-quotes to create it but from then on, everyone accessing it would have to jump through hoops to access/use it). What a nightmare. – ScottCher Jan 04 '11 at 20:50
  • @ScottCher - I didn't know that it doesn't work in Oracle, but then I am not an Oracle DBA. I would have thought it would be taken as a given that the column names need first comply to the rules set out by the DBS in question. – Toby Apr 20 '11 at 08:32
3

In the case of Oracle, you'll want to not name columns 'id' or 'name' or anything generic.

The issue is that by default in older versions, Oracle will attempt to join tables based on similar column names, so if I've named everything well, then I've also ended up specifying the default join clause between my tables.

But even if you're not using Oracle, by not chosing names that appear in multiple tables, it also means that you don't then have to go through the trouble of aliasing every time you have to do a select across two tables:

SELECT
  instrument.name as instrument_name,
  instrument.abbr as instrument_abbr,
  source.name     as source_name,
  source.abbr     as source_abbr,
  ...
FROM ...

So, if multi-table selects are the norm, longer column names save you typing. (if you're only using one table at a time ... do you really even need a relational database?)

... and saving typing brings us to another issue in Oracle -- at least in 8i (the current version when I took the Oracle SQL Tuning and Data Modeling courses) caching of execution plans is based on only the first so many characters of the query (can't remember the exact value ... 1024?), so if you have queries that only varies by something all the way at the end of the where clause, and a really long list of columns you're extracting, you can run into a performance hit as it can't cache the execution plan correctly.

Oracle had a guide on selecting what they claim are good table and column names, which is basically a guide for removing letters 'til it's about 5-8 characters, but I never much cared for it.

...

As things go other than that:

  • columns are always singular (tables are always plural)
  • all names are lower case, just in case there's something case-sensitive
  • as a result of the above, use underscores instead of camel case.

update : for those not familiar with Oracle's join behavior, see the last example on Mastering Oracle SQL : Join Conditions, where it mentions:

What happened? The reason lies in the fact that, aside from supplier_id, these two tables have another pair of columns with a common name. That column is name. So, when you ask for a natural join between the supplier and the part tables, the join takes place not only by equating the supplier_id column of the two tables, but the name column from the two tables is equated as well. Since, no supplier name is the same as a part name from that same supplier, no rows are returned by the query.

Under 'old join syntax' (8i and earlier), 'NATURAL JOIN' was the default join behavior, and I believe it still is if you don't specify a join condition. Once 'NATURAL JOIN' was an official option in 9i, the general recommendation was don't use it, because bad column naming can screw you up, which is my I'm advocating for good column names.

Joe
  • 4,909
  • 1
  • 26
  • 37
  • 4
    You are referring to "Natural Joins" in your second paragraph? If so *SHUDDER*... Whenever possible, you should specify how you want your database system to join your tables. Leaving it up to the database to decide can produce unexpected/inconsistent results. In addition, Natural Joins are limited to joins between two tables and thus are relatively limited in their usability. – ScottCher Jan 04 '11 at 21:30
  • @ScottCher : I *am* specifying it -- by how I name the columns. And I'm not sure if the 'old join syntax' has the same restriction on two tables that specifically calling 'NATURAL JOIN' has ... we finally got rid of our last Oracle install ~2 years ago. – Joe Jan 04 '11 at 22:04
  • 2
    NATURAL JOIN has never been the default. If no explicit join is/was given, a cartesian join would be done (ie each and every row in a table joined to each and every row in the other table). Prior to ANSI joins being supported (ie those specified in the FROM clause) joins had to be done in the WHERE clause. – Gary Jan 11 '11 at 04:42
  • 1
    -1 for natural joins. When an unrelated schema change can break joins, or worse yet, change them without causing any errors, you're in for a world of pain. Please, think of the children, and ALWAYS specify your join fields. – Jon of All Trades Oct 04 '11 at 21:16
  • @JonofAllTrades +infinity for the "worse yet" bit - if only you had a few more answers I could go round and upvote :) – Jack Douglas Oct 26 '11 at 07:15
  • +1 for colun naming that makes `NATURAL JOIN` a viable option. @JackDouglas: feel free to downvote all my stuff on StackExchange for my stance ;) – onedaywhen Oct 26 '11 at 12:52
  • 2
    @ScottCher: "Leaving it up to the database to decide" -- first, presumably you mean "DBMS" rather than "database". Second, there is no AI or anthropomorphistic mechanism in Oracle; rather, `NATURAL JOIN` is deterministic. – onedaywhen Oct 26 '11 at 12:58
  • @onedaywhen just a figure of speech of course regarding the voting ;) The main problem with this answer is the "by default, Oracle will attempt to join tables based on similar column names" which is completely untrue, as is "Under 'old join syntax' (8i and earlier), 'NATURAL JOIN' was the default join behavior" – Jack Douglas Oct 26 '11 at 13:08
  • @Jack : I've clarified that it was behaviour in older versions. But it's still a pain in the ass to have to join in 5 lookup tables that all have the columns 'name' and 'abbr', so you have to re-alias them every damned time. And your 'figure of speach' is looked upon rather poorly in this community. As a moderator, you should know that there's even a report of people disproprtionately voting. (or at least there used to be when I had moderator access) – Joe Oct 26 '11 at 13:28
  • @Joe Ok, sorry about the figure of speech. I only vote when I think the Q/A is good and of course I hope everyone else does the same – Jack Douglas Oct 26 '11 at 13:37
  • 1
    @Joe `cross join` is, was and always will be the 'default'. Oracle has *never* matched on column name unless `natural join` was explicitly used – Jack Douglas Oct 26 '11 at 13:38
  • @JackDouglas: cross join is a specialized natural join! – onedaywhen Oct 26 '11 at 13:49
  • 1
    @onedaywhen all joins are specialized cross joins ;) – Jack Douglas Oct 26 '11 at 14:06
1
  1. Never use double-quotes " because in doing so, you override the native case-folding of the database. The SQL spec demands all identifiers be folded to upper case. Some databases, like PostgreSQL fold them to lower case. If nothing is quoted, it'll work in all databases and they can fold them to the spec or the rdbms-specific default.
  2. Use an under_score (_), because as per above -- you shouldn't use camelCase.
  3. use {entity}_id for ids (and foreign keys pointing to those ids). Because then you can use the USING clause. The globally-unique key names used in join-conditions is a convention established in the spec.

    SELECT *
    FROM employee
    INNER JOIN department
      USING (department_id);
    
      -- compare to
      ON employee.department_id = department.department_id;
    
NO WAR WITH RUSSIA
  • 54,954
  • 34
  • 200
  • 411
Neil McGuigan
  • 7,653
  • 3
  • 36
  • 52