
I'm currently setting up a database to house records for an application I'm writing, and I'm trying to come up with the most efficient DB structure to use. Of course, I know there are as many answers to this as there are people with opinions, but there has to be some kind of logic as to what to do in different situations.

So, as a simple example of my situation: suppose I was creating the DB in, say, SQL Server, to store data about people. Currently there is a set of, say, 100 characteristics I am storing per person (some static, such as name, sex, etc.; some RELATIVELY static, such as age, address, height; and some variable, such as weight, last_worn_piece_of_clothing... whatever), and I also want the architecture to be flexible enough to accommodate any future characteristics that may come about (this won't happen "too often", but it will happen).

So, my logic is that I could go with one of the following storage ideas:

(1)

People_Chars_Table:

Person_Id:    RunDate:      Name:           ... <all the other chars> ...    Char100:
1             2014-01-01    John Johnson    ...                              Val100

In other words, store all the characteristics in a very wide table and every time I have a new characteristic to add, I simply add on a new column.
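To make the trade-off concrete, here is a minimal sketch of option (1) using SQLite (via Python's `sqlite3`) in place of SQL Server; the table and column names follow the question, and the `Age`/`Weight` columns are just illustrative assumptions. Note that adding a new characteristic means a schema change (`ALTER TABLE`):

```python
import sqlite3

# Sketch of option (1): one wide row per person, one real column per
# characteristic. SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE People_Chars_Table (
        Person_Id INTEGER,
        RunDate   TEXT,
        Name      TEXT,
        Age       INTEGER
        -- ... one column per characteristic, up to Char100 ...
    )
""")
conn.execute(
    "INSERT INTO People_Chars_Table VALUES (1, '2014-01-01', 'John Johnson', 23)"
)

# Adding a new characteristic later is a schema change; existing rows
# get NULL for the new column:
conn.execute("ALTER TABLE People_Chars_Table ADD COLUMN Weight REAL")

row = conn.execute(
    "SELECT Name, Age, Weight FROM People_Chars_Table WHERE Person_Id = 1"
).fetchone()
print(row)  # ('John Johnson', 23, None)
```

Querying stays a plain `SELECT`, and each column can carry its natural type and constraints; the cost is that every new characteristic requires DDL.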


(2)

Person_Table:

Person_Id:

Characteristic_Table:

Char_Id:    Char_Name:    Char_ValueType:
1           Name          String
2           Age           Integer
....

People_Table:

Person_Id:    RunDate:     Char_Id:     Char_Value:
1             2014-01-01   1            John Johnson
1             2014-01-01   2            23
....

What (2) seems to allow for is characteristics to be added very quickly and easily, but it seems like querying out the data will always involve some kind of pivot operation that might slow things down. (Two things that might also be important to know, by the way: this data will be queried fairly often, and the table would have about 100,000 new people records added monthly, each with about 100-150 characteristics.)
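That pivot operation looks roughly like the following sketch, again using SQLite via Python as a stand-in for SQL Server (which would use `PIVOT` or the same `MAX(CASE ...)` conditional-aggregation pattern). The sample data follows the question; note that `Char_Value` has to be a single lowest-common-denominator type, so the age comes back as the string `'23'`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Characteristic_Table (
        Char_Id INTEGER PRIMARY KEY, Char_Name TEXT, Char_ValueType TEXT);
    CREATE TABLE People_Table (
        Person_Id INTEGER, RunDate TEXT, Char_Id INTEGER, Char_Value TEXT);
    INSERT INTO Characteristic_Table VALUES
        (1, 'Name', 'String'), (2, 'Age', 'Integer');
    INSERT INTO People_Table VALUES
        (1, '2014-01-01', 1, 'John Johnson'),
        (1, '2014-01-01', 2, '23');
""")

# Pivot the long rows back into one row per person with
# conditional aggregation -- one MAX(CASE ...) per characteristic:
row = conn.execute("""
    SELECT Person_Id,
           MAX(CASE WHEN Char_Id = 1 THEN Char_Value END) AS Name,
           MAX(CASE WHEN Char_Id = 2 THEN Char_Value END) AS Age
    FROM People_Table
    GROUP BY Person_Id, RunDate
""").fetchone()
print(row)  # (1, 'John Johnson', '23')
```

With 100-150 characteristics the pivot needs that many `MAX(CASE ...)` branches, and the database can no longer type-check or constrain individual characteristics.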

And then, of course, there is the hybrid option of the person table holding some of the pre-set characteristics as real columns and storing the rest using option (2).

Overall, I'm just wondering whether there is any logic as to what the best practice is or, aside from these 2 options I'm seeing, whether there is another way of handling this entirely.

Any advice would be greatly appreciated!!! Thanks!!!

John Bustos
    Write some queries and some DML and see which one you prefer. You'll quickly discover that they're much easier to write with ordinary "wide" tables. They perform better, too. And you can't add database constraints when you use your "long" approach. The "long" approach has a name: Entity-Attribute-Value (EAV) and it's an anti-pattern. – Colin 't Hart Jun 16 '14 at 18:21
  • 2
    I agree w/Colin Hart, here's a link that stopped me from doing the EAV. http://stackoverflow.com/questions/2403387/table-design-for-systemsettings-best-model – Chris L Jun 16 '14 at 18:56
  • I completely agree with @Colin'tHart too... Thank you so much for the link - Just couldn't find it myself!!! – John Bustos Jun 16 '14 at 20:53
  • 1
    Joe Celko, a man who knows a thing or two about databases and SQL, calls what you're doing the "Massively Unified Code Key" approach. The acronym is "MUCK" and there's a good reason for that! – Vérace Jun 16 '14 at 22:49
