21 Apr 2017

Database design: rows vs columns

I'm developing an app that requires me to design the database. I'm wondering which design would be better in the following scenario:

Approach 1:

Have one user table with all the user fields viz.
    id | uid | username | first_name | last_name | profession
OR

Approach 2:

Table I:

    id | uid | username

Table II (user_meta):
    uid | key          | value
    1   | 'first_name' | John
    2   | 'last_name'  | Donald
    ... and so on

The first approach uses more columns to store the user data, while the second approach relies on multiple tables and stores each user's data across several rows.

With the second approach, the user_meta table will hold a large number of rows for each user, while approach #1 will be more compact.
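
To make the two layouts concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name users_wide, the column names meta_key / meta_value (renamed from key / value because KEY is a reserved word in some SQL dialects) and the sample values are assumptions made for illustration, not part of the original design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Approach 1: one wide table, one column per user field
    CREATE TABLE users_wide (
        id         INTEGER PRIMARY KEY,
        uid        INTEGER UNIQUE NOT NULL,
        username   TEXT NOT NULL,
        first_name TEXT,
        last_name  TEXT,
        profession TEXT
    );

    -- Approach 2: a narrow user table plus one key/value row per field
    CREATE TABLE users (
        id       INTEGER PRIMARY KEY,
        uid      INTEGER UNIQUE NOT NULL,
        username TEXT NOT NULL
    );
    CREATE TABLE user_meta (
        uid        INTEGER NOT NULL REFERENCES users(uid),
        meta_key   TEXT NOT NULL,   -- hypothetical name for the 'key' column
        meta_value TEXT,            -- hypothetical name for the 'value' column
        PRIMARY KEY (uid, meta_key)
    );
""")

# Made-up sample user stored both ways
conn.execute(
    "INSERT INTO users_wide (uid, username, first_name, last_name, profession) "
    "VALUES (1, 'jdonald', 'John', 'Donald', 'Engineer')"
)
conn.execute("INSERT INTO users (uid, username) VALUES (1, 'jdonald')")
conn.executemany(
    "INSERT INTO user_meta VALUES (?, ?, ?)",
    [(1, "first_name", "John"), (1, "last_name", "Donald"), (1, "profession", "Engineer")],
)

# Approach 1: the whole profile is a single row
wide_row = conn.execute(
    "SELECT first_name, last_name, profession FROM users_wide WHERE uid = 1"
).fetchone()

# Approach 2: the profile is assembled from several rows
meta = dict(
    conn.execute("SELECT meta_key, meta_value FROM user_meta WHERE uid = 1").fetchall()
)
```

With approach 1 the profile comes back as one row; with approach 2 the application (or a pivot query) has to stitch it together from one row per field.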

Questions:

1. Which approach is better in terms of performance and speed of queries?
2. Is there a rule for deciding whether to store data in rows or in columns when designing a database?
21 Apr 2017
Tagging @Prasad Ajinkya and @simplycoder. Have you ever faced such a design decision? What should the thought process be when determining the DB structure?
simplycoder

23 Apr 2017
In reply to Kaustubh Katdare's original post:
I think I would rather go with approach 1 and normalize the data until it satisfies 3NF.
I would prefer the compact table design for the following reasons:
  1. It creates entity-based tables that are easy to understand, and data can be transferred easily and intuitively from the server side to the database without drastically changing its structure.
  2. It results in simpler queries.
  3. Approach 2 would require a non-clustered index on the metadata table.
  4. With approach 2, joins would be costly whenever data is needed from both the metadata table and the main table.

Approach 2 might not perform well when it comes to joins, and it will require extra effort to display the data as a proper table (adding loops to pivot the rows, which can be done in the database but is not intuitive).
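
As a rough illustration of the pivot mentioned above, here is one way it could be done inside the database with conditional aggregation, sketched with Python's sqlite3 module. The column names meta_key / meta_value and the sample rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (uid INTEGER PRIMARY KEY, username TEXT NOT NULL);
    CREATE TABLE user_meta (
        uid        INTEGER NOT NULL,
        meta_key   TEXT NOT NULL,
        meta_value TEXT,
        PRIMARY KEY (uid, meta_key)
    );
    INSERT INTO users VALUES (1, 'john'), (2, 'jane');
    INSERT INTO user_meta VALUES
        (1, 'first_name', 'John'),
        (1, 'last_name',  'Donald'),
        (2, 'first_name', 'Jane');   -- user 2 has no last_name row
""")

# One output column per key: each CASE picks out the matching row and
# MAX() collapses the group to that single value (NULL if the key is absent).
PIVOT_SQL = """
    SELECT u.uid,
           u.username,
           MAX(CASE WHEN m.meta_key = 'first_name' THEN m.meta_value END) AS first_name,
           MAX(CASE WHEN m.meta_key = 'last_name'  THEN m.meta_value END) AS last_name
    FROM users AS u
    LEFT JOIN user_meta AS m ON m.uid = u.uid
    GROUP BY u.uid, u.username
    ORDER BY u.uid
"""
rows = conn.execute(PIVOT_SQL).fetchall()
```

Every new attribute needs another CASE expression in the query, which is the extra effort that approach 1 avoids.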

Questions:

1. Which approach is better in terms of performance and speed of queries?
Well, approach 1 is certainly not bad in terms of performance. Create appropriate indexes and follow best practices for querying data; with current RDBMS engines and computing power, it should not be much of a concern. Performance is something I would believe only after measuring it. The best way to find a bottleneck is to benchmark.
2. Is there any rule for designing the database where you've to decide whether to store the data in rows vs columns?
Personally, I would benchmark both of them by filling them with entries, and I would consider the size of the table, the page size and other constraints. In this specific case, I am unable to see any benefit in adopting approach 2.
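
A benchmark along those lines could look roughly like the sketch below. It uses an in-memory SQLite database, so the timings only hint at relative cost and say nothing about your production engine; the helper names (build, fetch_wide, fetch_eav, time_it), the generated values and the 10,000-user size are all assumptions chosen for the example.

```python
import sqlite3
import time

FIELDS = ("first_name", "last_name", "profession")

def build(n_users):
    """Create both layouts and fill them with the same generated data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users_wide (
            uid INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, profession TEXT
        );
        CREATE TABLE user_meta (
            uid INTEGER NOT NULL, meta_key TEXT NOT NULL, meta_value TEXT,
            PRIMARY KEY (uid, meta_key)
        );
    """)
    for uid in range(1, n_users + 1):
        values = [f"{field}_{uid}" for field in FIELDS]
        conn.execute("INSERT INTO users_wide VALUES (?, ?, ?, ?)", (uid, *values))
        conn.executemany(
            "INSERT INTO user_meta VALUES (?, ?, ?)",
            [(uid, field, value) for field, value in zip(FIELDS, values)],
        )
    conn.commit()
    return conn

def fetch_wide(conn):
    """Approach 1: every profile is already one row."""
    return conn.execute(
        "SELECT uid, first_name, last_name, profession FROM users_wide ORDER BY uid"
    ).fetchall()

def fetch_eav(conn):
    """Approach 2: read the key/value rows and regroup them per user."""
    profiles = {}
    for uid, key, value in conn.execute("SELECT uid, meta_key, meta_value FROM user_meta"):
        profiles.setdefault(uid, {})[key] = value
    return [(uid, *(profiles[uid][f] for f in FIELDS)) for uid in sorted(profiles)]

def time_it(fn, conn, repeats=5):
    """Return the best wall-clock time of several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(conn)
        best = min(best, time.perf_counter() - start)
    return best

bench_conn = build(10_000)
print(f"wide: {time_it(fetch_wide, bench_conn):.4f}s")
print(f"key/value: {time_it(fetch_eav, bench_conn):.4f}s")
```

Run it against the database engine and data volume you actually expect before drawing conclusions.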
Anoop Kumar

24 Apr 2017
If tables I and II are going to be in a one-to-one relation, there is no repetition of the ID field, so it is better to keep everything in one table, even if some of the meta values will be blank.
Since we are mostly going to fetch the first name, last name and profession in one go, it is better to make them columns and model the row as an object.
In the 2nd approach, converting columns to rows looks costly, as you have to check each row to see whether it holds first_name or last_name.
Storing columns as rows works better when the column names (property names) are not known in advance. In that case, instead of a table structure, a JSON object may be considered. Most recent databases support native JSON processing.

id | uid | username | first_name | last_name | profession | json_metadata
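
A minimal sketch of that hybrid layout, assuming the fixed fields stay as columns and anything unpredictable goes into json_metadata. To stay portable, the JSON here is stored as text and parsed in the application rather than with a database's native JSON functions; the helper names add_user and get_metadata and the sample values are made up.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        id            INTEGER PRIMARY KEY,
        uid           INTEGER UNIQUE NOT NULL,
        username      TEXT NOT NULL,
        first_name    TEXT,
        last_name     TEXT,
        profession    TEXT,
        json_metadata TEXT NOT NULL DEFAULT '{}'  -- free-form extra properties
    )
""")

def add_user(uid, username, first_name, last_name, profession, **extra):
    """Known fields go into columns; anything else is serialized to JSON."""
    conn.execute(
        "INSERT INTO users (uid, username, first_name, last_name, profession, json_metadata) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (uid, username, first_name, last_name, profession, json.dumps(extra)),
    )

def get_metadata(uid):
    """Return the extra properties of a user as a dict, or None if not found."""
    row = conn.execute("SELECT json_metadata FROM users WHERE uid = ?", (uid,)).fetchone()
    return json.loads(row[0]) if row else None

add_user(1, "john", "John", "Donald", "Engineer", twitter="@john", newsletter=True)
add_user(2, "jane", "Jane", "Doe", "Designer")
```

The frequently read fields stay cheap to query, and new properties can be added without a schema change.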
24 Apr 2017
I think it's all about flexibility. As @simplycoder said, having multiple tables requires JOINs to fetch records, which could be expensive when the number of records is huge. On the other hand, it gives us the freedom to add any kind of record without touching the table design.

Has anyone run into issues when the count of rows grows to over a million?
Anoop Kumar

24 Apr 2017
Yes, I have faced it.
If you are expecting millions of rows, joins should be kept to a minimum. Using joins makes code and data more maintainable, but the run-time cost can become unbearable.
Nowadays storage is not the problem; we need performance.
That's why non-archival databases are moving to object-based storage such as JSON: no wasted space and better query performance.
Prasad Ajinkya

09 May 2017
While theoretically I might agree with @simplycoder, in a world where prototypes and product apps get built overnight and we have to keep shipping continually, this requires a different line of thought.

Biggie, you are asking the right questions.

3NF is useful if your table structures are going to get complicated. Even then, if there are many users, I might just prefer a single table over several.

However, if I need a more flexible schema (where even I don't know what attributes the user is going to have), then the second structure is more solid. Case in point: WordPress, which uses the second approach for users and posts. This allows for a much more flexible CMS and rapid customization.
09 May 2017
I've been playing with Laravel, and it's my first 'proper framework'. So far, I have not been able to form an opinion on which approach is better. When I have to make queries multiple times, it looks like I'd rather expand horizontally. A user table might have anywhere between 10 and 100 columns, each dedicated to a specific field. This way, I can avoid JOINs. Not that I'm against joins, but they look like unnecessary extra work just to fetch data that could have been pulled in from columns.

On the other hand, I can opt for flexibility and keep adding rows to a separate table for every field I want. As @Prasad Ajinkya said, this would be ideal when I'm not sure what kind of data the user might want to add.

Now, that makes me wonder what's better in the following scenario:

I have 100,000 users and I decide to add location data (city, state, country). I can:
  • Add three columns to the user table: 'city' | 'state' | 'country' OR
  • Create 300,000 new rows in the database to store city, state and country.
You may argue that I could serialize the data, but that still creates 100,000 additional rows in the database.
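
The two options can be sketched side by side as below, using Python's sqlite3 module. N_USERS is scaled down to 1,000 to keep the example quick (the arithmetic is the same at 100,000), and the table names, the meta_key / meta_value columns and the location values are assumptions for illustration only.

```python
import sqlite3

N_USERS = 1_000  # stand-in for the 100,000 users in the question

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (uid INTEGER PRIMARY KEY, username TEXT NOT NULL);
    CREATE TABLE user_meta (
        uid INTEGER NOT NULL, meta_key TEXT NOT NULL, meta_value TEXT,
        PRIMARY KEY (uid, meta_key)
    );
""")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(uid, f"user{uid}") for uid in range(1, N_USERS + 1)],
)

# Option A: three new nullable columns; the row count does not change
for column in ("city", "state", "country"):
    conn.execute(f"ALTER TABLE users ADD COLUMN {column} TEXT")
conn.execute("UPDATE users SET city = 'Pune', state = 'Maharashtra', country = 'India'")

# Option B: three new key/value rows per user
location = {"city": "Pune", "state": "Maharashtra", "country": "India"}
conn.executemany(
    "INSERT INTO user_meta VALUES (?, ?, ?)",
    [(uid, key, value) for uid in range(1, N_USERS + 1) for key, value in location.items()],
)

user_rows = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
meta_rows = conn.execute("SELECT COUNT(*) FROM user_meta").fetchone()[0]
print(f"users rows: {user_rows}, user_meta rows: {meta_rows}")
```

Option A keeps one row per user, while option B grows by three rows per user for every such feature.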

Waiting for comments.
Prasad Ajinkya

09 May 2017
The first approach is relational, and it won't hurt to implement if you know how you are going to access and massage that data 😀

The second approach is hierarchical (this key/value layout is commonly called entity-attribute-value, or EAV), and it's a step closer to NoSQL-based architectures. It prefers flexibility over a well-defined structure.

Your example of 100,000 users is good to illustrate this. If I have a sparsely filled db, then the hierarchical approach stores rows only for the values that exist and would be fast. However, if you make users fill in this data and wish to create reports that slice and dice the user data, then the relational model (first approach) works better.

PS - if you are using Laravel, you will find the first approach better to use.
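
Here is a small sketch of that trade-off, with made-up data in an in-memory SQLite database. User 4 has filled in nothing, so the key/value table holds no rows for that user, but a report over two attributes needs a self-join there, while the wide table needs only a GROUP BY. The table and column names are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users_wide (uid INTEGER PRIMARY KEY, country TEXT, profession TEXT);
    CREATE TABLE user_meta (
        uid INTEGER NOT NULL, meta_key TEXT NOT NULL, meta_value TEXT,
        PRIMARY KEY (uid, meta_key)
    );
""")

# Made-up profiles; user 4 is the sparse case with nothing filled in
profiles = {
    1: {"country": "India", "profession": "Engineer"},
    2: {"country": "India", "profession": "Engineer"},
    3: {"country": "India", "profession": "Designer"},
    4: {},
}
for uid, fields in profiles.items():
    conn.execute(
        "INSERT INTO users_wide VALUES (?, ?, ?)",
        (uid, fields.get("country"), fields.get("profession")),
    )
    conn.executemany(
        "INSERT INTO user_meta VALUES (?, ?, ?)",
        [(uid, key, value) for key, value in fields.items()],
    )

# Report: number of users per (country, profession)
wide_report = conn.execute("""
    SELECT country, profession, COUNT(*)
    FROM users_wide
    WHERE country IS NOT NULL AND profession IS NOT NULL
    GROUP BY country, profession
    ORDER BY country, profession
""").fetchall()

# The same report on the key/value table joins the table to itself,
# once per attribute involved in the report.
eav_report = conn.execute("""
    SELECT c.meta_value, p.meta_value, COUNT(*)
    FROM user_meta AS c
    JOIN user_meta AS p ON p.uid = c.uid AND p.meta_key = 'profession'
    WHERE c.meta_key = 'country'
    GROUP BY c.meta_value, p.meta_value
    ORDER BY c.meta_value, p.meta_value
""").fetchall()

meta_rows_for_user_4 = conn.execute(
    "SELECT COUNT(*) FROM user_meta WHERE uid = 4"
).fetchone()[0]
```

Both queries return the same report; the difference is how much query machinery each layout needs as the number of attributes in the report grows.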
Sigma Robertson

07 Jun 2017
Thank you. He is my guru, who helped me build my website and grow it.
06 Sep 2018

Update:

It's been a while since I asked this question, and I have now found a convincing answer. In short, whether you keep your data in columns on one table or distribute it across rows in related tables depends on the data itself and on how frequently you need to retrieve it.

Consider a User model. 

A user has typical data like 'email', 'phone', 'password', 'city', 'state', 'country' etc. It'd be safe to assume that about 90% of the requests we make to the application need to access the User model.

First, ask yourself: are you going to need to retrieve User->city in every request?

Second, are you going to sort or fetch users by city?

Think about it. It's far easier to simply pull a user model and access its city than to fetch a relationship [User->belongsTo->City].

With the relationship, we're introducing a join when it's really not needed. Of course, I'm taking a general overview of typical requests here. It'd be far easier to have an extra 'city' column on the User table and make it nullable.

On the other hand, there are cases where you are forced to distribute your data in a normalized fashion.

For example, suppose you need to record the activities of every user. You're going to need a separate 'activity' table, so that you can retrieve a user's activities by searching on its 'user_id' column. You can then make use of the [User->hasMany->Activity] relationship.
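
At the table level, the one-to-many case could be sketched like this (plain SQL through Python's sqlite3 module rather than Laravel's Eloquent). The column names other than user_id, the index name and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Frequently read, one-per-user data stays on the user row (city is nullable)
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL,
        city  TEXT
    );

    -- Unbounded, many-per-user data gets its own table
    CREATE TABLE activity (
        id         INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL REFERENCES users(id),
        action     TEXT NOT NULL,
        created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    -- Index the column used to look activities up by user
    CREATE INDEX idx_activity_user_id ON activity(user_id);

    INSERT INTO users (id, email, city) VALUES
        (1, 'a@example.com', 'Pune'),
        (2, 'b@example.com', NULL);
    INSERT INTO activity (user_id, action) VALUES
        (1, 'login'), (1, 'update_profile'), (2, 'login');
""")

def activities_for(user_id):
    """Return a user's recorded actions, oldest first."""
    rows = conn.execute(
        "SELECT action FROM activity WHERE user_id = ? ORDER BY id", (user_id,)
    )
    return [row[0] for row in rows]
```

The user row stays small and fixed in shape, while the activity table can grow without limit for each user.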

The Database Engines Are Faster Than Ever!

Thanks to Solid State Drives (SSDs), read and write operations are far smoother and faster than they were a few years ago. This means performance is rarely a problem for well-indexed tables, even if you have millions of rows.

A simple 1GB server can easily handle a database with a million rows, provided your queries are well optimized and don't consume excessive memory.
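
One way to check whether a query is "optimal" in this sense is to look at the query plan before and after adding an index. The sketch below does that with SQLite's EXPLAIN QUERY PLAN; the table, the generated data and the index name idx_users_city are made up, and the exact plan wording differs between SQLite versions and between database engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL, city TEXT)")
conn.executemany(
    "INSERT INTO users (email, city) VALUES (?, ?)",
    ((f"user{i}@example.com", f"city{i % 100}") for i in range(10_000)),
)

QUERY = "SELECT id FROM users WHERE city = ?"

def plan(sql, params):
    """Return the human-readable plan; the detail text is the last column of each row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

plan_before = plan(QUERY, ("city7",))   # no index yet: expect a full table scan
conn.execute("CREATE INDEX idx_users_city ON users(city)")
plan_after = plan(QUERY, ("city7",))    # the planner can now search the index

print(plan_before)
print(plan_after)
```

If the plan for a frequent query still shows a full scan on a large table, that is usually the first thing to fix.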

I hope this helps. If you are struggling with your database design, post your question below.
