Isn't Using Unnormalized Design Better When There Are Multiple Joins?
Solution 1:
"The second approach uses two JOINs. I guess it will be slower than using REGEXP on a huge dataset."
Your intuition is simply wrong. Databases are designed to do JOINs. They can take advantage of indexing and partitioning to speed up queries. Databases more advanced than MySQL also use table statistics to choose the best algorithms for executing a query.
Your first query always requires a full table scan of posts. Your second query can be optimized in various ways.
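To make the difference concrete, here is a minimal sketch that asks a query planner how it would run each query. It uses Python's built-in sqlite3 module as a stand-in for MySQL, LIKE as a stand-in for REGEXP, and a hypothetical schema (posts_flat, posts, tags, post_tags) that is assumed here rather than taken from the question. The exact plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Denormalized variant (assumed): tags kept as a delimited string per post.
CREATE TABLE posts_flat (id INTEGER PRIMARY KEY, title TEXT, tags TEXT);
CREATE INDEX idx_posts_flat_tags ON posts_flat (tags);

-- Normalized variant (assumed): one row per tag, one row per (tag, post) pair.
CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE post_tags (
    tag_id  INTEGER NOT NULL REFERENCES tags(id),
    post_id INTEGER NOT NULL REFERENCES posts(id),
    PRIMARY KEY (tag_id, post_id)
);
""")


def plan(sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Pattern match inside the delimited string: the leading wildcard
# prevents the index on tags from narrowing the search.
flat_plan = plan("SELECT id, title FROM posts_flat WHERE tags LIKE '%mysql%'")

# Two JOINs over the normalized tables: every step can be an index lookup.
join_plan = plan("""
    SELECT p.id, p.title
    FROM tags AS t
    JOIN post_tags AS pt ON pt.tag_id = t.id
    JOIN posts AS p ON p.id = pt.post_id
    WHERE t.name = 'mysql'
""")

print(flat_plan)
print(join_plan)
```

On a typical SQLite build the first plan reports a SCAN of posts_flat even though the tags column is indexed, while the second reports only SEARCH steps that use indexes. In MySQL, EXPLAIN plays the same role.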
Further, maintaining the consistency of the data in the database is much more difficult with the first approach. You probably need to implement triggers to handle updates and inserts on all the tables. That slows things down.
There are some cases where it is worth the effort to do this -- think about summary counts or totals of dollars or time. Putting tags into a delimited string is much less likely to be beneficial, because parsing the string in SQL is not likely to be a really big benefit relative to the other costs.
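As a rough illustration of the summary-count case, the sketch below keeps a denormalized post_count column on a tags table in sync with triggers. It runs on Python's built-in sqlite3 module rather than MySQL, and the table, column, and trigger names are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tags (
    id         INTEGER PRIMARY KEY,
    name       TEXT UNIQUE NOT NULL,
    post_count INTEGER NOT NULL DEFAULT 0  -- denormalized summary column
);
CREATE TABLE post_tags (
    post_id INTEGER NOT NULL,
    tag_id  INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (post_id, tag_id)
);

-- Every insert or delete on post_tags now also writes to tags.
CREATE TRIGGER post_tags_after_insert AFTER INSERT ON post_tags
BEGIN
    UPDATE tags SET post_count = post_count + 1 WHERE id = NEW.tag_id;
END;
CREATE TRIGGER post_tags_after_delete AFTER DELETE ON post_tags
BEGIN
    UPDATE tags SET post_count = post_count - 1 WHERE id = OLD.tag_id;
END;
""")

conn.execute("INSERT INTO tags (id, name) VALUES (1, 'mysql')")
conn.executemany(
    "INSERT INTO post_tags (post_id, tag_id) VALUES (?, ?)",
    [(10, 1), (11, 1), (12, 1)],
)
conn.execute("DELETE FROM post_tags WHERE post_id = 11")

# Three inserts and one delete leave two posts tagged 'mysql'.
post_count = conn.execute(
    "SELECT post_count FROM tags WHERE id = 1"
).fetchone()[0]
print(post_count)  # 2
```

The sketch does not handle an UPDATE that moves a row to a different tag_id. Covering that needs a third trigger, which is the kind of extra maintenance cost the answer describes.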
Solution 2:
With small tables, you can use either approach at your discretion.
If you expect the table to grow, you really need the second choice. The reason is that REGEXP can never use an index in MySQL, and indexes are the key to fast queries. A JOIN will use an index if one is declared on the column.
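The effect of declaring an index can be checked directly. This is a small sketch using Python's built-in sqlite3 module in place of MySQL; the post_tags table and the index name are hypothetical, and the plan wording depends on the SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post_tags (post_id INTEGER NOT NULL, tag_id INTEGER NOT NULL)"
)


def plan(sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


lookup = "SELECT post_id FROM post_tags WHERE tag_id = 42"

# No index on tag_id yet: the planner has to read every row.
before = plan(lookup)

# Declare the index on the column used for the lookup or join.
conn.execute("CREATE INDEX idx_post_tags_tag_id ON post_tags (tag_id)")
after = plan(lookup)

print(before)
print(after)
```

Before the index exists the plan is a SCAN of the table; afterwards it becomes a SEARCH that uses idx_post_tags_tag_id.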
Solution 3:
All of this looks fine when we talk about data at a small scale. It is fundamental for an OLTP system to have normalized tables. When you expect your table to scale and want the data to be non-redundant and consistent, normalization is the answer. Of course there are costs involved with joins, but they are trivial compared with these issues. Let's talk about your scenario:
Pros:
- All data is available by querying one table.
Cons:
- A function wrapped around a column forces the query optimizer to scan the whole table regardless of any index on that column. This is very important from a data-scaling point of view.
- A keyword, in your case, is repeated many times, which leads to data redundancy.
- Keywords that appear many times lead to data inconsistencies: if you want to remove or update a keyword, the column has to be searched and the keyword replaced in every row. If the keyword is left behind anywhere, that leads to data integrity issues.
There are many more. Read up on data normalization in RDBMSs.
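The point about updating a keyword can be illustrated with a small sketch. It uses Python's built-in sqlite3 module instead of MySQL, and the tables, sample rows, and the new tag name 'ansi-sql' are all made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Denormalized (assumed): delimited tag string on each post.
CREATE TABLE posts_flat (id INTEGER PRIMARY KEY, tags TEXT NOT NULL);
INSERT INTO posts_flat (id, tags) VALUES
    (1, 'mysql,sql'),
    (2, 'sql,mysql,regex'),
    (3, 'nosql');

-- Normalized (assumed): each tag name is stored exactly once.
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE post_tags (
    post_id INTEGER NOT NULL,
    tag_id  INTEGER NOT NULL,
    PRIMARY KEY (post_id, tag_id)
);
INSERT INTO tags (id, name) VALUES
    (1, 'mysql'), (2, 'sql'), (3, 'regex'), (4, 'nosql');
INSERT INTO post_tags (post_id, tag_id) VALUES
    (1, 1), (1, 2), (2, 2), (2, 1), (2, 3), (3, 4);
""")

# Normalized rename: one row changes and every post sees the new name.
cur = conn.execute("UPDATE tags SET name = 'ansi-sql' WHERE name = 'sql'")
normalized_rows_touched = cur.rowcount

# Naive rename on the delimited strings: every matching row is rewritten,
# and the substring match also corrupts 'mysql' and 'nosql'.
conn.execute(
    "UPDATE posts_flat SET tags = REPLACE(tags, 'sql', 'ansi-sql') "
    "WHERE tags LIKE '%sql%'"
)
flat = dict(conn.execute("SELECT id, tags FROM posts_flat"))

print(normalized_rows_touched)  # 1
print(flat[3])                  # noansi-sql
```

A correct rename on the delimited column has to account for delimiters at the start, middle, and end of the string in every row, which is where leftover or mangled keywords tend to come from.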