Why NOT to use umbraco as a data store

Umbraco logo

Umbraco logo

I have recently made a great fuzz on twitter by saying I don’t want people to use the content section of umbraco as a data store.

A lot of people did not agree and then the debate began. The feedback was so overwhelming neither me, nor my coworkers could keep track on who, what and where the debate took place. So I decided to make this blog post.

So, my initial statement is still valid. “No matter what you think, do not use umbracos content section as a data store”.

First off, what is data and what is content? Well it’s a bit fuzzy but this is my definition: Any information that is to be considered as metadata, (product information, member profiles, log entries, values for lists and so on and so fourth), is to be considered data and therefore NOT content.

All of that data has to be stored in a real data store. Said data store would I prefer being a database. And since we are using umbraco, and .NET, why not adding a couple of extra tables in our already setup database?

By using your own data store, you get a lot of possibilities. Possibilities umbraco do not offer. In your own data store, you can organize your data the way your data should be organized. The umbraco data store is build solely to deliver content, not ecommerce, not to serve as an ERP or as a CRM. Umbraco is a CMS, it is build to deliver content. You can even cache your data as you like. And why not use lucene for that? It comes with umbraco, and it is ridiculously fast, and you can index anything you want and in any way you want. The indexes and caching mechanisms used by umbraco is build to deliver content, it is build to serve umbraco content, not ecommerce content or ERP content, only umbraco content.

So if you are to build an ecommerce solution, or a CRM. Use your own data store. This way, you don’t force umbraco into being something it is not. You make it way easier for your users to distinguish umbraco from you solution, and in case of extending or rebuilding parts of your solution, you don’t have to force umbraco into submission, you only have to change your own code.

So lets say you do use umbraco as a data store, what problems are there: Well, as stated earlier, you are then forcing umbraco into being something it is not, you are cluttering umbraco.config with non-content related information and you are making an enormous content tree. You are furthermore cluttering the data base, we all know that any database gets slower the more data you have to out into it, and by using umbraco as a data store you are adding A LOT OF DATA!

Lets take an example, (this is a bit simplified, I know): A customer has five properties: ID, Name, Address, city and postal code. In my database: one table, five fields, each customer: one row. In umbraco this translates to: One doc type (1 row), four properties (4 rows) and one customer becomes 5 rows in two tables: 1 row for the node, and one row for each value, and I haven’t even included history information.

This means that the database has to handle five times the amount of data by using umbraco vs. a custom table, that is bad! “But hey! Umbraco has it’s XML caching and lucene to help that load!” Yes, but the XML cache and lucene won’t help much when we need to make CRUD actions in our data base. I have seen very large umbraco trees, and they are very heavy. I have seen loading times around five minutes. Not on the end user side, no in the back office. And yes, people do use the back office.

Umbraco also has a very strict set of rules. Umbraco is build as a tree, meaning you have a parent and a lot of children. I would love to see how you would handle a many to many relation, like product <–> category, using umbraco. In a database, I would add one table with two primary keys, set as foreign keys to product and category.

I usually use the argument of scalability. A database is build to be scalable, and in my opinion it is much easier to add an extra table and a couple of extra fields, than it is to create an extra doc type and a couple of extra fields. It is furthermore way easier to rearrange data in a database, than in umbraco.

This is my proposal: Build your own data store, cache and index your data using lucene, build a custom dashboard and tree for umbraco and stop forcing umbraco to do your job. Umbraco is a CMS, not EPR, CRM or shop. If you want that, build it as a plugin.

 

Database design on user defined properties

As a developer, I often get across some database designs, that are quite complex caused by a developer not quite understanding the problem (we’ve all been there!), and therefore cannot solve it correctly.

The problem is as follows (or similar):

You have 2+ tables:

No references

Initial tables – no references

It’s quite simple, you have two types of data (media and documents), that you want to store in your database. I get that, I would too.

Now the requirements are as follows:

Both media and document has a set of user defined properties.

Said properties must store the following values: Type and Value, and a reference to both Media and Document.

There are a couple of ways to solve this lil’ problem, one (the one I encounter the most):

References(1) - Database Inheritance

Property-table added and References are added to media and document.

In this setup the Property-table knows about both media and document. We could make the two foreign keys nullable, either way we depend heavily on our code to keep media and document properties separated. And what happens if we add an other type (say Users), then we have to add a new foreign key to the property-table, and expand our code even more.

An other approach is this:

<img class="size-full wp-image-762" alt="References(2) – Database Inheritance" src="https://ndesoft.dk/wp-content/uploads/2013/04/media_document_properties_take2 cialis overnight shipping.png” width=”404″ height=”444″ srcset=”https://ndesoft.dk/wp-content/uploads/2013/04/media_document_properties_take2.png 404w, https://ndesoft.dk/wp-content/uploads/2013/04/media_document_properties_take2-272×300.png 272w” sizes=”(max-width: 404px) 100vw, 404px” />

Media and document property references are stored in separate tables.

I must admit, I have done this one as well as the other one, and just admit it, so have you at some point!

So what are the pro’s and con’s of this setup: Well the pros are simple, neither media or document are referenced in the property table, we can have as many properties as we want per media and document, and we can quite simple add other types, such as Users. BUT:

When we have this setup, we must rely heavily on our code to help us not to have the same property on more than one media, and to ensure we don’t mix media properties with documents and users. And if we add an other type (Users) we must create, not only one, but two new tables, and still expand a complex code to handle that new type as well as the other types.

So how can we solve this problem?

We have Media, Documents and more types, that has dynamic  properties without the other types must know about it, we could do this:

References(3) – Database Inheritance

Each type now has its own set of properties

Yeah, I’ve also done this one. And this is almost, (I wrote, almost), as bad as the other ones. Well no property can be on more than one media (or document, or whatever), and no property can be on both media and document, so whats the problem?!

Well, for starters, we have to tables instead of one, per type. If we add an other field to our properties, we must add them to all of our *Property-tables. And if we want to list all properties, including the media/document/user/whatever it is attached to, it’s nearly impossible.

So here’s the solution, I find most fitting for the problem:

References, Inheritance – Database Inheritance

Added a Node-table, with the shared fields from Media and Document. Removed ID- and Name-fields from Media and Document, added a NodeID field, as both PK and FK. Added a Property-table, that references the Node-table.

So, this is my solution. I have added a Node-table, with the shared fields from Media and Document (ID and Name). Removed ID- and Name-fields from Media and Document, added a NodeID field, as both primary key and foreign key, this field must NOT be autoincremented! It will not work, then I added a Property-table, that references the Node-table.

The pros and cons: The pros are easy, One table per type, each type gets its ID from the Node-table, all properties are stored in one table, referencing the Node-table, so a Document can get its properties, using only its primary key. No property can ever be on two entities at once, and no entity knows about other entities or properties, except its own.

The cons are, that we must have some code that handles the inheritance. When I make a SELECT * FROM Media, I must make a JOIN on the Node-table as well. If you’re a .NET developer, like I, then you should take a look at the Entity Framework, as it handles this smoothly. I will write a post on that later on.

DBTesting

Efter lang tids stilhed kommer her en lille opdatering på hvad jeg har gang i for tiden.

Jeg arbejder på et database-testnings system, DBTesting, til at oprette og køre automatiserede tests af en database under udvikling, så man sikre sig, at de data man får ud, er præcis som de skal være og ikke bliver ændret på nogen måder.

Jeg vil ikke gå så meget i dybden her, men du kan læse mere på hjemmesiden:

http://dbtesting.ndesoft.dk

NMapper 1.0

Endelig! Efter utrolig lang tid, er det endelig lykkedes mig at få tid til at færdiggøre NMapper 1.0.

Der er kommet en masse nye features.

NMapper skal bl.a. ikke længere kende alle mulige og umulige versioner og udgaver af database systemer. NMapper er nemlig blevet modul baseret, så man selv kan lave en database manager der passer til den database man bruger. NMapperen skal nok selv finde og registrere manageren.

Ud over at have lavet database managere, har jeg også lavet cache styrring, så der skæres ned på trafikken til og fra databasen, samtidig med at svar tiden bliver hurtigere.

Det eneste jeg mangler nu er bare at skrive en masse dokumentation, men indtil den er færdig, kan du jo skrive herunder hvis der skulle blive problemer.

Læs mere her.