The Data Paradigm

Apps generate data. One way or another, every app produces some data: saving user preferences, sending information through an API, emitting operating system (OS) telemetry, or, in the traditional way, storing relational data in a database.

In the early 2000s it was normal for a software developer to design a system with the database and the data model in mind. Even when using Object-Oriented (OO) languages, many professionals started from an Entity Relationship Diagram (ERD, or simply ER) to design the software.

Going back a little further in time, we can see basically the same approach among Delphi 7 (or VB6) developers. In the 90s, developers had the dBASE file format (DBF), where each file holds a group of information, or a table if you prefer. Looking at this architecture, we have something like this:

This approach usually had two disadvantages: 1 – those files could not scale to large sizes (and the bigger a file became, the more extra routines were needed to rebuild its indexes); 2 – the data (and the software) worked locally only.

To solve these problems, developers adopted the database management system (DBMS), which makes a centralized database possible (the application can stay local or even run on another server):

If you pay attention, you will realize that we now have a single source of information. This way we can have multiple users, and even multiple applications, using the same data source. Nowadays this is widely adopted for small and medium systems.

Another good point of this approach is that it gives us ACID guarantees (atomicity, consistency, isolation, and durability).
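
To make ACID concrete, here is a minimal sketch of atomicity using Python's built-in sqlite3 module: a money transfer either fully commits or fully rolls back. The table, account names, and the transfer function are hypothetical, just for illustration.

```python
import sqlite3

# In-memory database as a minimal stand-in for a real DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts atomically: both updates or neither."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            if conn.execute("SELECT balance FROM accounts WHERE name = ?",
                            (src,)).fetchone()[0] < 0:
                raise ValueError("insufficient funds")
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
    except ValueError:
        pass  # transaction rolled back; balances unchanged

transfer(conn, "alice", "bob", 30)    # succeeds
transfer(conn, "alice", "bob", 999)   # fails and rolls back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 70, 'bob': 80}
```

Note that the failed transfer leaves no trace: the first UPDATE inside it was undone by the rollback.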

Now we can raise some interesting concepts:

  • Data: the raw part of information. In a relational database we create data structures that relate to each other to build information.
  • Information: data aggregated so that it means something. We can process data in several ways to produce information.
  • Knowledge: the state where information has meaning. Good information enables a user to make strategic decisions and operations (and that user can even be an Artificial Intelligence (AI)).
  • Source of truth: the place where I can find data with a defined and desirable structure.
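
The data/information/knowledge chain can be sketched in a few lines of Python. The order records and the revenue aggregation are hypothetical examples, not from any real system:

```python
# Data: individual order rows, meaningless on their own.
orders = [
    {"product": "book", "qty": 2, "price": 10.0},
    {"product": "pen",  "qty": 5, "price": 1.5},
    {"product": "book", "qty": 1, "price": 10.0},
]

# Information: data aggregated into something meaningful, revenue per product.
revenue = {}
for o in orders:
    revenue[o["product"]] = revenue.get(o["product"], 0) + o["qty"] * o["price"]

print(revenue)  # {'book': 30.0, 'pen': 7.5}

# Knowledge: information interpreted to support a decision.
best_seller = max(revenue, key=revenue.get)
print(best_seller)  # 'book'
```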

But if you have multiple users and applications reading from and writing to the database, the load on it will increase. And different applications will evolve to have their own necessities (and “data needs”). How do we scale that?

An easy way is to deploy and scale by adding commodity hardware. Besides, a DBMS usually has ways to separate different data contexts (schemas, tablespaces, whatever).

But for how long will the hardware costs and the internal organization of the database remain manageable?

At this point, scaling cannot focus only on the database, because now we are able to realize that data is just one state of the information I will need. So it can be written in different ways to serve my purposes.

Let’s try to architect a solution, thinking of it as a software platform:

Clients, in this context, can be different applications accessing a single service point. This service processes the input and persists it in separate data repositories, each one with its own data structure to serve a different purpose.
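
A minimal sketch of that idea: one service entry point fanning each write out to two repositories, each shaped for its own purpose. The class and field names here are hypothetical, and plain in-memory structures stand in for real stores.

```python
class RelationalStore:          # stand-in for the system-of-record database
    def __init__(self):
        self.rows = []
    def save(self, record):
        self.rows.append(record)

class SearchIndex:              # stand-in for a full-text index
    def __init__(self):
        self.docs = {}
    def index(self, record):
        # store a search-friendly shape: lowercase tokens instead of the raw row
        self.docs[record["id"]] = record["title"].lower().split()

class ProductService:
    """The single service point that all clients talk to."""
    def __init__(self, db, search):
        self.db, self.search = db, search
    def create(self, record):
        self.db.save(record)       # canonical copy
        self.search.index(record)  # search-optimized copy

db, search = RelationalStore(), SearchIndex()
svc = ProductService(db, search)
svc.create({"id": 1, "title": "Blue Running Shoes"})

print(len(db.rows))    # 1
print(search.docs[1])  # ['blue', 'running', 'shoes']
```

The clients never touch the repositories directly; only the service knows how each store wants its data shaped.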

Now we achieve some important goals of software development, such as single responsibility, isolation, encapsulation, and so on.

I can scale exactly the piece that I need, for instance the memory of the full-text search server.

But it comes with side effects: we lose ACID, and keeping the data in sync across repositories becomes a headache.

To address this we can adopt some patterns, such as pipes and filters, for example:

Now we are able to have a single data provider, which will use heavy CPU processing to retrieve and transform data into the information requested.
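
The pipes-and-filters idea can be sketched as a chain of small transformation functions, where the output of one filter feeds the next. The filters below (parsing, typing, filtering) are hypothetical stand-ins for real processing steps:

```python
def parse(lines):
    # filter 1: split raw CSV-like lines into fields
    return [line.split(",") for line in lines]

def to_numbers(rows):
    # filter 2: convert the value field to an integer
    return [(name, int(value)) for name, value in rows]

def only_positive(pairs):
    # filter 3: keep only records with a positive value
    return [(name, value) for name, value in pairs if value > 0]

def pipeline(data, *filters):
    # the "pipe": each filter's output is the next filter's input
    for f in filters:
        data = f(data)
    return data

raw = ["a,3", "b,-1", "c,7"]
result = pipeline(raw, parse, to_numbers, only_positive)
print(result)  # [('a', 3), ('c', 7)]
```

Each filter is independently testable and replaceable, which is exactly what makes the pattern attractive when the data provider grows.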

Since we are constantly transforming and processing the data, those tasks will consume a lot of CPU and time. Strategies based on caching, commands, and queries should be considered.

Just to clarify with good examples of those considerations: for caching we can use Memcached or Redis, and for commands and queries we can adopt the CQRS pattern.
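
A minimal CQRS sketch: commands mutate a write model, while queries read from a separate read model. In a real system the projection step would be asynchronous (often via a queue); here it runs synchronously for brevity, and all names are hypothetical.

```python
write_model = []      # append-only list of events (the command side)
read_model = {}       # denormalized view optimized for queries

def handle_add_item_command(item_id, name):
    event = {"type": "item_added", "id": item_id, "name": name}
    write_model.append(event)
    project(event)                  # in production this would be async

def project(event):
    # keep the read model up to date from the event stream
    if event["type"] == "item_added":
        read_model[event["id"]] = event["name"]

def query_item(item_id):
    return read_model.get(item_id)  # queries never touch the write model

handle_add_item_command(1, "keyboard")
print(query_item(1))  # 'keyboard'
```

Because reads and writes hit different models, each side can be stored, shaped, and scaled independently, which is the whole point of the pattern.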

Before evolving and talking more about these considerations, let’s compare all of this with our human life: we are constantly receiving information and events, and expressing answers, answering requests, and giving opinions in different ways.

We can read the news or listen to it on the radio. We can use that information when talking with somebody, or write the same information in a formal way on paper.

Humans have a “cache memory”, like remembering a phone number just long enough to dial it. Long-term memory, like friends’ names. And other repositories, like a notebook with notes from a graduation class. Evolved systems adopt similar practices.

And again: if you pay attention, the database is not so important; it is just one piece that stores data in the structure you need for a determined purpose. That may seem crazy, but it makes sense when you realize that you have data that will be transformed and stored in different ways to answer different needs.

For instance, take a list of desirable products on an e-commerce site: you can cache it to answer the web portal efficiently; you store it in a relational database for the long term; you “decompose” the information into an analytic database; and finally you index it in a full-text search engine.
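
Sketching that in code, one product record gets reshaped into four different forms, one per destination. The target formats below are illustrative assumptions, not any particular product's schema:

```python
product = {"id": 7, "name": "Blue Running Shoes",
           "price": 59.9, "category": "shoes"}

# Cache: a key/value pair, ready for a Redis/Memcached-style store.
cache_entry = ("product:7", product)

# Relational: a flat row for the long-term system of record.
relational_row = (product["id"], product["name"],
                  product["price"], product["category"])

# Analytics: a stripped-down fact with only the dimensions we aggregate on.
analytic_fact = {"category": product["category"], "price": product["price"]}

# Full-text: the name broken into lowercase tokens for the search index.
search_doc = {"id": product["id"],
              "terms": product["name"].lower().split()}

print(cache_entry[0])       # 'product:7'
print(search_doc["terms"])  # ['blue', 'running', 'shoes']
```

Same data, four shapes: each store gets exactly the structure its queries need.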

That becomes evident again when you think about microservices, where each one has its own databases and data mechanisms.

There is no definitive solution, because each system has its own design and evolves in its own way. But there are certainly common problems to solve, and some practices and patterns can help to solve them. Transactions, for example.

We can combine good database technologies with good queue technologies:

The service’s database will handle ACID and transactional operations. A collector on the database (or a replication flow, like Oracle GoldenGate) can deliver the information to a queue (Kafka or RabbitMQ, for instance), which handles the data stream. The full-text search index can be Elasticsearch, MongoDB, RavenDB, or another NoSQL database.
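
The flow above can be sketched with in-memory stand-ins: a list plays the transactional database, a deque plays the queue (Kafka/RabbitMQ), and a dict plays the full-text index (Elasticsearch). The function names are hypothetical.

```python
from collections import deque

database = []          # transactional system of record
queue = deque()        # change events waiting to be consumed
search_index = {}      # id -> tokens, kept in sync from the queue

def write_with_capture(record):
    """The service writes to its database; a collector emits a change event."""
    database.append(record)
    queue.append({"op": "upsert", "record": record})

def index_consumer():
    """Queue consumer that keeps the search index in sync with the database."""
    while queue:
        event = queue.popleft()
        rec = event["record"]
        search_index[rec["id"]] = rec["title"].lower().split()

write_with_capture({"id": 1, "title": "Data Paradigm"})
index_consumer()
print(search_index[1])  # ['data', 'paradigm']
```

The key property is that the service only ever writes to its own database; the search index catches up asynchronously by draining the queue, so a slow index never blocks the transactional write path.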

Sure, it can go on and on. But for now, I hope all of this opened your mind far beyond the traditional monolithic 3-layer architecture (client + service + database, all centralized). So, be prepared for huge companies with amazing architectures accommodating big systems, or to scale up your own software when it needs it.

I hope you enjoyed it.

Spaki.

With more than 15 years of experience developing software and technologies, and talking about startups, trends, and innovation, today my work is focused on being a CTO, Software Architect, Technical Speaker, Technical Consultant, and Entrepreneur.

From Brazil, currently living in Portugal and working at https://www.farfetch.com as a Software Architect, while keeping projects in Brazil, like http://www.almocando.com.br/
