In this series of posts we will be explaining about how the journey of data warehouse modernization projects will be. The first step that we do when a company approaches us on data warehouse migration project is compiling the list of problems they face currently related to the data and data management systems. The following are some of the problems that we see across clients who want to modernize their data warehouse.

Using the same database for transaction and reporting

Enterprises will design and develop databases that support various types of transactions that applications wants to log. The data in the database used by applications for logging transactions will be heavily normalized for data integrity, reduce redundant data, and handle other non-functional requirements like security, performance, audit, backup, recovery, compacting, partitioning, etc.

When the data starts to grow exponentially enterprises start to have a problem in extracting the information from the database. The data must be structured differently for reporting needs in the database. When companies start to use the transaction tables for generating report they start to face issues like timeout, slow response and others.

Instead of starting with data warehouse migration may be the users can first design a new database for reporting needs and then start migrating the data from transaction database to reporting database based on the desired frequency. ETL tools like Microsoft SQL Server Integration Services, Talend, Databricks and others can be used to migrate the data from transaction database to reporting database. When copying the data from transaction database to reporting database the ETL tool must support various types of transformations that help us structure the data as per reporting needs.

This step may sound futile if we want to migrate the data warehouse directly to cloud but this is a very useful step. This is a draft version of the modern data warehouse that we will be replicating in the modern data warehouse systems like Google BigQuery, Amazon RedShift, Snowflake, Hadoop and others. The other benefit of building a data warehouse on-premise before we modernize in cloud is that we don’t need to use transaction database to pull the data. If we pull the data from transaction database it will degrade the performance of database and it affects various applications. Instead we get the data once a day during non-peak hours to on-premise data warehouse and then migrate to cloud from there.

If the amount of data is growing exponentially then the on-premise data warehouse will be very expensive to host and maintain so at this point we will be forced to skip the step of creating on-premise data warehouse. Some of the clients we worked with already had huge data and hence setting up on-premise data warehouse and then using that to migrate the data to cloud was double work for them and it was also expensive.

Need for modern reporting solutions

Most of the reports used both internally and externally in companies are pre-defined and pre-planned reports. The users know what information the report has and they now how updated the data is on the report. Typically the data in the report is updated till yesterday.

The internal users now need adhoc reports that they want to generate on-demand and they want data more real-time. The users also want alerts about some events and transaction as soon as it happens because they want to be proactive in finding the smoke rather than fighting the fire.

Users want more interactive reports beyond drill-down or pivot reports. They want reports that tell the story and communicate the big picture. Legacy reporting platforms do not help them to work effectively with the data.

Upgrade cost requires significant investment

The other problem that we have seen faced by companies who want to modernize their data warehouse is the significant upfront investment on hardware purchase and software licenses that companies need to make. In such cases the pay-as-you-go model offered by Google BigQuery is an excellent choice. Once we learn how to develop a cost-effective system in Google BigQuery then life becomes easy and we are on track to work on recovering the return on investment for this migration project. Companies now realize that hardware investments depreciate over the years and hence want to be proactive in not investing huge amount in hardware for upgrades. The pace at which hardware innovation happens definitely leaves behind the company in laggard state within 2 years if the company does not upgrade its hardware. On cloud upgrading the hardware can be done instantly with much ease using the tools provided by the vendor.

Solow paradox setting in due to legacy systems

The Solow paradox is an economic concept where productivity gains start to decrease despite the rapid innovation in the system. This happens in software too. The user’s productivity in getting the job done increase consistently and rapidly at the launch of a new application. Once the users have learnt the software in and out productivity will hit a plateau. At this point there are no more returns on the investment coming from the software. Another issue that crops up is that the legacy systems start to have bugs and this delays the delivery of new features and reduces the pace at which we add value. This brings down the productivity of all the stakeholders and pushes the companies to rewrite the entire system in a technology and platform that delivers the new set of values and measures.

The next step we do is to find the solution for the problems against our checklist and knowledge base and do some technical evaluation of the proposed solution and then proceed to document the processes.

I hope this summary is useful to evaluate the data and data management systems used in your companies. I explained the how part of the data warehouse migration. If you are interested in knowing why you should undertake this mission then here is a wonderful article from one of the companies that does fantastic job in data and analytics space globally.

If you need any help you can reach out to me on my social media channels or the contact us page to schedule some consultation time to discuss your needs and explore the ways to make things better in your company.

H.Thirukkumaran is the Founder and CEO of Knowillence Pvt Ltd. He is an expert in emerging technologies like cloud, big data and analytics, blockchain and artificial intelligence. He is the author of the book Learning Google BigQuery.