Working with clients in data warehouse modernization project – Part 1
In this series of posts, we will be explaining how the journey of data warehouse modernization projects will be. The first step that we do when a company approaches us on a data warehouse migration project is compiling the list of problems they face currently related to the data and data management systems. The following are some of the problems that we see across clients who want to modernize their data warehouse.
Using the same database for transaction and reporting
Enterprises will design and develop databases that support various types of transactions that applications want to log. The data in the database used by applications for logging transactions will be heavily normalized for data integrity, reduce redundant data, and handle other non-functional requirements like security, performance, audit, backup, recovery, compacting, partitioning, etc.
When the data starts to grow exponentially enterprises start to have a problem in extracting the information from the database. The data must be structured differently for reporting needs in the database. When companies start to use the transaction tables for generating reports they start to face issues like timeout, slow response, and others.
Instead of starting with data warehouse migration may be the users can first design a new database for reporting needs and then start migrating the data from the transaction database to reporting database based on the desired frequency. ETL tools like Microsoft SQL Server Integration Services, Talend, Databricks, and others can be used to migrate the data from the transaction database to reporting database. When copying the data from the transaction database to reporting database the ETL tool must support various types of transformations that help us structure the data as per reporting needs.
This step may sound futile if we want to migrate the data warehouse directly to the cloud but this is a very useful step. This is a draft version of the modern data warehouse that we will be replicating in the modern data warehouse systems like Google BigQuery, Amazon Redshift, Snowflake, Hadoop, and others. The other benefit of building a data warehouse on-premise before we modernize in the cloud is that we don’t need to use a transaction database to pull the data. If we pull the data from the transaction database it will degrade the performance of the database and affects various applications. Instead, we get the data once a day during non-peak hours to an on-premise data warehouse and then migrate to the cloud from there.
If the amount of data is growing exponentially then the on-premise data warehouse will be very expensive to host and maintain so at this point we will be forced to skip the step of creating an on-premise data warehouse. Some of the clients we worked with already had huge data and hence setting up an on-premise data warehouse and then using that to migrate the data to the cloud was double work for them and it was also expensive.
Need for modern reporting solutions
Most of the reports used both internally and externally in companies are pre-defined and pre-planned reports. The users know what information the report has and they know how updated the data is on the report. Typically the data in the report is updated till yesterday.
The internal users now need Adhoc reports that they want to generate on-demand and they want data in more real-time. The users also want alerts about some events and transaction as soon as it happens because they want to be proactive in finding the smoke rather than fighting the fire.
Users want more interactive reports beyond drill-down or pivot reports. They want reports that tell the story and communicate the big picture. Legacy reporting platforms do not help them to work effectively with the data.
Upgrade cost requires significant investment
The other problem that we have seen faced by companies who want to modernize their data warehouse is the significant upfront investment on hardware purchases and software licenses that companies need to make. In such cases, the pay-as-you-go model offered by Google BigQuery is an excellent choice. Once we learn how to develop a cost-effective system in Google BigQuery then life becomes easy and we are on track to work on recovering the return on investment for this migration project. Companies now realize that hardware investments depreciate over the years and hence want to be proactive in not investing huge amounts in hardware for upgrades. The pace at which hardware innovation happens definitely leaves behind the company in the laggard state within 2 years if the company does not upgrade its hardware. On cloud upgrading, the hardware can be done instantly with much ease using the tools provided by the vendor.
Solow paradox setting in due to legacy systems
The Solow paradox is an economic concept where productivity gains start to decrease despite the rapid innovation in the system. This happens in software too. The user’s productivity in getting the job done increases consistently and rapidly at the launch of a new application. Once the users have learned the software in and out productivity will hit a plateau. At this point, there are no more returns on the investment coming from the software. Another issue that crops up is that the legacy systems start to have bugs and this delays the delivery of new features and reduces the pace at which we add value. This brings down the productivity of all the stakeholders and pushes the companies to rewrite the entire system in technology and platform that delivers the new set of values and measures.
The next step we do is to find the solution for the problems against our checklist and knowledge base do some technical evaluation of the proposed solution and then proceed to document the processes.
I hope this summary is useful to evaluate the data and data management systems used in your companies. I explained the how part of the data warehouse migration. If you are interested in knowing why you should undertake this mission then here is a wonderful article from one of the companies that do a fantastic job in the data and analytics space globally.
If you need any help you can reach out to me on my social media channels or the contact us page to schedule some consultation time to discuss your needs and explore the ways to make things better in your company.
H.Thirukkumaran
Founder & CEO
H.Thirukkumaran has over 20 years of experience in the IT industry. He worked in US for over 13 years for leading companies in various sectors like retail and ecommerce, investment banking, stock market, automobile and real estate He is the author of the book Learning Google BigQuery which explains how to build big data systems using Google BigQuery. He holds a masters in blockchain from Zigurat Innovation and Technology Business School from Barcelona Spain. He is also the India chapter lead for the Global Blockchain Initiative a non-profit from Germany that provides free education on blockchain. He currently lives in Chennai India.