Businesses often have data in several systems. For example, they may have customer data in a CRM system like Salesforce, eCommerce data in a cloud-based platform like BigCommerce, and ERP data in an enterprise system like Microsoft Dynamics.
To create a “Full Picture of the Business”, one must combine data from each of these sources into a single Data Warehouse that is optimized for reporting and analytics.
First, create a list of the data sources that you’re hoping to pull into the Data Warehouse. For each data source, answer the below questions.
1. What degree of “freshness” is required for this data in the warehouse?
This is important in selecting the right method of integration. For example, if “freshness” is not important for a particular data set, it might make sense to pick the cheapest method of integration, such as a nightly import process.
2. What methods of integration are available? What is the effort/costs required for each option?
- API —>What response format does the API deploy (JSON, XML, SOAP, etc.)? Is the API well documented? How available is the API? What response times can be expected? Does the API impose limits on the number and frequency of calls?
- Webhooks → can we get notified about a data change to enable real-time sync?
- ETL → Can we leverage a service like Heroku Connect, Xplenty, or MuleSoft? Can we access the data via DB connection? What methods of export does the platform make available?
3. What data should (or should not) be carried into the data warehouse?
For each data source, determine which tables and which columns within those selected tables should be transferred into the data warehouse.
4. How should the data be mapped and/or transformed to accommodate a standardized schema?
In order to answer this question, you first need to create a standardized schema, which is essentially a list of tables, the columns for each table, and the data types for each column. Creating a standard schema for the warehouse is important to enable reporting. Separate data sources often share synonymous objects/tables, so one should try to combine these into a standardized set of tables and columns to enable reporting on the “full picture” of the business.
For synonymous objects/tables, you may need to create a global identifier to link associated data spread across disparate systems. For example, your “Contacts” table in your CRM and your “Customers” table in your eCommerce database could have a “Global ID” field to link shared records together. Alternatively, the data warehouse could store the ID from each data source. For example, the “Customers” table in the data warehouse could have a “CRM ID” and “eCommerce ID” to create this link.
Once a standardized schema has been designed, we can assess how the data in a particular data source needs to be mapped and transformed to flow into this schema. For example, perhaps the column “Amount Charged” in our eCommerce Orders table maps to `Amount` in our standardized schema, but also needs to be transformed from a decimal value (e.g. 150.00) to an integer value representing cents instead of dollars (e.g. 15000).
5. How will we sync the data on an ongoing basis (i.e. what method of integration should we select)? At what frequency?
Based on the “freshness” requirements, the methods of integrations available and their corresponding effort/costs required, and the data transformation needed, we can select the integration method that will be best for the business’ needs.
Now that you have a good plan for getting each data source into a data warehouse, it’s time to execute. Leverage your internal team or hire an outside partner like LaunchPad Lab to build out the Data Warehouse and to build the integration for each data source.
Once the Data Warehouse is online, hook up your preferred BI tool and start building reports and analytics to move your business forward.