This post is part 2 of a 2 part series. In part 1, we covered the the ways in which you can blend data from two fact tables in Tableau and the pros and cons of each approach. I discussed what Tableau does internally. We also detailed what fact data are.
In my previous post, we learned what fact data are, how we can work with them in Tableau, and the issues with blending fact data in Tableau. Visualization tools while amazing at what they do, are better when paired with a Data Prep solution like Vero. Here I will show you how Vero can help you quickly shape your data, efficiently blend it, and finally produce a reporting table that Tableau would love.
One of the great things about using Vero for Data Prep, is that you can use the familiar paradigms of Measures and Dimensions and you don't have to be an expert coder. We like to think of Vero as the Tableau for Data Prep.
Please checkout the Vero Designer Quick Start guide if you are new to Vero.
Choose Tables for Analysis
Just like in Tableau, we will connect to our database and choose the requisite tables for analysis. The main difference in Vero is that we will choose both fact tables and import their schemas into a Vero project. The one thing we need to do here is let Vero know which tables are facts and which are dimensions. You can do it on the import screen of the connection wizard as shown below:
Click finish and the chosen tables will be imported into your current project.
Automatic In Database Blending
When Vero is aware of Fact tables, it will produce automated in database blending across tables as needed. Currently, in database blending will create full joins between each intermediate stage, but there is a way to control this which you will see later.
To start, create a new report and drag the ProductName column from the Products table and the rev column from the order_yr_prd_agg table and drop them in the Selections dropzone. By default, Vero creates everything as a Dimension. You can right click on a Dimension and choose Aggregate -> Sum to convert it to a Measure. Having done that you may have noticed in the Plan area an error indicator and a disjointed query plan (disconnected tables). This just means Vero doesn't have enough information to join these tables yet. You can create a join between them by dragging one and dropping it on the other. Here we will drag Products and drop it on order_yr_prd_agg. Our screen should like the following:
Next, drag the num_complaints column from the product_complaints table into the Selections dropzone. Before you proceed, convert num_complaints to a measure as shown below:
The action of adding a measure from a second fact table will cause Vero to instantly generate an in database blending operation. In traditional BI vernacular, this is called multi-pass SQL. Your screen should like the following:
We are almost there, but as you may have noticed there is a yellow warning icon on the second step. When we hover over the second step, we will see a message telling us that the ProductName dimension was suppressed. Click on the block (the step with the warning indicator) to see the problem in detail:
Here we can see that there is no join path between the ProductName dimension and the num_complaints measure. There will be many cases when the granularity of the required fact tables are different and your report has to accommodate measures with different dimensional levels. You could ignore this message if the num_complaints measure could not be rolled up to ProductName. In Vero, we call this automatic leveling. Here a join path does in fact exist so we will create it by dragging Products onto product_complaints.
Note: when creating joins between tables the omni editor below will display the join definition being edited. Click "Guess Join" to let Vero automatically determine the join definition. It has about at 60% accuracy so double check it, especially the cardinality.
Below is our second fact table (product_complaints) joined to the Products dimension:
With our second fact table successfully joined to its required dimension tables, we should have a clean in database blending flow view as follows:
The SQL generated by Vero is shown below. Notice the final pass query (Result Block SQL), uses a full join between each step. This means we will get back records where a matching ProductName doesn't exists in both fact tables.
What if you wanted fine grain control over each step and the joins between them? Vero has an answer for you using Temp Table Blocks.
Using Temporary Table Blocks for Fine Grained Control
Vero has a feature called temp table blocks and they do exactly what it sounds like: Temp Table Blocks creates a temporary table in your database based on the dimensions, measures, and filters applied to the block. They exist so you can apply transformations to your data in stages and are cleaned up at the end of a query execution.
Now back to our scenario. Above we had an automatic in database blend where each step was joined using a full join. Now we will use Temp Table blocks to create an inner join between each step. First, add two temp table blocks. In the first block, add ProductName dimension and Rev measure. In the second block, add ProductName dimension and num_complaints measure. Our screen looks something like the following:
Once each block is setup, you can combine them in the Result Block as shown below:
We use the Selections drop down menu to Select Block Items > Choose the Block named Rev > and Select All Items. Repeat the same steps for the Complaints block and we end up with the data flow shown below:
When you blend data using Vero's Temp Table Blocks, you can control how the blocks are joined together. You can set both the join key and the join type. To change the join type to inner join click on the join type indicators in the data flow and configure it as shown below:
In this example we are blending two Temp Table Blocks into an empty Result Block. This doesn't have to be the case. I did it this way to illustrate the flexibility of Vero. You could easily have put the Rev measure in the Result Block and just added one Temp Table block. In any case, once you configure both blocks with an inner join type our data flow should look as follows:
With our data flow successfully created, we can examine the final pass SQL to verify that an inner join is being used. Here is the SQL of the last pass of the above query:
You can now execute the report and preview the results directly in Vero. However, our story doesn't end here. We need to get this data in Tableau!
Exporting a Vero Report as a Table
To get the best experience in your visualization tools, export this Vero reports a materialized table. Vero's latest update includes many enhancements to make this a robust and seamless experience.
In the latest version of Vero we support exporting a report to a table in the same database or an external database. Many of our customers are using this feature to blend and move data from MySQL, Oracle, or Postgres into RedShift for analytics. Click on Add Export configure it as shown below:
If you're working with big data, you can set the report row limit to something like 10000 and check the Ignore Report Limit checkbox. This will allow you to sample the report in Vero but still export the full data into a target table.
In the future, Vero will support more exports including an export to a Tableau data extract, export as pdf/csv/excel, and even export as an email attachment.
With the export saved, our Export Browser looks like the following:
You can activate and deactivate exports by checking them on and off. Finally, let's run the report. It will create the export and show results in Vero as well:
Connecting Tableau to the newly created table from vero
We are now ready to speedup the Visualization created in part 1 of this series. Let's start by adding a new data source but this time we will pick the table we just exported: blend_complaints_rev.
At this point we can continue our workflow in Tableau like normal. I created the same layout as before but this time we use the exported table:
The wonderful part of using Vero is that we were able to using the paradigm of Dimensions and Measures, we didn't have to write any code, and we didn't have to figure out how to move data between databases. In a few minutes, we were able to soup up our Tableau visualization.
If you made it this far, you should consider starting a free trial of the Vero Designer.