Tableau Prep vs Einstein Analytics – Data Cleansing
May 10, 2020 Data Cleaning in Tableau; Python Pandas Series.between Python program to find number of days between two given dates; Python Difference between two dates (in minutes) using datetime.timedelta method; Python datetime.timedelta function; Comparing dates in Python; Python Convert string to DateTime and vice-versa. Instructor When you work as a data analyst, one of the most difficult tasks you will be asked to do is to clean data that comes in from multiple sources. Fortunately, Tableau has some tools.
This is the 2nd part of the blog series, so make sure you check out part 1 here!
At the recent Tableau conference there was a lot of talk around the Salesforce product Einstein and how this is starting to be integrated into some areas within the Tableau product range. Salesforce and Tableau have allowed users to bring all of their data together by combining Einstein Analytics and Tableau to form Tableau CRM.
Tableau + Einstein = Tableau CRM
This blog is going to focus on comparing the similarities and differences between the data preparation aspects of both products – Tableau Prep and Einstein Analytics. The key areas that I will focus on are:
- Input – How do you connect to data?
- Investigation – Getting to know the data and if there are any problems
- Cleansing – Cleaning the data with filters, calculations and string manipulation
- Combining – Joining and unioning the data
- Output – How can you get the data after you’ve finished?
This part is all about Cleansing your data so that it ready to find some insights.
One of the most important aspects of a data preparation tool is what abilities you have to clean the data and fix any issues that there may be. There are many different techniques to clean data but I am going to be focusing on Filtering, Calculations, & String manipulations.
There are multiple ways of filtering within Tableau Prep. The most basic is the ‘Selection’ filter where the user can select which fields or rows to filter from the view. The Profile Pane is great for this, as all the user has to do is select the field or value that they want to filter and click ‘Keep Only’ or ‘Remove/Exclude’:
Other filtering options include using one of the following:
- Calculation: this must end in a Boolean and uses Tableau’s normally calculation syntax
- Wildcard: this allows you to filter a discrete field where the value doesn’t have to exactly match the full value. This is shown with these easy selection options:
- Null Values: choose to keep Null or Non-Null values from a field
Again you can filter your values in a couple of ways within Einstein Analytics, so I’m going to cover both of them.
Filtering in a Dataflow is easy as this is a built in as a transformation so all you need to do is press the filter icon and complete the setup within the tool:
The filter transformation gives you the option to use the standard filter expressions or SAQL. I find using the SAQL syntax to be a bit easier, but both have their benefits depending on the use case. Here’s an example using SAQL:
Filtering within a Recipe is also easy as there is a dedicated filter button. It has a different setup compared with the Dataflow as it is more visual and easier to understand as you aren’t relying on SAQL.
You can add multiple conditions and these appear as steps on the left pane within the recipe, making it easier to understand what filters have occurred. This is similar to the changes pane within Tableau Prep and is great for auditing/handing over the process to other people.
Now we have filtered our data, it’s time to create some new fields via calculations. Types of calculations can be focused on arithmetic, dates, strings and others. Both Tableau Prep and Einstein Analytics allow you to do a lot with calculations so let’s have a look at how they differ.
Just like filtering, Tableau Prep makes it really easy to create a calculation with just a click of a button:
Once pressed the calculation dialog box opens, where you can type fields and Tableau prompts to help you create the calculations. It also has the reference pane on the right side with a list of all of the calculations and brief explanation on each of them.
Power Tools For Tableau
You can also add some more complex calculations including LODs (Level of Detail) and Rank calculations. Tableau Prep has a visual editor when completing analytical calculations making them them super simple to create:
Tableau Data Cleaning Tools
All very easy and user friendly!
As normal let’s split out the two different options with Einstein Analytics to see the different options that are available.
Depending on the type of calculation that you need to create will depend on what tool you use.
There are a couple of options:
- computeExpression – these are derived fields generated using a SAQL expression which is based on one or more fields from the workflow. The computeExpression transformation performs calculations based on other fields in the same row. Here’s a simple example:
- computeRelative – these are similar to computeExpression transformations, however computeRelative performs calculations based on the same field in other rows. It has a similar setup to the computeExpression but this time you select the Partition and Order by values:
Now let’s turn our focus to the Recipes and see how these differ from the Dataflows. Within a recipe, instead of having the compute buttons, you need to click on the drop-down on the field you want to create a calculation on, then select ‘Formula’:
This then opens the formula dialog box at the bottom of the screen. From here you can type the formula from scratch, however there aren’t as many prompts so this can be a bit harder if you aren’t sure on the syntax or which fields you want to include.
Making calculations is fairly easy and similar in both Tableau Prep and Einstein, with a few differences around working and syntax. In my experience the computeExpression is similar to normal calculations in Tableau Prep, whilst computeRelative is similar to LOD’s.
Personally, I prefer the syntax and usability of Tableau Prep as I believe it gives you a lot of guidance on what calculations are available and provides an easier user experience especially if you are new to the tool.
The final section that I’ll cover in the Cleansing topic is how each tool deals with manipulating string fields. This is really important when cleaning data, from splitting First and Last Name or parsing particular words from a longer string, so let’s see how each of the tools deal with this!
Within Prep there are lots of different ways to manipulate strings. This can be completed within the calculated field by using the different string operators including splits, contains, and RegEx. But there are also lots of native features to help you along the way, and make the harder challenges that little bit easier.
Some of the native features include:
- Splits – Automatic predicts what the best way to split the string is. This is useful when splitting things like Names. Also, custom splits allow you to take more control by giving the separator and how many times you want to split the string.
- Clean Strings – There are many different options when cleaning strings including removing punctuation, changing the case, and trimming spaces:
There are lots of other options when working with strings, and these are covered within a great blog post on the Preppin Data website.
Again let’s see how Einstein handles string manipulation in both the Dataflow and Recipes.
There aren’t any dedicated string manipulations transformations within a Dataflow so we are reliant on the features within the SAQL expressions in a computeExpression transformation. If you are unfamiliar with these then I would recommend taking a look at the SAQL Reference guide on the Salesforce website. Here you’ll find a section for string operators, however we only have the ability to concatenate strings therefore we are quite limited.
Within a Recipe there is a bit more functionality to deal with different strings. In each of the string fields there are various different options including:
- Trim – this removes leading or trailing whitespace
- Substring – this allows you to extract a given number of characters from the string
- Split – here you can choose a separator, from a given list or custom delimiter, and split the field into 2 columns.
- Upper / Lowercase – Changes the case of the string.
As you can see both tools have the ability to manipulate strings in different ways, with Tableau Prep having a few more native features including RegEx functionality. I have only covered a few basic techniques for both so I would highly recommend taking a deeper look into both products.
That’s all for the second part of this blog series. Within the next part I will focus on how you can Combine and Output data in both of the tools! If you missed it take a look at part one which is focussed on Inputs & Investigation.
Hopefully you have a bit of an understanding about how Tableau Prep and Einstein compare and how they can come together in the future.
If you’re interested in learning more about Tableau and Salesforce then please get in touch via [email protected]!