Data Prep In Tableau

Note: Data source owners and Tableau administrators can add synonyms for specific data field names and values for Ask Data. For information about using data roles for Ask Data, see Add Synonyms for Ask Data(Link opens in a new window) in the Tableau Desktop help.

Tableau Prep can analyze your data and recommend cleaning operations that you can apply automatically to quickly fix problems in your data fields, or that help you identify problems so you can fix them yourself. This feature is available in all step types except the Input, Output and Join step types. To get started, open Tableau Prep Builder and click the Add connection button. In web authoring, from the Home page, click Create Flow, or from the Explore page, click New Flow, then click Connect to Data.

Use data roles to quickly identify whether the values in a field are valid or not. Tableau Prep delivers a standard set of data roles that you can select from or you can create your own using the unique field values in your data set.

When you assign a data role, Tableau Prep compares the standard values defined for the data role with the values in your field. Any values that don't match are marked with a red exclamation mark. You can filter your field to view only the valid or invalid values and take the appropriate actions to fix them. Once you've assigned a data role to your fields, you can use the Group Values option to group and match invalid values to valid ones based on spelling and pronunciation.

Note: Starting in version 2020.4.1, you can now create and edit flows in Tableau Server and Tableau Online. The content in this topic applies to all platforms, unless specifically noted. For more information about authoring flows on the web, see Tableau Prep on the Web.

Assign standard data roles to your data

Assign data roles provided by Tableau Prep to your field the same way you assign a data type. The data role identifies what your data values represent so Tableau Prep can automatically validate values and highlight ones that aren't valid for that role.

For example if you have field values for geographical data, you can assign a data role of City and Tableau Prep compares the values in the field to a set of known domain values to identify values that don't match.

Note: Each field is analyzed independently so a City value of 'Portland' in State 'Washington' in Country 'USA' might not be a valid city and state combination, but it won't be identified that way because it is a valid city name.
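Conceptually, this validation is a per-field membership check against the role's known domain. The sketch below illustrates the idea in plain Python; the city list and function name are illustrative stand-ins, not Tableau Prep's internals:

```python
# Conceptual sketch of data role validation (not Tableau Prep's actual
# implementation): each value is checked against a known domain on its
# own, which is why "Portland" passes the City role regardless of what
# the State field contains.
KNOWN_CITIES = {"portland", "seattle", "chicago"}  # illustrative domain

def validate_city(values):
    """Split values into (valid, invalid) lists, matching case-insensitively."""
    valid, invalid = [], []
    for v in values:
        (valid if v.strip().lower() in KNOWN_CITIES else invalid).append(v)
    return valid, invalid

valid, invalid = validate_city(["Portland", "Seatle", "Chicago"])
# "Seatle" lands in the invalid list; in Tableau Prep it would be
# marked with a red exclamation point.
```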

Tableau Prep Builder provides the following data roles:

  • Email

  • URL

  • Geographic roles (based on current geographic data; this is the same data used by Tableau Desktop)

    • Airport
    • Area code (U.S.)
    • CBSA/MSA
    • City
    • Congressional District (U.S.)
    • Country/Region
    • County
    • NUTS Europe
    • State/Province
    • Zip code/Postal code

Tip: In Tableau Prep Builder version 2019.1.4 and later and on the web, if you assign a geographic role to a field, you can also use that data role to match and group values with the standard value defined by your data role. For more information about grouping values using data roles, see Clean and Shape Data(Link opens in a new window).

To assign a data role to a field, do the following:

  1. In the Profile pane, Results pane or data grid, click the data type for the field.

  2. Select the data role for the field.

    Tableau Prep compares the field's data values to known domain values or patterns (for email or URL) for the data role you select and marks any values that don't match with a red exclamation point.

  3. Click the drop-down arrow for the field and from the Show Values section select an option to show all values or only values that are valid or not valid for the data role.

  4. Use the cleaning options on the More options menu for the field to correct any values that aren't valid. For more information about how to clean your field values, see About cleaning operations(Link opens in a new window).

Create custom data roles

Starting in Tableau Prep Builder version 2019.3.1 and on the web, you can create your own custom data roles using the field values in your data sets. This creates a standard set of values that you or others can then use to validate fields when cleaning data. Select the field you want to use, apply any cleaning operations to it if needed, and then publish it to Tableau Server or Tableau Online to use it in your flows or to share your data roles with others.

When you create custom data roles while editing flows on the web, you can publish them directly to the server you are signed into. Keep the following in mind when working with custom data roles:


  • You can create custom data roles from single fields in your data set. Creating custom data roles from a combination of fields isn't supported.
  • You can create custom data roles only for fields assigned to a data type of String and Number (whole).
  • When you create a custom data role, Tableau Prep creates an output step in your flow that is specific to publishing the data role.
  • Publishing custom data roles to multiple sites in the same flow isn't supported. If you publish the flow, you must publish the custom data role to the same site or server where the flow is published.
  • Custom data roles are specific to the site, server and project where you publish them. All users with permissions to that location can use the custom data role, but they must be signed into the site or server to select or apply it. For new projects, custom data roles are assigned the default permissions for the All Users group instead of None.
  • Custom data roles aren't version specific. When applying a custom data role, the most current version is applied.
  • Once a data role is published to Tableau Server or Tableau Online, any user with access to the site, server and project can view all data roles in that location.
    • Users with appropriate permissions can move, delete or edit permissions for the data roles.
    • The permissions you can set and actions you can take on a custom data role are similar to what you can do with a flow. For more information, see Manage a Flow(Link opens in a new window). For more information on setting permissions, see Permission capabilities(Link opens in a new window) in the Tableau Server help.
  • To edit a data role, you must make your changes in Tableau Prep Builder or in the flow on the web, then republish the data role using the same name to overwrite it. This process is similar to editing a published data source.

Create a custom data role

  1. In the Profile pane, data grid, or Results pane select the field you want to use to create a custom data role.

  2. Click More options for the field, and select Publish as Data Role.

  3. Select the server and project where you want to publish the data role.

  4. Click Run Flow to create the data role. After the publishing process completes successfully, you can view your data role in Tableau Server or Tableau Online. Processing the data role can take some time based on the load on your Tableau Server or Tableau Online site. If your data role isn't available right away, wait a few minutes, then try selecting it again.

Apply a custom data role

  1. In the Profile pane, Results pane or data grid, click the data type for the field where you want to apply the custom data role.

  2. Select Custom then select the data role that you want to apply to the field.

    Important: In Tableau Prep Builder, make sure you are signed into the site or server where the data role was published or you won't see this option.

    Tableau Prep compares the field's data values to known domain values for the data role you select and marks any values that don't match with a red exclamation point.

  3. Click the drop-down arrow for the field and from the Show Values section select an option to show all values or only values that are valid or not valid for the data role.

  4. Use the cleaning options on the More options menu for the field to correct any values that aren't valid. For more information about how to clean your field values, see About cleaning operations(Link opens in a new window).

View and manage custom data roles

You can view and manage your published custom data roles on Tableau Server and Tableau Online. You can view all custom data roles published to your site or server. Click More actions for a selected data role to move it to a different project, change permissions or delete it.

Group similar values by data role

Note: In Tableau Prep Builder version 2019.1.4 and 2019.2.1 this option was labeled Data Role Matches.

If you assign a geographic data role to a field, you can use the values in the data role to group and match values in your data field based on spelling and pronunciation to standardize them. You can use either Spelling or Pronunciation + Spelling to group and match invalid values to valid ones.

These options use the standard value defined by the data role. If the standard value isn't in your data set sample, Tableau Prep adds it automatically and marks the value as not in the original data set. For more information about assigning data roles to fields, see Assign standard data roles to your data.

To use data roles to group values, complete the following steps.

  1. In the Profile pane, Results pane or data grid, click the data type for the field.

  2. Select one of the following data roles for the field:

    • Airport
    • City
    • Country/Region
    • County
    • State/Province

    Starting in Tableau Prep Builder version 2019.3.2 and on the web, you can also select from your custom data roles.


    Tableau Prep compares the field's data values to known domain values for the data role you select and marks any values that don't match with a red exclamation point.

  3. Click More options, select Group Values (Group and Replace in previous versions), then select one of the following options:

    • Spelling: Matches invalid values to the closest valid values that differ by adding, removing, or substituting characters.
    • Pronunciation + Spelling: Matches invalid values to the most similar valid value based on spelling and pronunciation.

    You can also click the Recommendations icon on the field to apply the recommendation to group and replace the invalid values with valid ones. This option uses the Pronunciation + Spelling Group Values option.

    Tableau Prep compares the values by spelling or spelling and pronunciation and then groups similar values under the standardized value for the data role. If the standardized value isn't in your data set, the value is added and marked with a red dot.
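Under the hood, spelling-based matching is essentially fuzzy string comparison. The following rough illustration uses the Python standard library's difflib; Tableau Prep's actual matcher is proprietary, and the standard city values here are hypothetical:

```python
import difflib

# Sketch of spelling-based Group Values (assumption: difflib's
# similarity-ratio matching stands in for Tableau Prep's matcher).
STANDARD_CITIES = ["Portland", "Seattle", "San Francisco"]

def group_by_spelling(value, cutoff=0.8):
    """Map an invalid value to the closest standard value, if one is similar enough."""
    matches = difflib.get_close_matches(value, STANDARD_CITIES, n=1, cutoff=cutoff)
    return matches[0] if matches else value  # unmatched values pass through

grouped = [group_by_spelling(v) for v in ["Portlnd", "Seattle", "Sna Francisco"]]
# Misspellings are grouped under the standardized city names.
```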


Preparing data for Tableau generally requires an ETL solution. The problem is that most ETL solutions are extremely expensive and difficult to use – or they lack the sophisticated data transformation capabilities that your use-case requires.

Xplenty is different. Xplenty is a cloud-native, enterprise-grade ETL platform that empowers anyone to quickly create sophisticated ETL pipelines between any source and destination. Even if you’re a data integration beginner, each Xplenty account includes unlimited support from a dedicated integration specialist – so you’ll have an expert available for guidance whenever you need it.

In this guide, we’ll show you how to prepare data for Tableau using Xplenty in five quick steps. But first, we’ll start with a brief overview of data preparation for Tableau and why it's necessary.


Overview of Preparing Data for Tableau

Tableau is an enterprise-grade business intelligence solution that offers a wide range of data analytics and visual presentation features. With its easy-to-use, point-and-click interface, Tableau empowers you to analyze and explore data via beautiful dashboards – then share the insights you find with decision-makers and the rest of your team.

Some of Tableau’s best features include:

  • Easy to use: Tableau features an intuitive, drag-and-drop, no-code interface.
  • Embedded dashboards: Users can publish their Tableau dashboards on the web and via mobile devices to offer the easiest access to live metrics.
  • Advanced analytics features: Tableau includes advanced analytics features like time-series metrics, predictive AI analytics, cohort analysis, segmentation analyses, and more.
  • “Explain Data” feature: The “Explain Data” feature employs artificial intelligence and Bayesian methods to generate “statistically significant explanations” of interesting patterns in datasets.
  • Python, R, and MATLAB Integrations: Tableau allows developers to create interactive dashboards and visualizations in Python, R, and MATLAB.

While Tableau offers many advanced features, exposing your data to the platform requires data preparation through a data integration (ETL/ELT) process. You can prepare data for Tableau by developing an automated ETL pipeline with Xplenty.

After setting up your ETL pipeline (or “package”), Xplenty will run the package at preset intervals according to the custom schedule you define. When running the package, Xplenty will periodically (1) extract data from the source; (2) transform it in Xplenty’s robust transformation layer in a way that suits your data manipulation/cleaning/transformation requirements; and (3) load the information into the data warehouse. From there, Tableau can connect to the data warehouse, access the data, and analyze it to produce the visual metrics you need.

Ultimately, the data preparation actions you implement in Xplenty will reflect the analyses that you need to perform. For example, your data preparation for Tableau might need to:

  • Filter the data on certain columns
  • Pivot and aggregate the data based on different timeframes
  • Normalize formatting to support Tableau’s ability to read/analyze the information
  • Remove PHI/PII data for compliance purposes before loading it into a data warehouse

Now let’s explore an example, where we use Xplenty to develop a complete data preparation pipeline for Tableau.

Integrate Your Data Today!

Try Xplenty free for 14 days. No credit card required.

Preparing Data for Tableau: A Step-by-Step Xplenty Guide

The following sections are a step-by-step guide to preparing data for Tableau using Xplenty.

1) Understand Your Requirements

The first step in preparing data for Tableau is to understand your requirements. For the purposes of this guide, we will use the following example scenario:

Your business sells a unique software product, and your website includes a sales funnel that offers a free trial of the software. To sign up for the trial, potential customers fill out a web form and schedule an appointment to speak with a representative.

You want to measure the success of this sales funnel by pulling data from Google Analytics and sending it to Tableau. This Google Analytics data includes a range of metrics on how website users interact with your website and web form sales funnel.

Decision-makers want you to aggregate this data by Day, Month, Week, and Year-to-Date. They want to investigate different customer actions within the web page funnel, and whether those actions result in the successful completion of product “Intro” and product “Demo” sales calls. By analyzing the data with Tableau, decision-makers can visualize and identify bottlenecks in the sales funnel and experiment with changes to make it more successful.

Now that we understand the goals and requirements of the data analysis, we can develop a rough idea for the data preparation components the Xplenty dataflow should implement. Generally speaking, we can divide the process into the following steps:

  1. Connect to the data source: Establish a connection to the data source (Google Analytics in this case).
  2. Extract information: Select the specific columns of data to extract, and extract the data into your ETL data engine.
  3. Filter transformations: Use “Filter” transformations to filter the data on different columns to parse out the rows of data the analysis needs.
  4. Select transformations: Use “Select” transformations to “pivot” the table in a way that creates new columns of data based on different time-frames (like Day, Month, Week, and Year-to-Date).
  5. Aggregate and Sort transformations: Use “Aggregate” transformations to add up totals for the new Day, Month, Week, and Year-to-Date columns created in the previous step. Lastly, use a “Sort” transformation to order the data table by date.
  6. Load the data into the data warehouse: Load the transformed data into the data warehouse so Tableau can read it.

In the following sections, we’ll use Xplenty to develop a data transformation workflow that mirrors the above steps.

2) Create a Connection for the Data Source

Xplenty offers 200+ native connectors so you can connect with popular data sources including Google Analytics. It also offers a Universal REST API connector so you can connect to more diverse API endpoints. To connect to a data source with a native connector, log into Xplenty, and perform the following actions. First (1) select the “Connections” icon in the upper left to navigate to the library of connections. Next (2) select “New Connection”:

From here, select the data source that you want to use. In this case, we're using the “Google Analytics” connector that we set up in the previous step.

Complete the form that appears (next image). First (1) name the connection. Then (2) enter your user name, and (3) enter your password. If the source is a database, you will enter the Hostname as well.

3) Create a New Data Pipeline (Create a Package)

Now you’re ready to create the data pipeline, which Xplenty calls a “package” and which contains all of the ETL steps. First (1) select the “Packages” icon in the upper left of the dashboard to navigate to the transformation workflow tool. Then (2) select “New Package” in the upper right:

A package configuration form will appear (next image). Complete this form by (1) naming the package (we have named it “Google Analytics”). Then (2) indicate the type of package (workflow or dataflow). In this example, we are creating a “Dataflow,” which is a data pipeline with three types of components (Data source connections, Transformations, and Destinations). A “Workflow” package allows you to orchestrate a sequence of dataflow packages and set rules/dependencies for when they run. After selecting the package type (3) click the “Create Package” button in the lower right:

(i) Choose and Configure the Data Source

In this step, you will add the first component (the source) to the dataflow package. Xplenty allows you to extract data from multiple sources within the same dataflow, but for this example, we only use one. Open the package you just created. Click “Add Component”:

Select “Google Analytics”:

The new source component will appear. Select this component to open the setup wizard:

When setting up the source component, you need to give it a name. We named it “google_analytics_1.”

In the next step, “Choose input connection,” you will identify the data source (we used the Google Analytics connection that we set up above). After clicking “next,” you can set up “Source properties.”

“Source properties” identifies the properties of the data you want to extract. In this case, we have selected a custom date range of July 9, 2019, to July 13, 2020:

The next step is “Select input fields.” Here you will select the Google Analytics data fields to extract. There are thousands of Google Analytics data fields. Use the Dimensions, Metrics, and Meta Data tabs to search for the fields that your ETL process needs. Click the “+” icon next to any field to add it to the “selected fields” list.

You can preview the raw data you're extracting by clicking the “refresh” button in the lower right. Once complete, click “Save.”

4) Configure Your Data Pipeline

Xplenty has 15+ ready-made, no-code data transformation components, and 200+ ready-made transformation functions that you can apply to the data. We recommend preparing the data as a series of individual transformations organized into separate components. This will help you stay organized, and it will make performance tuning easier later.

We also recommend starting with one or more Filter components. This allows you to filter the data down to the rows you want, column by column, based on the specific values that you need. It will also make your data table smaller and easier to manage, and improve ETL performance. In the initial transformations, you may also want to use one of Xplenty’s encryption or hashing functions to secure any PHI/PII data in accordance with your industry’s compliance standards.
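As an illustration of the hashing idea, here is a minimal Python sketch of one-way hashing a PII column before it reaches the warehouse. SHA-256 with a static salt is shown for simplicity only; Xplenty’s built-in functions and your compliance requirements may call for something stronger, such as per-record salts or keyed hashing:

```python
import hashlib

# Sketch of one-way hashing a PII value (illustrative, not Xplenty's
# built-in function). The same input always yields the same digest, so
# analysts can still join and count on the column without seeing the PII.
def hash_pii(value: str, salt: str = "static-salt") -> str:
    """Return a salted SHA-256 hex digest of a PII value."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()

hashed = hash_pii("jane.doe@example.com")  # hypothetical email address
```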

If you’re familiar with the general logic of SQL, Xplenty’s available transformations (Select, Sort, Join, Limit, Cross Join, etc.) will probably make sense to you. If you’re not sure which transformations to choose or how to set them up, you can always chat with your dedicated Xplenty integration specialist for immediate assistance.

(i) Create Two Filter Transformation Components

For the reasons stated above, the first two transformations will be “Filter” components. Add the first Filter component to the pipeline by hovering over the source component. Click the “+” that appears in the blue area below the icon:

Choose the type of transformation from the window that appears. Choose “Filter”:

Next, you will see a form that lets you specify the details of the Filter component. First (1) Label the component (we labeled it “get_invitee_funnels”). Then (2) identify the column and the desired rows of the column by naming specific field values. In this case, we are filtering on the “ga_eventAction” column. We are filtering all the rows where the “ga_eventAction” column “text equals” four specific values (see image). When finished, click “Save.”

*In the above image, each of the filtered values represents a step customers take in the sales funnel while they schedule an appointment with a company representative to try the software. By filtering these rows, you’re giving Tableau the specific data it needs to identify any steps where customers are dropping out of the funnel. Decision-makers can use the Tableau analysis of the data to brainstorm solutions to any bottlenecks.

Next, we’ll add a second Filter component to the pipeline that filters the data further. Click the “+” below the Filter component we just created. Again, click “Filter” in the menu that appears:

In this second transformation, you’ll filter the data further, but on a different column. First (1) label the component (we labeled it “filter_demo_calls”). Next (2) choose the values that indicate the rows you will filter out from the column. We have selected the values “Intro” and “Demo.”


Note that we have also selected the “regex” operator (see next image). The regex operator performs a close match on the terms “Intro” and “Demo” instead of an exact match. This allows for variants and typos.

*Note that the second filter refined the dataset further to produce an even smaller table of information. The resulting table focuses on specific data as it pertains to customer funnels that lead to product Introduction and Demo calls. This offers an easier-to-analyze dataset to Tableau and the entire process will be more performant.
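The difference between a regex filter and an exact-match filter can be sketched in a few lines of Python. The labels below are hypothetical, and the pattern is an illustration of the matching behavior, not Xplenty’s exact syntax:

```python
import re

# A pattern match keeps rows whose event label merely *contains* Intro
# or Demo (in any letter case), unlike an exact-equality filter.
pattern = re.compile(r"Intro|Demo", re.IGNORECASE)

labels = ["Product Intro Call", "demo scheduled", "Newsletter signup"]
kept = [label for label in labels if pattern.search(label)]
# An exact-match filter on "Intro"/"Demo" would have kept nothing here.
```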

(ii) Select Components: Pivot the Table and More

SQL developers like to “pivot tables” to derive and highlight specific information in a dataset. Pivoting the table means that we will create new columns of data from existing data. In this case, the new columns will show aggregated values for different time periods (Day, Month, Week, and Year-to-Date). By pivoting the table, we give Tableau the data it needs to create timeframe-specific graphs and visuals. To do this, we will add a Select transformation component to the pipeline and use Xplenty’s Expression Editor to define the nature of the pivot.

Ultimately, we will add two “Select” components. The first Select component includes the pivot functions described above. The second Select component is a field-level transformation component, unlike the other table-level transformation components; it includes more nuanced transformations that manipulate the new column data created by the pivots.

First, we’ll create both components. (1) Click the “+” below the previous component in the pipeline, and choose the “Select” icon at the top to add the Select component. (2) Repeat this to add a second “Select” component.

Click on the first component to configure the pivots (next image). First (1) name the component. We named it “pivot.” Next (2) search for the column you want to select. Then (3) click the icon to write the pivot in the Xplenty Expression Editor. You’ll repeat this process for every new column you want to create. In this example component, there are five pivots, but we will only look at the one indicated by arrow “3.”

After clicking where Arrow 3 indicates above, you’ll open the Expression Editor. The Expression Editor allows you to search for cut-and-paste functions that will define the pivot. In this case, we have pivoted on the “ga_eventAction” column to create a new column for the row value “invitee_event_type_page.” We have named the new column “ga_totalEvents.”
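Logically, this kind of pivot expression works like a SQL CASE statement: emit the event count only when the row’s action matches the target value. A Python sketch of the idea follows; the rows are hypothetical, and the Expression Editor uses its own function syntax rather than this code:

```python
# Conceptual sketch of the pivot expression (CASE-style logic): the new
# ga_totalEvents column carries the event count only for rows whose
# ga_eventAction equals the pivoted value, and 0 otherwise.
def pivot_total_events(row):
    """Return the row's event count when its action matches the target."""
    if row["ga_eventAction"] == "invitee_event_type_page":
        return row["events"]
    return 0

rows = [
    {"ga_eventAction": "invitee_event_type_page", "events": 3},
    {"ga_eventAction": "other", "events": 7},
]
ga_totalEvents = [pivot_total_events(r) for r in rows]
```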

Click ‘Save,’ and move on to the second Select component.

As for the second Select component, we have named it “Select_1.” There are seven transformation functions in this component. Without going into a technical description of each, the following images show you how we configured the second Select component to manipulate the new column data which was created by the Pivot component:

After clicking the Expression Editor icon (green arrow above), you can see the Expression Editor functions for the selected transformation. This particular transformation manipulates the “ga_date” column into a different format that allows Tableau to identify the “sequential week number” within a particular year. Now your data table will have a column that shows you which number week from the year you are viewing data for:
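The week-number derivation can be sketched with Python’s standard library. ISO week numbering is shown here; the exact function used in the Expression Editor may number weeks differently:

```python
from datetime import date

# Sketch of the ga_date reformatting: derive the sequential week number
# within the year from a YYYY-MM-DD string (ISO week convention).
def week_of_year(ga_date: str) -> int:
    y, m, d = (int(part) for part in ga_date.split("-"))
    return date(y, m, d).isocalendar()[1]

week = week_of_year("2020-07-13")  # hypothetical ga_date value
```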

(iii) Aggregate and Sort Components

There are only two more transformation components left for the pipeline. First (1) we will create an “Aggregate” (GROUP BY in SQL) component to add up values for the new columns created in the last section. Then (2) we will create a “Sort” (ORDER BY in SQL) component. We’ll add these components like we did the previous ones.

Both of these components are relatively simple to set up. In the next two screenshots, you’ll see how to configure them. Here’s how we configured the Aggregate (GROUP BY) component:

Here’s how we configured the Sort (ORDER BY) component:


We have now finished the process of configuring the dataflow components for all of the data preparation tasks. In the next section, we will add the “Destination” component that loads the prepared data into the data warehouse.

(iv) Connect to the Destination

In this step, we will connect to the destination data warehouse that Tableau reads. Do this by adding a new component to the pipeline and selecting the data warehouse you intend to use. In this case, we’re sending the data to Google BigQuery:


The next three images show the three steps in the wizard that configures the BigQuery destination. In the first image, we named the component “Bigquery_destination” and selected our test_BQ destination connection:

In the next image, we define the name of the source table (which is the data table that results from the previous data pipeline):

Lastly, we map the input to appropriate target table columns. You can use the “find” feature to locate the appropriate fields:

With this final connection to the data warehouse established, you can run the package to load the data, and you can schedule the data to refresh automatically according to your minute-by-minute, daily, weekly, or monthly needs.


Once the data has loaded into the data warehouse, Tableau will be able to analyze and produce interactive visualizations and graphs so decision-makers can understand the data.


This concludes the tutorial on how to prepare data for Tableau. After reading this guide, you should have a clearer picture of what it’s like to build a sophisticated ETL pipeline for Tableau with Xplenty.


As you have seen, the way you prepare data for Tableau depends entirely on the use-case – i.e., the nature of your data and the types of analyses you want Tableau to perform. Ultimately, a powerful yet easy-to-use ETL platform like Xplenty can help you build automated data pipelines like the one above in a matter of hours. This dataflow will then run on autopilot – according to your schedule – to give your team the metrics they need for more strategic business decisions.


Remember, if you ever get stuck using Xplenty, your dedicated Xplenty integration specialist is always available to take the reins and help you set up the data pipelines you need.

Want to try Xplenty for yourself? Contact our team for a free Xplenty trial!