Improving Database Query Performance

Product(s): All
Version(s): All
Last Modified Date: 19 Mar 2015

Article Note: This article is no longer actively maintained by Tableau. We continue to make it available because the information is still valuable, but some steps may vary due to product changes.

At the heart of creating well-performing workbooks and visualizations is the basic principle that the visualization will never run faster than the underlying query. Therefore, to ensure a workbook is running as quickly as possible you need to ensure the query is running optimally.

For the End User

Below are some tips and pointers to help workbook authors understand if data access is a problem and some suggestions on what they can do to address it.

Know what you are asking

Often a problem with slow-running visualizations is that you have inadvertently created a query that returns a large number of records from the underlying table(s), when a smaller number of aggregated records would suffice. The time it takes the database management system (DBMS) to calculate the results and then stream the records back to Tableau can be significant. You can check this by looking at the number of marks shown in the lower-left corner of the Tableau Desktop workspace. If this number is very large, you are potentially pulling a large amount of data from the database.

Ensure you are not including any unnecessary dimensions in your visualization - this will affect the aggregations in the database and increase the size of the result set.
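The effect of an extra dimension on the result set can be sketched outside Tableau. The example below is only an illustration: it uses an in-memory SQLite database as a stand-in for the real DBMS, and the table and column names (orders, segment, category, order_id) are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the real DBMS; all names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (segment TEXT, category TEXT, order_id INTEGER, sales REAL)"
)
rows = [
    (segment, category, order_id, 10.0)
    for segment in ("Consumer", "Corporate")
    for category in ("Furniture", "Technology")
    for order_id in range(100)
]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)


def result_size(sql):
    """Return the number of records a query sends back to the client."""
    return len(conn.execute(sql).fetchall())


# Two dimensions in the view: one record per (segment, category) pair.
aggregated = result_size(
    "SELECT segment, category, SUM(sales) FROM orders GROUP BY segment, category"
)

# Adding a high-cardinality dimension (order_id) yields one record per order.
with_extra_dimension = result_size(
    "SELECT segment, category, order_id, SUM(sales) FROM orders "
    "GROUP BY segment, category, order_id"
)

print(aggregated, with_extra_dimension)  # 4 400
```

In this toy data set, one unnecessary dimension turns a 4-record result into a 400-record result, which is the kind of growth the number of marks would reveal.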

Use native drivers

Tableau products can connect to a wide variety of data sources. Many of these data sources are implemented as native connections, which means Tableau has implemented techniques, capabilities and optimizations specific to these data sources. Tableau engineering and testing activities for these connections ensure they are the most robust Tableau has to offer.

Tableau has additionally implemented the option to use the general-purpose ODBC standard for accessing data sources beyond the list of named options available when creating a new connection. As a publicly defined standard, many database vendors make ODBC drivers available for connecting to their databases. Tableau provides the option to use these ODBC drivers to connect to data.

There can be differences in how each database vendor interprets or implements capabilities of the ODBC standard. In some cases Tableau will recommend or require you to create a data extract to continue working with a particular driver. There will also be some ODBC drivers and databases that Tableau is unable to connect to.

If there is a native driver for the data source you are querying, use it in preference to an ODBC connection, as it will generally provide better performance.

Test with another tool

A good way to determine if a slow workbook is being caused by a slow query is to test the same query in another tool, such as Microsoft Access or Microsoft Excel. To find the query being run, look in My Documents\My Tableau Repository\Logs and find a file titled log.txt. Open this file and scroll up from the bottom until you find a section like the following:

 

2011-08-04 13:46:16.161 (2198): DATA INTERPRETER: Executing primary query.
2011-08-04 13:46:16.171 (2204): <QUERY protocol='05d09100 '>
2011-08-04 13:46:16.171 (2204): SELECT [Superstore APAC].[Customer Segment] AS [none:Customer Segment:nk],
2011-08-04 13:46:16.171 (2204): [Superstore APAC].[Product Category] AS [none:Product Category:nk],
2011-08-04 13:46:16.171 (2204): [Superstore APAC].[Product Sub-Category] AS [none:Product Sub-Category:nk],
2011-08-04 13:46:16.171 (2204): SUM([Superstore APAC].[Sales]) AS [sum:Sales:qk]
2011-08-04 13:46:16.171 (2204): FROM [dbo].[Superstore APAC] [Superstore APAC]
2011-08-04 13:46:16.171 (2204): GROUP BY [Superstore APAC].[Customer Segment],
2011-08-04 13:46:16.171 (2204): [Superstore APAC].[Product Category],
2011-08-04 13:46:16.171 (2204): [Superstore APAC].[Product Sub-Category]
2011-08-04 13:46:16.171 (2204): </QUERY >
2011-08-04 13:46:16.238 (2204): [Time] Running the command took 0.0659 sec.
2011-08-04 13:46:16.238 (2204): [Time] Running the query took 0.0662 sec.
2011-08-04 13:46:16.240 (2204): [Time] Getting the records took 0.0007 sec.
2011-08-04 13:46:16.240 (2204): Building the tuples took 0.0001 sec.
2011-08-04 13:46:16.240 (2198): [Count] Query returned 68 records (Q10).

 

The section between the opening <QUERY> tag and the closing </QUERY> tag is the query that was passed to the database. You can copy this text (without the timestamp prefixes) and run it from a tool like Access or Excel. If it takes a similar time to return as in Tableau, then the problem is likely with the query, not the tools.
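If you do this often, pulling the query out of the log by hand becomes tedious. The following is a minimal sketch of one way to automate it, assuming the log keeps one entry per line with the timestamp and thread prefix seen in the excerpt. The function name extract_queries is hypothetical, and the sample text is an abbreviated version of the excerpt rather than a real log file.

```python
import re

# Prefix format assumed from the log excerpt: "YYYY-MM-DD HH:MM:SS.mmm (thread): "
PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ \(\d+\): ")


def extract_queries(log_text):
    """Return the SQL text found between <QUERY ...> and </QUERY > markers."""
    queries, current = [], None
    for line in log_text.splitlines():
        body = PREFIX.sub("", line).strip()
        if body.startswith("<QUERY"):
            current = []  # start collecting a new query
        elif body.startswith("</QUERY"):
            if current is not None:
                queries.append("\n".join(current))
            current = None
        elif current is not None:
            current.append(body)
    return queries


# Abbreviated sample based on the log excerpt in this article.
sample = """\
2011-08-04 13:46:16.171 (2204): <QUERY protocol='05d09100 '>
2011-08-04 13:46:16.171 (2204): SELECT [Superstore APAC].[Customer Segment] AS [none:Customer Segment:nk],
2011-08-04 13:46:16.171 (2204): SUM([Superstore APAC].[Sales]) AS [sum:Sales:qk]
2011-08-04 13:46:16.171 (2204): FROM [dbo].[Superstore APAC] [Superstore APAC]
2011-08-04 13:46:16.171 (2204): GROUP BY [Superstore APAC].[Customer Segment]
2011-08-04 13:46:16.171 (2204): </QUERY >
2011-08-04 13:46:16.238 (2204): [Time] Running the query took 0.0662 sec.
"""

# The most recent query is the last one in the list.
print(extract_queries(sample)[-1])
```

To use it on a real log, read the file contents into a string and pass that string to extract_queries; newer Tableau versions may log in a different format, in which case the prefix pattern would need adjusting.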

Use extracts

If you are seeing poor query performance when using a live connection to data (for example, against an Excel spreadsheet or a database server), one easy way to improve performance is to create a Tableau data extract (.tde).

Extracts allow you to read the full set of data pointed to by your data connection and store it into an optimized file structure specifically designed for the type of analytic queries that Tableau creates. These extract files can include performance-oriented features such as pre-aggregated data for hierarchies and pre-calculated calculated fields (reducing the amount of work required to render and display the visualization). 

Optimize query performance by assuming referential integrity

When you are working with multiple tables in a data source, and you have joined multiple tables, you may be able to improve query performance by selecting the option to Assume Referential Integrity from the Data menu. When this option is selected, Tableau will include a joined table in the query only if it is specifically referenced by fields in the view. Referential integrity exists when any value you specify from a column in one table is assured to exist as a value for a column in any joined table. For details, see Assuming Referential Integrity in the Tableau Desktop online help.

Using this setting is appropriate when you know that your data has referential integrity but your database is not enforcing or cannot enforce referential integrity. If you are able to configure referential integrity in your database, that is a better option than using this setting because it can improve performance both in the database and in Tableau. The Assume Referential Integrity option in Tableau can only affect performance on Tableau's end.

If your data does not have referential integrity and you turn this setting on, query results may not be reliable.
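Before turning the setting on, it can be worth checking whether the data really has referential integrity. The sketch below shows one way to look for orphaned rows and why they matter. It uses an in-memory SQLite database as a stand-in for the real DBMS, and the customers and orders tables are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the real DBMS; tables and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, sales REAL);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    -- Order 12 points at customer 3, which does not exist.
    INSERT INTO orders VALUES (10, 1, 50.0), (11, 2, 75.0), (12, 3, 20.0);
""")


def count_orphans(conn):
    """Count orders whose customer_id has no matching row in customers."""
    sql = """
        SELECT COUNT(*)
        FROM orders o
        LEFT JOIN customers c ON c.customer_id = o.customer_id
        WHERE c.customer_id IS NULL
    """
    return conn.execute(sql).fetchone()[0]


# With the join in the query, the orphaned order is excluded from the total.
joined_total = conn.execute(
    "SELECT SUM(o.sales) FROM orders o "
    "JOIN customers c ON c.customer_id = o.customer_id"
).fetchone()[0]

# If the join is left out of the query, the orphaned order is included.
orders_only_total = conn.execute("SELECT SUM(sales) FROM orders").fetchone()[0]

print(count_orphans(conn), joined_total, orders_only_total)  # 1 125.0 145.0
```

A non-zero orphan count means the two totals can disagree, which is the sort of unreliable result to expect when a joined table is dropped from the query on the assumption that every row has a match.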

For the DBA

If the above points are not sufficient to address your performance problems, it could be that the problems are at a deeper level than can be addressed by an end user. Tableau suggests that you engage your database administrator (DBA) and have them look at the following section for suggestions.

Seriously, know what you are asking

As pointed out earlier, knowing what you are asking the database to do is an important part of performance tuning. By running an audit or trace on the database, you can isolate the query that Tableau has passed to the query engine, and can check to see if it is what you expect. For example, does it have the expected GROUP BY and filter clauses; is it doing aggregations in the query as opposed to returning raw field values; etc.

As an example, in SQL Server you can start the Profiler tool to trace the queries being run (filter by application name = "Tableau 6.1", or by your user name if the server is busy). This will allow you to see what the query was and how long it took to return.

Tune your indexes

Once you know the query being run, you can paste it into the execution plan estimator to see in more detail how the DBMS will process it. This does not execute the query; it returns the estimated execution plan based on the statistics the server has collected.

Based on the information returned, you can determine whether additional indexes need to be created (for example, because the kind of query being asked by end users has changed and the current indexing model no longer reflects it).

This is a deep topic but some basic principles are:

- Make certain you have indexes on all columns that are part of table joins
- Make certain you have indexes on any column used in a filter
- Explicitly define primary keys
- Explicitly define foreign key relationships
- For large data sets, use table partitioning
- Define columns as NOT NULL where possible
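The effect of indexing a filter column can be seen by comparing the estimated plan before and after the index exists. The sketch below uses SQLite's EXPLAIN QUERY PLAN as a stand-in for your DBMS's plan estimator. The orders table, the idx_orders_region index and the plan helper are hypothetical, and the exact plan wording differs between DBMS products and versions.

```python
import sqlite3

# In-memory SQLite stands in for the real DBMS; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, region TEXT NOT NULL, sales REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO orders (region, sales) VALUES (?, ?)",
    [(f"region-{i % 50}", float(i)) for i in range(5000)],
)

QUERY = "SELECT SUM(sales) FROM orders WHERE region = 'region-7'"


def plan(conn, sql):
    """Return the estimated plan as text. EXPLAIN QUERY PLAN does not run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # the fourth column holds the plan detail


before = plan(conn, QUERY)  # no index on region yet: expect a full scan
conn.execute("CREATE INDEX idx_orders_region ON orders (region)")
after = plan(conn, QUERY)   # the filter column is now indexed

print("before:", before)
print("after: ", after)
```

In SQLite the first plan reports a scan of the table and the second reports a search using idx_orders_region; the same before-and-after comparison applies to the estimated plans of other engines.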

Use statistics

Database engines collect statistical information about indexes and column data stored in the database. These statistics are used by the query optimizer to choose the most efficient plan for retrieving or updating data. Good statistics allow the optimizer to accurately assess the cost of different query plans and choose a high-quality plan. For example, a common misunderstanding is that if you have indexes, the database will always use them to retrieve the records in your query. Not necessarily. If you create, say, an index on a column City and more than 90% of the values are 'Vancouver', the DBMS will most likely opt for a table scan instead of using the index when filtering on that value, provided it knows these statistics.

Ensuring that database statistics are being collected and used will help the database engine generate better query plans, resulting in faster query performance.
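As a rough illustration of what such statistics capture, the sketch below builds a skewed City column and asks the engine to gather statistics. It uses SQLite, where the ANALYZE command stores its results in the sqlite_stat1 table; other DBMS products have their own commands and catalogs. The customers table and the value_share helper are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the real DBMS; table and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, city TEXT NOT NULL)"
)
cities = ["Vancouver"] * 950 + ["Toronto"] * 50
conn.executemany("INSERT INTO customers (city) VALUES (?)", [(c,) for c in cities])
conn.execute("CREATE INDEX idx_customers_city ON customers (city)")

# ANALYZE asks SQLite to gather statistics; they are stored in sqlite_stat1.
conn.execute("ANALYZE")
index_stats = conn.execute(
    "SELECT idx, stat FROM sqlite_stat1 WHERE tbl = 'customers'"
).fetchall()


def value_share(conn, city):
    """Fraction of all rows that a filter on this city would return."""
    total = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
    matching = conn.execute(
        "SELECT COUNT(*) FROM customers WHERE city = ?", (city,)
    ).fetchone()[0]
    return matching / total


print(index_stats)  # per-index statistics; the stat text starts with the row count
print(value_share(conn, "Vancouver"), value_share(conn, "Toronto"))
```

A filter on 'Vancouver' touches 95% of the rows, so reading the whole table is about as cheap as going through the index, whereas a filter on 'Toronto' touches only 5% and benefits from it. Whether a given optimizer acts on this depends on how detailed its statistics are.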

Optimize the data model

Finally, the data model being queried can have a significant impact on the performance of queries. Ensuring that the structure of the data is aligned to the kinds of analysis the end users will do is critical for good query performance. If you find you are needing to design excessive joins, it could be an indication that the data model is not suited to the task at hand.

An example is creating summary tables when most of your queries only need aggregated data, not base-level detail records.
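A summary table of this kind can be sketched as follows. This is only an illustration using an in-memory SQLite database; the order_lines and sales_summary tables are hypothetical, and a real warehouse would typically rebuild or refresh the summary as part of its regular load process.

```python
import sqlite3

# In-memory SQLite stands in for the real DBMS; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_lines (category TEXT NOT NULL, sales REAL NOT NULL)")
detail_rows = [
    (category, float(i % 10))
    for category in ("Furniture", "Office Supplies", "Technology")
    for i in range(1000)
]
conn.executemany("INSERT INTO order_lines VALUES (?, ?)", detail_rows)

# Build the summary once, so that reports do not re-aggregate the detail rows.
conn.execute("""
    CREATE TABLE sales_summary AS
    SELECT category, SUM(sales) AS total_sales, COUNT(*) AS line_count
    FROM order_lines
    GROUP BY category
""")

detail_count = conn.execute("SELECT COUNT(*) FROM order_lines").fetchone()[0]
summary = conn.execute(
    "SELECT category, total_sales, line_count FROM sales_summary ORDER BY category"
).fetchall()

print(detail_count)  # 3000 detail rows in the base table
print(summary)       # 3 pre-aggregated rows, one per category
```

Queries that only need totals per category can read the three summary rows instead of scanning all the detail rows. The trade-off is that the summary must be refreshed whenever the detail data changes.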

Again, this is a big topic beyond the scope of this article, but DBMS vendors have many whitepapers that describe their recommended best practices for data warehouse and data mart design.