Easy way to get to #Tableau Server Help.

or: “@joemako had me at /currentadmin”.

So, I’m reading a 3-year-old post on the Tableau forums (because that’s what I like to do), and Joe points someone to Tableau Server online help with the following URL:

www.tableausoftware.com/currentadmin   

I had no idea. This beauty automatically lands you on the latest/greatest version of help. 

Joe, I owe you a bunch of keystrokes. Thanks.

IP-based Geo-location in Tableau (New! Now with more R!)

@highvizability wrote a great piece last year about how to geocode a user’s address using R. You can read the article here and download a great sample workbook:

http://www.dataplusscience.com/Geocoding%20in%20Tableau%20using%20R.html 

Let’s do the same thing but with someone’s IP address, shall we?

Tableau has already documented one method to approach this, but technology moves on. FYI, I wouldn’t use the following technique if I were going to plot thousands of marks / IP addresses on a map - it would probably take too long. That said…

First, I found another great blog by a gentleman who figured out how to make IP-based geocoding work in R. You can read the article here. Everything I did is based on his work.

Essentially, I’ve dumbed-down what Andrew put together and modified his function so that it only returns the latitude and longitude of the IP address being geocoded. I’m calling it geo_ip below - name yours whatever you like:

geo_ip <- function(ip)
{
     if (1 == length(ip))
     {
          # a single IP address
          require(rjson)
          url <- paste(c("http://freegeoip.net/json/", ip), collapse="")
          ret <- fromJSON(readLines(url, warn=FALSE))
          ret <- data.frame(t(unlist(ret)))
          latlon <- paste(ret$latitude, ret$longitude, sep=",")
          return(latlon)
     }
}

The function above requires the rjson package, which lives in CRAN.

The function must also be available to your Rserve server. This is generally accomplished by finding the Rserv.cfg file for Rserve and adding a reference to a file which holds the text for your function.

For example, I dropped the text above into a file called geo.txt, and then plugged that value into the config file like so:

image

After that, all the work happens in Tableau.

You’ll create an expression which calls the function and passes it the IP address (in this case, named [ip] in my dataset) - since the function returns a string, this is a SCRIPT_STR() calculation:

image

Then, create two additional expressions which parse out the latitude and longitude:

image

…and you’re done!

You can download samples from here:  http://1drv.ms/PUO69Q

Testing Tableau with SQL Server 2014 (in memory!)

As an ex-blue badge, I have a love for SQL Server that’s baked into my genes. I. Love. SQL. Server. Clear?

So, I was excited to load up SQL Server 2014 for the first time since it RTM’d and take it for a spin.

I thought it would be fun to run a little test: namely, run the same dashboard against “on disk” SQL Server, a Tableau Extract, and In-Memory OLTP on SQL Server. Let’s see what comes back fastest, shall we?

(What?! You say you don’t know what In-Memory OLTP is? Luddite! Read about it here).

The Dashboard and Data

Anyway, here is the blue dashboard I came up with for my ex-blue badge testing. No, it doesn’t follow dataviz best practices, but it’s blue and I like blue today:

image

The key thing I wanted to do was introduce several levels of grouping into multiple vizzes to create a meat-and-potatoes dashboard representative of what a lot of people do with Tableau. 

The dataset I used was pretty small - just under 4M rows in a star schema. Here’s the ERD:

image

Only basic indexing is in play with the on-disk database - I have clustered indexes created on each of the PK fields on the dimension tables. 

Using SQL Server 2014

Upgrading my SQL Server 2012 to 2014 was like buttah - very straightforward and pain-free. This made me happy.

One must store in-memory tables in a new type of SQL Server filegroup used for this purpose, so I added one of these suckers to my database and then used the Table Memory Optimization Advisor to make in-memory copies of my original tables. The TMOA is pretty much like a wizard that tells you if you’re doing anything stupid, gives you some basic advice and then creates a copy of the table you’re working with (with or without data) inside the Memory Optimized Data filegroup. 
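If you’d rather skip the wizard, the filegroup piece is a couple of T-SQL statements - something like this sketch (the database name and file path are placeholders, not what I actually used):

-- Add a filegroup that can hold memory-optimized data, then give it a container
ALTER DATABASE MyTestDB
    ADD FILEGROUP MemOptData CONTAINS MEMORY_OPTIMIZED_DATA;

ALTER DATABASE MyTestDB
    ADD FILE (NAME = 'MemOptData_Container', FILENAME = 'C:\SQLData\MemOptData_Container')
    TO FILEGROUP MemOptData;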

I no longer consider myself a SQL Server Pro - more like a week-end warrior…and the fact that I could turn this part of the project around so quickly should be seen as a feather in Microsoft’s cap. 

The Testing

I performed my testing as follows, timing each dashboard execution: 

  • Cycled SQL Server to clear cache (or for Tableau extract testing, I restarted the OS - this is the only way to know the extract file isn’t already in memory from a previous run). I called this an “un-cached run”. (A script-based shortcut is sketched just after this list.)
  • Ran the blue dashboard and closed Tableau
  • Re-opened Tableau and ran the dashboard again (a “cached” run)
  • Cycled SQL Server and/or restarted the OS once more and ran my blue dashboard, this time using Tableau Performance Recorder (another “un-cached run”)
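About that first bullet: if you don’t feel like bouncing the service between on-disk runs, flushing SQL Server’s caches by hand gets you to roughly the same “cold” starting point. A sketch (it assumes sysadmin rights, and it won’t reset memory-optimized tables or the extract - that’s what the service/OS restarts are for):

CHECKPOINT;               -- write dirty pages out so the buffer flush below is complete
DBCC DROPCLEANBUFFERS;    -- empty the buffer pool (cold data cache)
DBCC FREEPROCCACHE;       -- toss cached query plans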

…and things don’t look so good for In-Memory OLTP, do they?

Let’s look at the Performance Recorder output for Tableau extract based runs:

image

Geocoding took the most time, followed by the 920-millisecond execution of the query that supports that process.

Here’s the recorder when I ran the dashboard against “on disk” SQL Server: 

image

The query which supports the running sum area chart at the top of my blue dashboard takes the longest here, followed by the same Geocoding task & geocode support queries.

And now, the surprisingly slow In-Memory OLTP source: 

image

It took 29 seconds to run the query which powers the box and whisker plot viz. Insane! The others weren’t particularly light on their feet either, but 29 seconds is just ugly.

Here’s the SQL for the query in question:

SELECT
   DATEPART(month,[DimDate].[Full_SQL_Date]) AS [mn:Full_SQL_Date:ok],
  [DimClient].[Client_Name] AS [none:Client_Name:nk],
   DATEPART(year,[DimDate].[Full_SQL_Date]) AS [yr:Full_SQL_Date:ok],
   SUM(CAST(CAST([FactDailyResponse].[Impressions] as bigint) as BIGINT)) AS [sum:Impressions:ok]
FROM
   [dbo].[FactDailyResponse] [FactDailyResponse]
   INNER JOIN [dbo].[DimClient] [DimClient] ON ([FactDailyResponse].[Client_Key] = [DimClient].[Client_Key])
   INNER JOIN [dbo].[DimDate] [DimDate] ON ([FactDailyResponse].[Date_Key] = [DimDate].[Date_Key])
GROUP BY
   DATEPART(month,[DimDate].[Full_SQL_Date]),
  [DimClient].[Client_Name],
   DATEPART(year,[DimDate].[Full_SQL_Date])

I ran this sucker in Query Analyzer and got more or less the same result, then checked out the execution plan and saw several alerts that statistics on my table indexes were out of date.

…and that was the problem. 

This help topic says that statistics are created automatically but not updated automatically on memory-optimized tables. When I updated the statistics manually on the dimDate table, query execution immediately dropped to ~3 seconds in Query Analyzer. 
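The manual update itself is a one-liner per table. One wrinkle: on 2014’s memory-optimized tables, my understanding is that you need FULLSCAN (sampled statistics aren’t supported there) along with NORECOMPUTE. Roughly:

-- Refresh statistics on the date dimension (repeat for each memory-optimized table)
UPDATE STATISTICS dbo.DimDate WITH FULLSCAN, NORECOMPUTE;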

I updated statistics on all of my tables and then tried again:

That’s better - still slower than on-disk SQL. Surprised.

What I’m seeing thus far could very well have to do with the type of index I’m using on my tables (a NONCLUSTERED HASH on the PK field). Books Online indicates that this is what I should use for a field acting as my primary key, but I might try a NONCLUSTERED index instead. I might also try adding additional indexes on some of the fields that get used a bunch, like my date field.
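For reference, here’s roughly what those two index flavors look like. Since 2014 makes you declare indexes on a memory-optimized table when you create it, trying an alternative means re-creating the table. The column types, index names, and bucket count below are illustrative guesses rather than my actual DDL:

CREATE TABLE dbo.DimDate_InMem
(
    -- What I have today: a hash index behind the PK (swap the HASH WITH (...) clause
    -- for plain NONCLUSTERED to try the range-index alternative)
    Date_Key      INT  NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 65536),

    -- The sort of extra index I might add on a heavily-used field like my date
    Full_SQL_Date DATE NOT NULL
        INDEX IX_DimDate_FullDate NONCLUSTERED
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);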
Here’s the performance recorder running my dashboard against in-memory, with updated statistics on my indexes:

image

Take-aways thus far:

  • Don’t forget to update your statistics!
  • Since everything is “cached” in an in-memory OLTP scenario, you’ll get about the same perf each time you run a query
  • Using In-Memory OLTP isn’t “fire and forget” like I’ve gotten used to with SQL Server over the years - looks like I’m going to need to spend some time adding & tuning indexes for best effect. This will make free-form analysis clumsy as each user may focus on different fields (which may or may not be indexed)
  • Extracts are still offering better performance, since all of the tables are “pre-joined” inside the extract.

More to come!

(Edit #1: It’s about 2 hours later)

I’ve started adding NONCLUSTERED indexes in an attempt to improve execution time (a sketch of the table re-creation this involves follows the list):

  • Index on the Date column in my Dimension Table: No Change
  • Index on the Client_Key field in the Fact table used to join to Clients dimension - gained ~4 seconds: Average run time now at ~12.5 seconds from ~17: Equal to disk storage, at least
  • Index on the Date_Key field in the Fact table which joins to the Date dimension: No Change
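Each of those attempts meant re-creating the fact table with the index declared inline, since 2014 has no ALTER TABLE … ADD INDEX for memory-optimized tables. Stripped down to the columns you’ve already seen, with a made-up surrogate key and bucket count, it looks something like:

-- Fact table re-created with nonclustered indexes on the two join keys
CREATE TABLE dbo.FactDailyResponse_InMem
(
    Response_Key INT    NOT NULL    -- hypothetical surrogate key
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 8388608),
    Client_Key   INT    NOT NULL INDEX IX_Fact_Client NONCLUSTERED,  -- join to DimClient
    Date_Key     INT    NOT NULL INDEX IX_Fact_Date   NONCLUSTERED,  -- join to DimDate
    Impressions  BIGINT NOT NULL    -- other measures omitted
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);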

(Edit #2)

If Tableau gets great performance by creating a single in-memory table as the result of joining other tables, why can’t SQL Server? Let’s try it!

I took the same query that represents joining my fact table to all dimensions and just made a single materialized version of said query as a table on disk. Good so far. However, when I attempt to turn this into an In-Memory OLTP table, no joy:

image

Damn.

(Edit #3)

6:30 AM here, and I just added NONCLUSTERED indexes to each and every column on the Fact table that I join with. That got me slightly more than 1 second of additional improvement, so I’m now averaging about 11.1 seconds on this dashboard compared to 10.6 seconds against its on-disk cousin. Based on the fact my stopwatch thumb can be fast or slow depending on how much caffeine is coursing through my veins, I’ll call this “even”.

If I really wanted to go all out, I see an opportunity to create a multi-column covering index on three fields in my fact table (two of which are measures). I suspect that would improve perf on the visualization that is taking “longest” at about 3.5 seconds. However, that seems to be overkill to me, so I’ll leave well enough alone for now.

Tableau Server Scalability Explained →

#Tableau Server scalability whitepaper updated for 8.1 and 64-bit yumminess!

Making a good first impression with Tableau Desktop

You’ve invested the time to create a great viz for your team, boss, or client…now, make sure you get fully paid — with money, adoration, or whatever gets you through the night.

And you only get one chance to make a good first impression, right?  

Before you deliver your workbook, put your best foot forward by choosing your best-looking dashboard and putting it into Presentation Mode before you save. 

Example -  which “on open” experience looks and feels better to you? 

This: 

image

..Or this:

image

We’re talking about a trivial, 1-second exercise that really makes a difference when it comes to “presentation time”.

  • Choose your favorite viz
  • Hit F7 to go into Presentation Mode
  • Hit CTRL-S to save

…and you’re done. Send your work out into the cold, hard world and wait for the kudos to come flooding in.

Tableau Server Performance on AWS EC2

NOTE: If you’ve just landed on this article and haven’t read this entry, you really should stop and give it a look. It contains important information I won’t be re-visiting here.

Overview

No surprise: you’re about to see very close correlation between Tableau Desktop’s performance and how Server behaves.

The data below comes directly from the admin View Performance History report — specifically Compute View.

I installed Server to 6 of the 10 AWS instances I tested previously and published the same workbook as before: This is the “big” workbook with the heavy-duty data source. I ran each Tableau Server with only one VizQL process so that it was easy to see caching in play.

Keep in mind that most of the work we’re doing here assumes the need for good IO due to the large extract that needs to be read into RAM as quickly as possible. Do you really need to run on an instance with SSDs or lots of provisioned IOPS? Maybe not, if you don’t need the fast read for extract loading.

I tested two reports: 

  • The “Big extract” dashboard (mentioned above)
  • The sample 2013 Sales Growth dashboard

I consider the second to be the type of report most folks will be viewing on a daily basis.

The general pattern I followed while testing each instance was:

  1. Login and run each report with NO Tableau caches populated: VizQL hasn’t seen these reports yet, nor has the extract been loaded into RAM at the OS level.
  2. Logout, close browser.
  3. Login, run each report again. Fast response expected because we’ll likely go to the tile cache or VizQL Server cache.
  4. Logout, and use task manager to kill the single VizQL Server process running: I’m purposefully blowing up anything in the VizQL Server cache.
  5. Login, run each report again. Performance should be worse than the previous execution, but still better than the first.
  6. Restart the machine to clear the extract from the OS’s RAM and repeat steps 1-6 until I start feeling silly.

The report we really care about is “big”, and I followed the same steps with it on my externally facing Tableau Server (4 core i5 CPU, 32 GB RAM, SSD) and Server running under Parallels on my Mac (resources granted to VM: 4 core i7 CPU, 4 GB RAM, backed by SSD). I brought both down to 1 VizQLServer for the test.

External Server: 

  • Initial Load: 34.39 sec
  • Cached Load: .55 sec
  • Post VizQL Server reset: 27.4 sec

Parallels VM on Mac: 

  • Initial Load: 17.14 sec (faster than I expected!)
  • Cached Load: .11 sec
  • Post VizQLServer reset: 17.48 sec

And now, the results: 

IMPORTANT NOTE: I’m using 10,000 IOPS as a stand-in “tag” for Instance Storage on an SSD. I did not create an EBS volume that delivered 10,000 provisioned IOPS. Instance Storage is explained in the previous blog entry on this topic. Where you see 0 IOPS, I’m using a standard volume, also discussed earlier.

Let’s start with the 2013 Sales Growth dashboard. As you can see, Tableau makes quick work of this baby across the board on all hardware:

image

Since these are going to render relatively quickly anyway, it is difficult to tell how much the cache is or isn’t coming into play. I’d assume the < 1 second executions are coming out of tile cache, the 1-2 second renders from the VizQLServer cache, and the 3+ second executions are being done “from scratch” - but it’s hard to say for sure. 

Note that our “big” report can also come back very quickly if cached correctly. Yowza!

It is very easy to see caching at work when rendering the “big” dashboard…here, I’ll make it even easier!:

image

We’re able to deliver sub-second performance on this big report in the green zone - that’s gotta be the tile cache at work. 

In the yellow zone, we’re generally spending about 50-60% (or less) of the time we spend in the red zone on the same type of machine. In the red zone, we’re having to wait for the OS to load the extract into RAM; in the yellow zone we’re not, plus we’re probably getting some love from the VizQL Server cache.

Here’s the same information expressed in a slightly different way:

…the Upper Whisker always displays the longest, “nothing is cached” load, while the Lower Whisker shows a fully cached execution.

And finally, it might be fun to overlay both the Desktop data set and the Server data set to see any differences: 

Yet another note: I “hand-timed” rendering of reports in Desktop - just me, my Nokia, and a stopwatch app. I noticed that there was often a good half-a-second+ difference between what my stopwatch was ticking off and what Tableau’s “executing query” dialog read. So, when you see only a second-ish difference between a Desktop and Server reading on the same hardware, you probably should just consider it a wash.

If you gift Desktop an extra second due to my arthritic stopwatch thumb, I think you’ll find the rendering times are remarkably similar between Server and her thick-client cousin.

I was frankly surprised to see that Server fared better than Desktop on the m3.large instance (2 cores @ 2.5 GHz, 900 IOPS). I can’t explain it, but I don’t see anyone really wanting to run Server on a 2-core box anyway, so I’m not going to worry about it.

One other common-sense thing that is probably worth mentioning - the cradle-to-grave experience a user goes through encapsulates “rendering” but also some other server-ish processing that can add some overhead. Don’t always assume a 0.27 second execution displays in the browser in less than a second for the user. 

Lessons Learned:

  • What’s good for Desktop on EC2 is good for Server
  • CPU is most important once data has been retrieved from the data source
  • If your data source is an extract, make sure to have decent disk performance or the CPU can’t get to work

Studying Tableau Performance Characteristics on AWS EC2

As a “Friday afternoon fun project” I decided to tinker around a little bit with Tableau on Amazon Web Services EC2. Then bad Russell arrived in my head and kept on asking questions. 

A weekend later, I find that I’ve stood up and banged on about 20 distinct EC2 instance configurations. I installed Tableau Desktop & Server to each one and compared what I saw.  I’m tired but vaguely satisfied. God knows what my bill from Amazon will be. Oh well.

In this two-parter, I’ll let you know what I found. We’ll look at Tableau Desktop Performance first, then deal with Tableau Server. 

The Goal

Understand HOW Tableau behaves on EC2. 

Not the Goal

Create prescriptive guidance on the instance type and storage subsystem you should stand up for various sizes of Tableau Server. As always this “depends”, and you need to do your own due diligence.

What I did

I decided that my “base” dashboard should be an interesting one. I used a ~220M row extract (about 5 GB on the file system). The dashboard itself contains two sheets, each of which needs to touch all rows in the extract:

  • Viz One plots the trade price of a couple hundred equities across time (quarter) and trade size
  • Viz Two plots the trade price for two securities across time on a daily basis

On my main workstation I see the following results:

  • Cold Load: 17 sec
  • Cached Load: 15.4 sec

My Mac (jealous much?..a very early beta): 

  • Cold Load: 24 sec
  • Cached Load 11 sec

Windows Running on Mac (Parallels):

  • Cold Load: 16.25 sec
  • Cached Load: 14.55 sec

Interesting / semi-germane info: 

Why do I break down cold vs. cached loads? Well, there is an IO cost to load my extract up into memory. Once it’s there, it stays there unless knocked out by something else.

This means I literally can “touch” an extract so that it is mapped into memory, close Tableau entirely and walk away, then re-launch Desktop and reload my viz a few minutes later….there’s a very good chance I’ll see a significant improvement in load time, even though a completely different process mapped the file into memory at some point in the past. This is pretty important, particularly if your disks are slow and it takes a while to do that initial loading of data into memory.

So let’s look at some data. You see 10 EC2 instances below, ranging from two to eight cores. I configured each machine with anywhere from 0 to 4,000 provisioned IOPS for storage (click the previous hyperlink to read about what an IOPS means to AWS, or click here to read about IOPS in a more general sense). All tests were run on Windows 2012 Server Standard.

Click, CTRL-Click, or SHIFT-Click to compare one instance to another in terms of how quickly I was able to render the test dashboard.

First, let’s focus on the slowest of slow. This is a report that renders so slowly, it is actually physically painful: 

image

The run on the FAR right took a whopping 82 seconds to render (uncached) on a 2-core, low GHz machine with poor IO — 300 IOPS:

image

I used the free ATTO Disk Benchmark tool to test throughput on this drive, and look at how pathetic it is.  We can be pretty sure that the big extract is taking a long, long time to load into RAM based on these read numbers. 

But what about once it’s cached? The pain just keeps coming. Since we only have 2 cores and they are relatively low frequency, things still stink:

image

As you can see above, our friend is grouped in a cluster of readings taken from 1.8 and 2.0 GHz CPUs. This group collectively has the longest cached render time. 

Let’s shift gears. Where did we do well? 

image

You can see that in general, instances running storage with the most IOPS win:

#3 is a machine running 4000 IOPS - It was the champ in terms of rendering speed. Here’s what disk throughput looks like:

image

We’re getting about 3x the throughput vs. 300 IOPS. 

Group #2 contains machines which have either 4 or 8 cores running at 2.8 GHz and between 1200-1500 IOPS.

Throughput, 1500 IOPS: 

image

…and 1200 IOPS:

image

Group #1 is where things really start getting interesting. Note that this cluster of readings is taken from machines where we have 0 provisioned IOPS! How can 0 be better than 300, 1200, or 1500?! Well, this disk is not an EBS volume. It is an AWS standard volume and it behaves quite differently.

Let’s let the test tell the story:

image

While the write performance is just about as poor as the 300 IOPS machines, look at that read! It is actually superior to the  transfer rate from our 4000 IOP rig. Since we’re only reading the extract into RAM as part of our rendering process, this suits us quite nicely. 

I wouldn’t want to use this standard volume to act as a temp drive or to write extracts on, but it’s just dandy for reading. 

What have we learned? 

Disk matters. Having poor IO is just like skipping leg day at the gym. Only bad things can happen.

Until you have sufficient IO to read extracts into RAM, your CPU will literally stall. Here’s what I mean: 

image

Note that when this “un-cached” dashboard was rendered, the CPU wasn’t working all that hard. Our disk queue was at about .361, indicating that the disk was working. 

Now, look at the same machine rendering a “cached” report: 

image

The CPU is working ~3x as hard on this puppy - and look at the disk: not a whole lot going on there right now.

If your disk is fast enough, your CPU works as hard as it can for you - but if you can’t keep it “fed” with data, then it just sits there.

This next part shouldn’t be hard to understand - once we have the data we need loaded into memory, he who has the fastest CPU wins:

image

Group 1 represents our 2.8 GHz CPUs. Note how they all cluster together right around 15-17 seconds.

Group 2 is a single box with a 2.0 GHz CPU. Render time = ~30 sec

Group 3 is our stinky 1.8 GHz cores. Render time = ~30 sec

Let’s summarize what we learned and how to apply it: 

  • If you’re using extracts, you must have decent read performance from your disk 
  • Having good write performance is important when creating extracts
  • A relatively low number of EBS IOPS won’t buy you much performance 
  • To get a rough estimate of throughput, use ATTO Disk Benchmark
  • An EBS volume with provisioned IOPS can guarantee a consistent-ish number of reads and writes. However, you may just do better to use a standard volume which delivers pretty good read performance at no extra cost

Now, it’s the bonus round. When you provision an instance, it comes with some root storage and “Instance Storage”. Here’s the c3.xlarge machine I’ve been using a lot in my testing: 

image

Instance Storage is ephemeral storage that only stays around as long as the image is running. If you stop the image, anything you saved on those two disks (in this case, two 40 GB SSD drives) goes away. A quick reboot of the OS is fine - but turning the machine OFF means that storage is wiped.

What if I kept my big 5 GB extract on the C: Drive, and copied it to the Y: or Z: drive on the machine above when I wanted to do work..would that be fast? It damn well would be:

image

Ready? Boom!

image

The highlighted run above was sourced from the Z: drive. I arbitrarily “tagged” it as 4000 IOPS, but it obviously is going faster, as it completed in 24 seconds vs. the 32.30 second run directly to its right (the prior “champ”).

The 24 second run used 27% of CPU vs. only 18% of CPU on the 32.3 second run — an example of good disk allowing the CPU to run the way it wants to. (Keep in mind we’re using the same CPU here)

We can see it came in FIRST across all 2.8 GHz CPUs, actually:

#1 -  The run backed by SSD

#2  - The run backed by a 4000 IOP volume

#3 - The run backed by a standard volume

When all is said and done, #1 is actually closer to “cached” execution time than to its “un-cached” brethren. Pretty nice!

image

To reiterate: since the storage I used for this final test is ephemeral, it probably wouldn’t be a good permanent solution. If I wanted to get SSD-like performance, I’d need to do something like the following in AWS:

  • Create 2+ 4000 IOP volumes for my instance
  • Use Windows Server Storage Pools to create a RAID 0-like stripe across the 2+ volumes

That would be uber-speedy, but I’m getting tired and have decided to stop. 

Next article: “So what?! How fast can I make Server go?”