or: “@joemako had me at /currentadmin”.
So, I’m reading a 3-year-old post on the Tableau forums because that’s what I like to do, and Joe points someone to Tableau Server online help with the following URL:
I had no idea. This beauty automatically lands you on the latest/greatest version of help.
Joe, I owe you a bunch of keystrokes. Thanks.
@highvizability wrote a great piece last year about how to geocode a user’s address using R. You can read the article here and download a great sample workbook:
Let’s do the same thing but with someone’s IP address, shall we?
Tableau has already documented one method to approach this, but technology moves on. FYI, I wouldn’t use the following technique if I were going to plot thousands of marks / IP addresses on a map - it would probably take too long. That said…
First, I found another great blog by a gentleman who figured out how to make IP-based geocoding work in R. You can read his article here. Everything I did is based on his work.
Essentially, I’ve dumbed down what Andrew put together and modified his function so that it only returns the latitude and longitude of the IP address being geocoded:
freegeoip <- function(ip) {
  if (1 == length(ip)) {
    # a single IP address
    url <- paste(c("http://freegeoip.net/json/", ip), collapse = "")
    ret <- fromJSON(readLines(url, warn = FALSE))
    ret <- data.frame(t(unlist(ret)))
    # return "latitude,longitude" as a single string
    latlon <- paste(ret$latitude, ret$longitude, sep = ",")
    return(latlon)
  }
}
The function above requires the rjson package, which lives in CRAN.
The function must also be available to your Rserve server. This is generally accomplished by finding the Rserv.config file for Rserve and adding a reference to a file which holds the text of your function.
For example, I dropped the text above into a file called geo.txt, and then plugged that value into the config file like so:
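For reference, the relevant line would look something like the below - the path is an assumption (use wherever you dropped your own file); Rserve’s `source` directive evaluates the referenced R file when the server starts:

```
# Rserv.config -- path below is an example, not gospel
source C:\RServ\geo.txt
```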
After that, all the work happens in Tableau. You’ll create an expression which calls the function and passes it the IP address (in this case, named [ip] in my dataset):
Then, create two additional expressions which parse out the latitude and longitude:
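As a sketch, the calculated fields might look like the following. The field names and the assumption that the function in geo.txt is named freegeoip are mine - adjust to match your own setup:

```
// [GeoCode] -- ships the IP off to Rserve and gets back "lat,lon"
SCRIPT_STR("freegeoip(.arg1)", ATTR([ip]))

// [Latitude] -- everything before the comma
FLOAT(LEFT([GeoCode], FIND([GeoCode], ",") - 1))

// [Longitude] -- everything after the comma
FLOAT(MID([GeoCode], FIND([GeoCode], ",") + 1))
```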
…and you’re done!
You can download samples from here: http://1drv.ms/PUO69Q
As an ex-blue badge, I have a love for SQL Server that’s baked into my genes. I. Love. SQL. Server. Clear?
So, I was excited to load up SQL Server 2014 for the first time since it RTM’d and take it for a spin.
I thought it would be fun to run a little test: namely, run the same dashboard against “on disk” SQL Server, a Tableau Extract, and In-Memory OLTP on SQL Server. Let’s see what comes back fastest, shall we?
(What?! You say you don’t know what In-Memory OLTP is? Luddite! Read about it here).
The Dashboard and Data
Anyway, here is the blue dashboard I came up with for my ex-blue badge testing. No, it doesn’t follow dataviz best practices, but it’s blue and I like blue today:
The key thing I wanted to do was introduce several levels of grouping into multiple vizzes to create a meat-and-potatoes dashboard representative of what a lot of people do with Tableau.
The dataset I used was pretty small - just under 4M rows in a star schema. Here’s the ERD:
Only basic indexing is in play with the on-disk database - I have clustered indexes created on each of the PK fields on the dimension tables.
Using SQL Server 2014
Upgrading my SQL Server 2012 to 2014 was like buttah - very straightforward and pain-free. This made me happy.
One must store in-memory tables in a new type of SQL Server filegroup used for this purpose, so I added one of these suckers to my database and then used the Table Memory Optimization Advisor to make in-memory copies of my original tables. The TMOA is pretty much like a wizard that tells you if you’re doing anything stupid, gives you some basic advice and then creates a copy of the table you’re working with (with or without data) inside the Memory Optimized Data filegroup.
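For the curious, adding that filegroup looks something like this - the database name and file path here are assumptions, not what I actually used:

```sql
-- Add a memory-optimized filegroup and a container for it
ALTER DATABASE MyDb
    ADD FILEGROUP mod CONTAINS MEMORY_OPTIMIZED_DATA;

ALTER DATABASE MyDb
    ADD FILE (NAME = 'mod_container', FILENAME = 'C:\Data\mod_container')
    TO FILEGROUP mod;
```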
I no longer consider myself a SQL Server pro - more like a weekend warrior…and the fact that I could turn this part of the project around so quickly should be seen as a feather in Microsoft’s cap.
I performed my testing as follows, timing each dashboard execution:
- Cycled SQL Server to clear cache (or for Tableau extract testing, I restarted the OS - this is the only way to know the extract file isn’t already in memory from a previous run). I called this an “un-cached run”.
- Ran the blue dashboard and closed Tableau
- Re-opened Tableau and ran the dashboard again (a “cached” run)
- Cycled SQL Server and/or restarted the OS once more and ran my blue dashboard, this time using Tableau Performance Recorder (another “un-cached run”)
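Side note: cycling the service isn’t the only way to get a cold(ish) run on the SQL Server side. Something like the following clears the buffer pool and plan cache without a restart - though it won’t evict a Tableau extract from the OS file cache, which is why I restarted the OS for extract testing:

```sql
-- Flush dirty pages, then clear SQL Server's buffer pool and plan cache
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
DBCC FREEPROCCACHE;
```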
…and things don’t look so good for In-Memory OLTP, do they?
Let’s look at the Performance Recorder output for Tableau extract based runs:
Geocoding took the most time, followed by the 920-millisecond query execution which supports this process.
Here’s the recorder when I ran the dashboard against “on disk” SQL Server:
The query which supports the running sum area chart at the top of my blue dashboard takes the longest here, followed by the same Geocoding task & geocode support queries.
And now, the surprisingly slow In-Memory OLTP source:
It took 29 seconds to run the query which powers the box and whisker plot viz. Insane! The others weren’t particularly light on their feet either, but 29 seconds is just ugly.
Here’s the SQL for the query in question:
SELECT DATEPART(month,[DimDate].[Full_SQL_Date]) AS [mn:Full_SQL_Date:ok],
  [DimClient].[Client_Name] AS [none:Client_Name:nk],
  DATEPART(year,[DimDate].[Full_SQL_Date]) AS [yr:Full_SQL_Date:ok],
  SUM(CAST([FactDailyResponse].[Impressions] AS bigint)) AS [sum:Impressions:ok]
FROM [dbo].[FactDailyResponse] [FactDailyResponse]
  INNER JOIN [dbo].[DimClient] [DimClient] ON ([FactDailyResponse].[Client_Key] = [DimClient].[Client_Key])
  INNER JOIN [dbo].[DimDate] [DimDate] ON ([FactDailyResponse].[Date_Key] = [DimDate].[Date_Key])
GROUP BY DATEPART(month,[DimDate].[Full_SQL_Date]),
  [DimClient].[Client_Name],
  DATEPART(year,[DimDate].[Full_SQL_Date])
I ran this sucker in Query Analyzer and got more or less the same result, then checked out the execution plan and saw several alerts that statistics on my table indexes were out of date.
…and that was the problem.
This help topic says that statistics are created automatically but not updated automatically on memory-optimized tables. When I updated the statistics manually on the dimDate table, query execution immediately dropped to ~3 seconds in Query Analyzer.
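If you want to try the fix yourself, it looks something like this (table name from my schema; on SQL Server 2014, memory-optimized tables require the FULLSCAN and NORECOMPUTE options):

```sql
-- Statistics on memory-optimized tables are NOT updated automatically in
-- SQL Server 2014 -- you must do it by hand, and these two options are required
UPDATE STATISTICS dbo.DimDate WITH FULLSCAN, NORECOMPUTE;
```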
I updated statistics on all of my tables and then tried again:
That’s better - still slower than on-disk SQL. Surprised.
- Don’t forget to update your statistics!
- Since everything is “cached” in an in-memory OLTP scenario, you’ll get about the same perf each time you run a query
- Using In-Memory OLTP isn’t “fire and forget” like I’ve gotten used to with SQL Server over the years - looks like I’m going to need to spend some time adding & tuning indexes for best effect. This will make free-form analysis clumsy as each user may focus on different fields (which may or may not be indexed)
- Extracts still offer better performance, since all tables are “pre-joined” inside them.
More to come!
(Edit #1: It’s about 2 hours later)
I’ve started adding NONCLUSTERED indexes in an attempt to improve execution time:
- Index on the Date column in my Dimension Table: No Change
- Index on the Client_Key field in the Fact table used to join to Clients dimension - gained ~4 seconds: Average run time now at ~12.5 seconds from ~17: Equal to disk storage, at least
- Index on the Date_Key field in the Fact table which joins to the Date dimension: No Change
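One wrinkle worth noting: in SQL Server 2014 you can’t ALTER a memory-optimized table to add an index (that capability arrived in 2016), so “adding” an index really means dropping and recreating the table with the index declared inline. A trimmed-down sketch - column names are from my schema, but the column list is abbreviated and the key/bucket choices are illustrative only:

```sql
-- Indexes on 2014 memory-optimized tables must be declared at CREATE TABLE time
CREATE TABLE dbo.FactDailyResponse
(
    Fact_Key    INT IDENTITY NOT NULL PRIMARY KEY NONCLUSTERED,
    Client_Key  INT NOT NULL INDEX ix_Client NONCLUSTERED,
    Date_Key    INT NOT NULL INDEX ix_Date NONCLUSTERED,
    Impressions INT NOT NULL
    -- ...remaining columns elided...
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
```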
If Tableau gets great performance by creating a single in-memory table as the result of joining other tables, why can’t SQL Server? Let’s try it!
I took the same query that represents joining my fact table to all dimensions and just made a single materialized version of said query as a table on disk. Good so far. However, when I attempt to turn this into an In-Memory OLTP table, no joy:
It’s 6:30 AM here, and I just added NONCLUSTERED indexes to each and every column on the Fact table that I join with. That got me slightly more than 1 second of additional improvement, so I’m now averaging about 11.1 seconds on this dashboard compared to 10.6 seconds against its on-disk cousin. Based on the fact that my stopwatch thumb can be fast or slow depending on how much caffeine is coursing through my veins, I’ll call this “even”.
If I really wanted to go all out, I see an opportunity to create a multi-column covering index on three fields in my fact table (two of which are measures). I suspect that would improve perf on the visualization that is taking “longest” at about 3.5 seconds. However, that seems to be overkill to me, so I’ll leave well enough alone for now.
You’ve invested the time to create a great viz for your team, boss, or client…now, make sure you get fully paid — With money, adoration or whatever gets you through the night.
And you only get one chance to make a good first impression, right?
Before you deliver your workbook, put your best foot forward by choosing your best-looking dashboard and putting it into Presentation Mode before you save.
Example - which “on open” experience looks and feels better to you?
We’re talking about a trivial, 1-second exercise that really makes a difference when it comes to “presentation time”.
- Choose your favorite viz
- Hit F7 to go into Presentation Mode
- Hit CTRL-S to save
…and you’re done. Send your work out into the cold, hard world and wait for the kudos to come flooding in.
No surprise: you’re about to see very close correlation between Tableau Desktop’s performance and how Server behaves.
The data below comes directly from the admin View Performance History report — specifically Compute View.
I installed Server to 6 of the 10 AWS instances I tested previously and published the same workbook as before: This is the “big” workbook with the heavy-duty data source. I ran each Tableau Server with only one VizQL process so that it was easy to see caching in play.
Keep in mind that most of the work we’re doing here assumes the need for good IO, due to the large extract that needs to be read into RAM as quickly as possible. Do you really need to run on an instance with SSDs or lots of provisioned IOPS? Maybe not, if you don’t need the fast read for extract loading.
I tested two reports:
- “Big extract" Dashboard (mentioned above)
- The sample 2013 Sales Growth dashboard
I consider the second to be the type of day-to-day report most folks will be viewing.
The general pattern I followed while testing each instance was:
- Login and run each report with NO Tableau caches populated: VizQL hasn’t seen these reports yet, nor has the extract been loaded into RAM at the OS level.
- Logout, close browser.
- Login, run each report again. Fast response expected because we’ll likely go to the tile cache or VizQL Server cache
- Logout, and use task manager to kill the single VizQL Server process running: I’m purposefully blowing up anything in the VizQL Server cache.
- Login, run each report again. Performance worse than the previous execution, but still better than the first.
- Restart the machine to clear the extract from the OS’s RAM and repeat steps 1-6 until I start feeling silly
The report we really care about is “big”, and I followed the same steps with it on my externally facing Tableau Server (4-core i5 CPU, 32 GB RAM, SSD) and on Server running under Parallels on my Mac (resources granted to the VM: 4-core i7 CPU, 4 GB RAM, backed by SSD). I brought both down to one VizQL Server process for the test.
External Tableau Server:
- Initial Load: 34.39 sec
- Cached Load: .55 sec
- Post VizQL Server reset: 27.4 sec
Parallels VM on Mac:
- Initial Load: 17.14 sec (faster than I expected!)
- Cached Load: .11 sec
- Post VizQL Server reset: 17.48 sec
And now, the results:
IMPORTANT NOTE: I’m using 10,000 IOPS as a stand-in “tag” for Instance Storage on an SSD. I did not create an EBS volume that delivered 10,000 provisioned IOPS. Instance Storage is explained in the previous blog entry on this topic. Where you see 0 IOPS, I’m using a standard volume, also discussed earlier.
Let’s start with the 2013 Sales Growth dashboard. As you can see, Tableau makes quick work of this baby across the board on all hardware:
Since these are going to render relatively quickly anyway, it is difficult to tell how much the cache is or isn’t coming into play. I’d assume the < 1 second executions are coming out of tile cache, the 1-2 second renders from the VizQLServer cache, and the 3+ second executions are being done “from scratch” - but it’s hard to say for sure.
Note that our “big” report can also come back very quickly if cached correctly. Yowza!
It is very easy to see caching at work when rendering the “big” dashboard…here, I’ll make it even easier:
We’re able to deliver sub-second performance on this big report in the green zone - that’s gotta be the tile cache at work.
In the yellow zone, we’re generally spending ~50-60% or less of the time rendering a report that we spend in the red zone on the same type of machine. In the red zone, we’re having to wait until the OS loads the extract into RAM; in the yellow zone we’re not, plus we’re probably getting some love from VizQL Server caching.
Here’s the same information expressed in a slightly different way:
…the upper whisker always displays the longest, “nothing is cached” load, while the lower whisker shows a fully cached execution.
And finally, it might be fun to overlay both the Desktop data set and the Server data set to see any differences:
Yet another note: I “hand-timed” rendering of reports in Desktop - just me, my Nokia, and a stopwatch app. I noticed that there was often a good half-a-second+ difference between what my stopwatch was ticking off and what Tableau’s “executing query” dialog read. So, when you see only a second-ish difference between a Desktop and Server reading on the same hardware, you probably should just consider it a wash.
If you gift Desktop an extra second due to my arthritic stopwatch thumb, I think you’ll find the rendering times are remarkably similar between Server and her thick-client cousin.
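If I were doing this again, I’d retire the stopwatch thumb entirely; even a tiny script wrapped around a monotonic clock removes that half-second of human error. A minimal sketch - the sleep is just a stand-in for whatever workload you’re actually measuring:

```python
import time

def time_it(workload):
    """Time a single run of `workload` using a monotonic, high-resolution clock."""
    start = time.perf_counter()
    workload()
    return time.perf_counter() - start

# Stand-in for "open the workbook / run the query" being measured
elapsed = time_it(lambda: time.sleep(0.25))
print(f"{elapsed:.2f} sec")
```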
I was frankly surprised to see that Server fared better than Desktop on the m3.large instance (2 cores @ 2.5 GHz, 900 IOPS). I can’t explain it, but I don’t see anyone really wanting to run Server on a 2-core box anyway, so I’m not going to worry about it.
One other common-sense thing that is probably worth mentioning - the cradle-to-grave experience a user goes through encapsulates “rendering” but also some other server-ish processing that can add some overhead. Don’t always assume a 0.27 second execution displays in the browser in less than a second for the user.
- What’s good for Desktop on EC2 is good for Server
- CPU is most important once data has been retrieved from the data source
- If your data source is an extract, make sure to have decent disk performance or the CPU can’t get to work
As a “Friday afternoon fun project” I decided to tinker around a little bit with Tableau on Amazon Web Services EC2. Then bad Russell arrived in my head and kept on asking questions.
A weekend later, I find that I’ve stood up and banged on about 20 distinct EC2 instance configurations. I installed Tableau Desktop & Server to each one and compared what I saw. I’m tired but vaguely satisfied. God knows what my bill from Amazon will be. Oh well.
In this two-parter, I’ll let you know what I found. We’ll look at Tableau Desktop Performance first, then deal with Tableau Server.
The Goal
Understand HOW Tableau behaves on EC2.
Not the Goal
Create prescriptive guidance on the instance type and storage subsystem you should stand up for various sizes of Tableau Server. As always this “depends”, and you need to do your own due diligence.
What I did
I decided that my “base” dashboard should be an interesting one. I used a ~220M row extract (about 5 GB on the file system). The dashboard itself contains two sheets, each of which needs to touch all rows in the extract:
- Viz One plots the trade price of a couple hundred equities across time (quarter) and trade size
- Viz Two plots the trade price for two securities across time on a daily basis
On my main workstation I see the following results:
- Cold Load: 17 sec
- Cached Load: 15.4 sec
My Mac (jealous much?..a very early beta):
- Cold Load: 24 sec
- Cached Load 11 sec
Windows Running on Mac (Parallels):
- Cold Load: 16.25 sec
- Cached Load: 14.55 sec
Interesting / semi-germane info:
Why do I break down cold vs. cached loads? Well, there is an IO cost to load my extract up into memory. Once it’s there, it stays there unless knocked out by something else.
This means I literally can “touch” an extract so that it is mapped into memory, close Tableau entirely and walk away, then re-launch Desktop and reload my viz a few minutes later….there’s a very good chance I’ll see a significant improvement in load time, even though a completely different process mapped the file into memory at some point in the past. This is pretty important, particularly if your disks are slow and it takes a while to do that initial loading of data into memory.
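You can demonstrate this OS file-cache effect with a few lines of Python - write a biggish file, then read it twice and time each pass. The second read is typically served from RAM rather than disk, which is exactly why a “touched” extract loads faster (note the first read here may already be partially cached, since we just wrote the file):

```python
import os
import time

# A 20 MB scratch file standing in for a Tableau extract
path = "scratch_extract.bin"
with open(path, "wb") as f:
    f.write(os.urandom(20 * 1024 * 1024))

def timed_read(p):
    """Read the whole file in 1 MB chunks and return elapsed seconds."""
    start = time.perf_counter()
    with open(p, "rb") as f:
        while f.read(1024 * 1024):
            pass
    return time.perf_counter() - start

cold = timed_read(path)  # may have to touch disk
warm = timed_read(path)  # very likely served from the OS file cache
print(f"first read: {cold:.4f}s, second read: {warm:.4f}s")
os.remove(path)
```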
So let’s look at some data. You see 10 EC2 instances below, ranging from two to eight cores. I configured each machine with anywhere from 0 to 1400 provisioned IOPS for storage (click the previous hyperlink to read about what an IOPS means to AWS, or click here to read about IOPS in a more general sense). All tests were run on Windows 2012 Server Standard.
Click, CTRL-Click, or SHIFT-Click to compare one instance to another in terms of how quickly I was able to render the test dashboard.
The run on the FAR right took a whopping 82 seconds to render (uncached) on a 2-core, low GHz machine with poor IO — 300 IOPS:
I used the free ATTO Disk Benchmark tool to test throughput on this drive, and look at how pathetic it is. We can be pretty sure that the big extract is taking a long, long time to load into RAM based on these read numbers.
But what about once it’s cached? The pain just keeps coming. Since we only have 2 cores and they are relatively low frequency, things still stink:
As you can see above, our friend is grouped in a cluster of readings taken from 1.8 and 2.0 GHz CPUs. This group collectively has the longest cached render time.
Let’s shift gears. Where did we do well?
You can see that in general, instances running storage with the most IOPS win:
#3 is a machine running 4000 IOPS - It was the champ in terms of rendering speed. Here’s what disk throughput looks like:
We’re getting about 3x the throughput vs. 300 IOPS.
Group #2 contains machines which have either 4 or 8 cores running at 2.8 GHz and between 1200-1500 IOPS.
Throughput, 1500 IOPS:
…and 1200 IOPS:
Group #2 is where things really start getting interesting. Note that this cluster of readings is taken from machines where we have 0 provisioned IOPS! How can 0 be better than 300, 1200, or 1500?! Well, this disk is not an EBS volume. It is an AWS standard volume and it behaves quite differently.
Let’s let the test tell the story:
While the write performance is just about as poor as the 300 IOPS machines, look at that read! It is actually superior to the transfer rate from our 4000 IOP rig. Since we’re only reading the extract into RAM as part of our rendering process, this suits us quite nicely.
I wouldn’t want to use this standard volume to act as a temp drive or to write extracts on, but it’s just dandy for reading.
What have we learned?
Disk matters. Having poor IO is just like skipping your legs at the gym. Only bad things can happen.
Until you have sufficient IO to read extracts into RAM, your CPU will literally stall. Here’s what I mean:
Note that when this “un-cached” dashboard was rendered, the CPU wasn’t working all that hard. Our disk queue was at about .361, indicating that the disk was working.
Now, look at the same machine rendering a “cached” report:
The CPU is working ~3x as hard on this puppy - and look at the disk: not a whole lot going on there right now.
If your disk is fast enough, your CPU works as hard as it can for you - but if you can’t keep it “fed” with data, then it just sits there.
This next part shouldn’t be hard to understand - once we have the data we need loaded into memory, he who has the fastest CPU wins:
Group 1 represents our 2.8 GHz CPUs. Note how they all cluster together right around 15-17 seconds.
Group 2 is a single box with a 2.0 GHz CPU. Render time = ~30 sec
Group 3 is our stinky 1.8 GHz cores. Render time = ~30 sec
Let’s summarize what we learned and how to apply it:
- If you’re using extracts, you must have decent read performance from your disk
- Having good write performance is important when creating extracts
- A relatively low number of EBS IOPS won’t buy you much performance
- To get a rough estimate of throughput, use ATTO Disk Benchmark
- An EBS volume with provisioned IOPS can guarantee a consistent-ish number of reads and writes. However, you may just do better to use a standard volume which delivers pretty good read performance at no extra cost
Now, it’s the bonus round. When you provision an instance, it comes with some root storage and “Instance Storage”. Here’s the c3.xlarge machine I’ve been using a lot in my testing:
Instance Storage is ephemeral storage that only stays around as long as the image is running. If you stop the image, anything you saved on those two disks (in this case, two 40 GB SSD drives) goes away. A quick reboot of the OS is fine - but turning the machine OFF means that storage is wiped.
What if I kept my big 5 GB extract on the C: drive, and copied it to the Y: or Z: drive on the machine above when I wanted to do work…would that be fast? It damn well would be:
The highlighted run above was sourced from the Z: drive. I arbitrarily “tagged” it as 4000 IOPS, but it obviously is going faster, as it completed in 24 seconds vs. the 32.30-second run directly to its right (the prior “champ”).
The 24 second run used 27% of CPU vs. only 18% of CPU on the 32.3 second run — an example of good disk allowing the CPU to run the way it wants to. (Keep in mind we’re using the same CPU here)
We can see it came in FIRST across all 2.8 GHz CPUs, actually:
#1 - The run backed by SSD
#2 - The run backed by a 4000 IOP volume
#3 - The run backed by a standard volume
When all is said and done, #1 is actually closer to “cached” execution time than to its “un-cached” brethren. Pretty nice!
To reiterate: since the storage I used for this final test is ephemeral, it probably wouldn’t be a good permanent solution. If I wanted to get SSD-like performance, I’d need to do something like the following in AWS:
- Create 2+ 4000 IOP volumes for my instance
- Use Windows Server Storage Pools to create a RAID 0 - like stripe across the 2+ volumes
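For completeness, the striping step would look roughly like this in PowerShell - pool names are made up, and “Simple” resiliency is Storage Spaces’ RAID 0 equivalent:

```powershell
# Pool the EBS volumes, then stripe across them (RAID 0 ~= "Simple" resiliency)
$disks = Get-PhysicalDisk -CanPool $true
New-StoragePool -FriendlyName "TableauPool" `
    -StorageSubSystemFriendlyName "Storage Spaces*" -PhysicalDisks $disks
New-VirtualDisk -StoragePoolFriendlyName "TableauPool" -FriendlyName "Stripe0" `
    -ResiliencySettingName Simple -UseMaximumSize
```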
That would be uber-speedy, but I’m getting tired and have decided to stop.
Next article: “So what?! How fast can I make Server go?”