"Tableau"

IP-based Geo-location in Tableau (New! Now with more R!)

@highvizability wrote a great piece last year about how to geocode a user’s address using R. You can read the article and download a great sample workbook here:

http://www.dataplusscience.com/Geocoding%20in%20Tableau%20using%20R.html 

Let’s do the same thing but with someone’s IP address, shall we?

Tableau has already documented one method of approaching this, but technology moves on. FYI, I wouldn’t use the following technique if I were going to plot thousands of marks / IP addresses on a map - it would probably take too long. That said…

First, I found another great blog by a gentleman who figured out how to do IP-based geocoding in R. You can read the article here. Everything I did is based on his work.

Essentially, I’ve dumbed down what Andrew put together and modified his function so that it only returns the latitude and longitude of the IP address being geocoded:

# The function needs a name so Rserve (and Tableau) can call it; "freegeoip"
# follows the name used in the original blog post.
freegeoip <- function(ip)
{
     if (1 == length(ip))
     {
          # a single IP address
          require(rjson)
          url <- paste(c("http://freegeoip.net/json/", ip), collapse="")
          ret <- fromJSON(readLines(url, warn=FALSE))
          ret <- data.frame(t(unlist(ret)))
          # return latitude and longitude as a single "lat,lon" string
          latlon <- paste(ret$latitude, ret$longitude, sep=",")
          return(latlon)
     }
}

The function above requires the rjson package, which lives on CRAN.

The function must also be available to your Rserve server. This is generally accomplished by finding the Rserv.config file for Rserve and adding a reference to a file which holds the text of your function.

For example, I dropped the text above into a file called geo.txt, and then plugged that value into the config file like so:

image
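If the screenshot doesn’t come through, the config change amounts to a single source directive pointing at the file that holds the function (the path below is just an example):

source C:/R/geo.txt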

After that, all the work happens in Tableau.

You’ll create an expression which calls the function and passes it the IP address (in this case, a field named [ip] in my dataset):

image
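In case the screenshot doesn’t survive, the calculated field looks roughly like this - a sketch that assumes the function is named freegeoip (as in the script above) and that the calculation is saved under an illustrative name like [Lat Long]:

// [Lat Long] - ask Rserve for a "latitude,longitude" string for each IP
SCRIPT_STR("freegeoip(.arg1)", ATTR([ip]))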

Then, create two additional expressions which parse out the latitude and longitude:

image
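Again, just a sketch (field names are illustrative), using plain string functions to pull the pieces out of the “lat,lon” string returned above:

// [IP Latitude] - everything before the comma
FLOAT(LEFT([Lat Long], FIND([Lat Long], ",") - 1))

// [IP Longitude] - everything after the comma
FLOAT(MID([Lat Long], FIND([Lat Long], ",") + 1))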

…and you’re done!

You can download samples from here:  http://1drv.ms/PUO69Q

Testing Tableau with SQL Server 2014 (in memory!)

As an ex-blue badge, I have a love for SQL Server that’s baked into my genes. I. Love. SQL. Server. Clear?

So, I was excited to load up SQL Server 2014 for the first time since it RTM’d and take it for a spin.

I thought it would be fun to run a little test: namely, run the same dashboard against “on disk” SQL Server, a Tableau Extract, and In-Memory OLTP on SQL Server. Let’s see what comes back fastest, shall we?

(What?! You say you don’t know what In-Memory OLTP is? Luddite! Read about it here).

The Dashboard and Data

Anyway, here is the blue dashboard I came up with for my ex-blue badge testing. No, it doesn’t follow dataviz best practices, but it’s blue and I like blue today:

image

The key thing I wanted to do was introduce several levels of grouping into multiple vizzes to create a meat-and-potatoes dashboard representative of what a lot of people do with Tableau. 

The dataset I used was pretty small - just under 4M rows in a star schema. Here’s the ERD:

image

Only basic indexing is in play with the on-disk database - I have clustered indexes created on each of the PK fields on the dimension tables. 
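For reference, the on-disk baseline is nothing fancier than statements along these lines (constraint names are illustrative; the key columns come from the ERD above):

ALTER TABLE dbo.DimClient ADD CONSTRAINT PK_DimClient PRIMARY KEY CLUSTERED (Client_Key);
ALTER TABLE dbo.DimDate ADD CONSTRAINT PK_DimDate PRIMARY KEY CLUSTERED (Date_Key);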

Using SQL Server 2014

Upgrading my SQL Server 2012 to 2014 was like buttah - very straightforward and pain-free. This made me happy.

One must store in-memory tables in a new type of SQL Server filegroup used for this purpose, so I added one of these suckers to my database and then used the Table Memory Optimization Advisor to make in-memory copies of my original tables. The TMOA is pretty much like a wizard that tells you if you’re doing anything stupid, gives you some basic advice and then creates a copy of the table you’re working with (with or without data) inside the Memory Optimized Data filegroup. 
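For the curious, the filegroup piece boils down to something like this (database, filegroup, and path names are illustrative):

ALTER DATABASE MyStarSchema ADD FILEGROUP MemOptData CONTAINS MEMORY_OPTIMIZED_DATA;
ALTER DATABASE MyStarSchema ADD FILE (NAME = 'MemOptData_Container', FILENAME = 'C:\SQLData\MemOptData_Container') TO FILEGROUP MemOptData;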

I no longer consider myself a SQL Server pro - more like a weekend warrior…and the fact that I could turn this part of the project around so quickly should be seen as a feather in Microsoft’s cap.

The Testing

I performed my testing as follows, timing each dashboard execution: 

  • Cycled SQL Server to clear cache (or for Tableau extract testing, I restarted the OS - this is the only way to know the extract file isn’t already in memory from a previous run). I called this an “un-cached run”.
  • Ran the blue dashboard and closed Tableau
  • Re-opened Tableau and ran the dashboard again (a “cached” run)
  • Cycled SQL Server and/or restarted the OS once more and ran my blue dashboard, this time using Tableau Performance Recorder (another “un-cached run”)

…and things don’t look so good for In-Memory OLTP, do they?

Let’s look at the Performance Recorder output for Tableau extract based runs:

image

Geocoding took the most time, followed by the 920 millisecond execution of the query that supports this process.

Here’s the recorder when I ran the dashboard against “on disk” SQL Server: 

image

The query which supports the running sum area chart at the top of my blue dashboard takes the longest here, followed by the same Geocoding task & geocode support queries.

And now, the surprisingly slow In-Memory OLTP source: 

image

It took 29 seconds to run the query which powers the box and whisker plot viz. Insane! The others weren’t particularly light on their feet either, but 29 seconds is just ugly.

Here’s the SQL for the query in question:

SELECT
   DATEPART(month,[DimDate].[Full_SQL_Date]) AS [mn:Full_SQL_Date:ok],
  [DimClient].[Client_Name] AS [none:Client_Name:nk],
   DATEPART(year,[DimDate].[Full_SQL_Date]) AS [yr:Full_SQL_Date:ok],
   SUM(CAST(CAST([FactDailyResponse].[Impressions] as bigint) as BIGINT)) AS [sum:Impressions:ok]
FROM
   [dbo].[FactDailyResponse] [FactDailyResponse]
   INNER JOIN [dbo].[DimClient] [DimClient] ON ([FactDailyResponse].[Client_Key] = [DimClient].[Client_Key])
   INNER JOIN [dbo].[DimDate] [DimDate] ON ([FactDailyResponse].[Date_Key] = [DimDate].[Date_Key])
GROUP BY
   DATEPART(month,[DimDate].[Full_SQL_Date]),
  [DimClient].[Client_Name],
   DATEPART(year,[DimDate].[Full_SQL_Date])

I ran this sucker in Query Analyzer and got more or less the same result, then checked out the execution plan and saw several alerts that statistics on my table indexes were out of date.

…and that was the problem. 

This help topic says that statistics are created automatically but not updated automatically on memory-optimized tables. When I updated the statistics manually on the dimDate table, query execution immediately dropped to ~3 seconds in Query Analyzer. 
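For the record, the fix is a one-liner per table - something like this (on 2014, memory-optimized tables don’t support sampled statistics, so FULLSCAN and NORECOMPUTE are required):

UPDATE STATISTICS dbo.DimDate WITH FULLSCAN, NORECOMPUTE;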

I updated statistics on all of my tables and then tried again:

That’s better - though still slower than on-disk SQL. Surprising.

What I’m seeing thus far could very well have to do with the type of index I’m using on my tables (a NONCLUSTERED HASH on the PK field). Books Online indicates that this is what I should use for a field acting as my primary key, but I might try a NONCLUSTERED index instead. I might also try adding additional indexes on some of the fields that get used a bunch, like my date field.
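To make that concrete, here’s roughly what the two flavors look like side by side - a sketch only, with illustrative column types and bucket count. Note that on SQL Server 2014 you can’t ALTER a memory-optimized table, so every index has to be declared when the table is created:

CREATE TABLE dbo.DimDate_InMem
(
    Date_Key      INT      NOT NULL,
    Full_SQL_Date DATETIME NOT NULL,

    -- what I have today: a hash index on the primary key
    CONSTRAINT PK_DimDate_InMem PRIMARY KEY NONCLUSTERED HASH (Date_Key) WITH (BUCKET_COUNT = 16384),

    -- what I might try: a range (NONCLUSTERED) index on the date field itself
    INDEX IX_DimDate_Full_SQL_Date NONCLUSTERED (Full_SQL_Date)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);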
Here’s the performance recorder running my dashboard against in-memory with updated statistics on my indexes:

image

Take-aways thus far:
  • Don’t forget to update your statistics!
  • Since everything is “cached” in an in-memory OLTP scenario, you’ll get about the same perf each time you run a query
  • Using In-Memory OLTP isn’t “fire and forget” like I’ve gotten used to with SQL Server over the years - looks like I’m going to need to spend some time adding & tuning indexes for best effect. This will make free-form analysis clumsy as each user may focus on different fields (which may or may not be indexed)
  • Extracts still offer better performance, since all of the tables are “pre-joined” inside the extract.

More to come!

(Edit #1: It’s about 2 hours later)

I’ve started adding NONCLUSTERED indexes in an attempt to improve execution time:

  • Index on the Date column in my Dimension Table: No Change
  • Index on the Client_Key field in the Fact table used to join to Clients dimension - gained ~4 seconds: Average run time now at ~12.5 seconds from ~17: Equal to disk storage, at least
  • Index on the Date_Key field in the Fact table which joins to the Date dimension: No Change

(Edit #2)

If Tableau gets great performance by creating a single in-memory table as the result of joining other tables, why can’t SQL Server? Let’s try it!

I took the same query that represents joining my fact table to all dimensions and just made a single materialized version of said query as a table on disk. Good so far. However, when I attempt to turn this into an In-Memory OLTP table, no joy:

image

Damn.

(Edit #3)

6:30 AM here, and I just added NONCLUSTERED indexes to each and every column on the Fact table that I join with. That got me slightly more than 1 second of additional improvement, so I’m now averaging about 11.1 seconds on this dashboard compared to 10.6 seconds against its on-disk cousin. Based on the fact that my stopwatch thumb can be fast or slow depending on how much caffeine is coursing through my veins, I’ll call this “even”.

If I really wanted to go all out, I see an opportunity to create a multi-column covering index on three fields in my fact table (two of which are measures). I suspect that would improve perf on the visualization that is taking “longest” at about 3.5 seconds. However, that seems to be overkill to me, so I’ll leave well enough alone for now.

Tableau Server Scalability Explained →

#Tableau Server scalability whitepaper updated for 8.1 and 64-bit yumminess!

Making a good first impression with Tableau Desktop

You’ve invested the time to create a great viz for your team, boss, or client…now make sure you get fully paid, with money, adoration, or whatever gets you through the night.

And you only get one chance to make a good first impression, right?  

Before you deliver your workbook, put your best foot forward by choosing your best-looking dashboard and putting it into Presentation Mode before you save. 

Example -  which “on open” experience looks and feels better to you? 

This: 

image

…Or this:

image

We’re talking about a trivial, 1-second exercise that really makes a difference when it comes to “presentation time”.

  • Choose your favorite viz
  • Hit F7 to go into Presentation Mode
  • Hit CTRL-S to save

…and you’re done. Send your work out into the cold, hard world and wait for the kudos to come flooding in.

An AMAZING hack by Michael Kovner: TabChat for #Tableau Server

You probably already know “Kovner” (the man with no first name). Kovner co-presented at a number of TCC sessions with me, and is one of the scary-smart young guns at Tableau - he’s also the guy who created the Tables in the East game.

Now he’s gone off and created TabChat, a proof of concept that allows two or more users to collaboratively explore, filter, and chat about a viz being displayed. It’s pretty self-explanatory, but you can click “How to use this” if you get lost. 

User 1 logs in:

image

User 2 Logs in:

image

So, jump out here, play with TabChat, then bookmark Kovner’s blog and return often.

http://tabcmd.com/tabchat-socketio-javascript-api

Connecting Tableau and PowerPivot. It just works.

Over the last few months I’ve seen a few folks having problems connecting Tableau and PowerPivot data sources.

Occasionally, the person wasn’t able to connect to a workbook on their local machine. But more often than not, the challenge was connecting to a workbook saved in the PowerPivot Gallery. 

Being a masochist, I decided it might be fun to install the Microsoft BI stack on a VM in my domain: it’s been a few years since I’ve felt this particular type of pain, and I was itching for a good thrashing. In the immortal words of Eric Stratton:

“…In this case, I think we have to go all out. I think that this situation absolutely requires a really futile and stupid gesture be done on somebody’s part”

So, three hours later I have a single node of SharePoint 2013 up and running with Excel Services and the SQL Server PowerPivot System Service chugging along. Hurts so good!

image

I also went ahead and updated Office (x86) from 2010 to 2013 - I had been running some old add-ins that finally got updated and now work in 2013, so why not go all in.

Off to the Tableau Website where this handy-dandy article told me which version of the add-in to download and install with Tableau 8.1:

http://kb.tableausoftware.com/articles/knowledgebase/choosing-powerpivot-addin-and-driver

Since I’m running 8.1 and like to run x86 Office, here’s what I need:

image

I dutifully download the correct Add-in from the Tableau site:

http://www.tableausoftware.com/support/drivers

image

And now, I’m ready to test…

First, I find my PowerPivot file:

image

Next, connect to it:

image

Voila, one data model ready for consumption:

image

But that was supposed to be easy. How about connecting to the same workbook saved directly out in SharePoint?

Yay! An error:

"The path you specified is invalid"

"Analysis Services database error 0x80004005: Either a connection cannot be made to the SharePoint server, or Analysis Services is not running on the computer specified"

image

Aha! Look what those chumps did to us and what I did to myself with the cute file name!

image

Note the spaces in the name of the default Power Pivot document library that was created for me: “PowerPivot Gallery”. 

You can even see that the space is being encoded with %20 in the browser. 

Likewise, my trying-too-hard-to-be-funny file name has tons of spaces. 

I need to reference this file like so:

http://SharePoint2013/PowerPivot%20Gallery/I%20want%20my%20morning%20back.xlsx

…and it worked. 

image

Now, the funny thing here is that I JUST tried the same thing again without encoding my spaces:

http://SharePoint2013/PowerPivot Gallery/I want my morning back.xlsx

And it continued to work. In fact, I can’t get it to fail now. 

So, you may need to encode your spaces. Or not, I’m not sure. But I have no problems accessing this stuff from the local filesystem or via SharePoint itself. 

Enough pain for one day, time to watch the Jersey Shore marathon. 

#Tableau and SQL Azure: Don’t forget to…

Tableau can easily connect to #SQLAzure, but there’s one tiny catch - it appears we (Tableau) may need to read something in SQL’s master database during the connection process.  Quite often your friendly DBA may not have explicitly added YOUR login as a user in that database. 

For example, this blog entry shows how to add users to a database in SQL Azure: 

http://blogs.msdn.com/b/sqlazure/archive/2010/06/21/10028038.aspx

…note how the user in question is only being added to the database they should have access to? Makes sense, as we don’t expect to have normal users spelunking around in master.

However, if you don’t also add the user to master, the following error will occur (For Google: “Invalid username and password”):

While your friends and loved ones may be highly amused as you enter a CORRECT username and password over and over again, begin turning red, and then start dropping F-bombs, here’s the solution: get your admin to add you as a user in master. To be clear, you don’t even need to be added to a SQL Server security role (like db_datareader) which enables you to see stuff - you just need to be a user:

CREATE USER Tableau FROM LOGIN Tableau;
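In case it helps, here’s roughly how the whole thing hangs together (the Tableau login/user name is just the example from above; Azure has no USE statement, so run each batch while connected to the database named in the comment):

-- While connected to master: create the login, then add it as a user in master as well
CREATE LOGIN Tableau WITH PASSWORD = '<a strong password>';
CREATE USER Tableau FROM LOGIN Tableau;

-- While connected to the database you actually want to query: add the user and give it read access
CREATE USER Tableau FROM LOGIN Tableau;
ALTER ROLE db_datareader ADD MEMBER Tableau;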

Happy Azure-ing!

SAML and Trusted Tickets in #Tableau?

I’ve started writing my Great American Novel. Here are the first few passages:

"Do SAML and Trusted Tickets work together in Tableau 8.1+?" asked Timmy, the curiously mottled grasshopper.

"Yes." responded David, as he stretched his legs luxuriously under the table.

Cindy, seizing the opportunity, grabbed the last piece of bacon and continued to page through the New York Times. 

Then, the zombies came.