@highvizability wrote a great piece last year about how to geocode a user’s address using R. You can read the article here and download a great sample workbook:
Let’s do the same thing but with someone’s IP address, shall we?
Tableau has already documented one method to approach this, but technology moves on. FYI, I wouldn’t use the following technique if I’m going to plot thousands of marks / IP addresses on a map - it would probably take too long. That said…
First, I found another great blog by a gentleman who figured out how to make IP-based geocoding work in R. You can read the article here. Everything I did is based on his work.
Essentially, I’ve dumbed-down what Andrew put together and modified his function so that it only returns the latitude and longitude of the IP address being geocoded:
library(rjson)  # provides fromJSON()

ip2latlon <- function(ip) {  # wrapper name is illustrative - use whatever you'll call from Tableau
  if (1 == length(ip)) {
    # a single IP address
    url <- paste(c("http://freegeoip.net/json/", ip), collapse="")
    ret <- fromJSON(readLines(url, warn=FALSE))
    ret <- data.frame(t(unlist(ret)))
    latlon <- paste(ret$latitude, ret$longitude, sep=",")
  }
  latlon
}
The function above requires the rjson package, which lives on CRAN.
The function must also be available to your Rserve server. This is generally accomplished by finding the Rserv.conf configuration file for Rserve and adding a reference to a file which holds the text for your function.
For example, I dropped the text above into a file called geo.txt, and then plugged that value into the config file like so:
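For illustration (the path is made up - use wherever you actually saved geo.txt), the entry uses Rserve's source directive, which evaluates an R file every time Rserve starts:

source C:\geo.txt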
After that, all the work happens in Tableau.
You’ll create an expression which calls the function and passes it the ip address (in this case, named [ip] in my dataset):
Then, create two additional expressions which parse out the latitude and longitude:
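As a sketch of those three calculated fields - assuming the R function is exposed to Rserve under the illustrative name ip2latlon, and that the first calculation is saved as [IP LatLon] - they might look something like this, using Tableau's SCRIPT_STR R-integration function:

// [IP LatLon] - ask R for "latitude,longitude"
SCRIPT_STR("ip2latlon(.arg1)", ATTR([ip]))

// [Latitude] - everything before the comma
FLOAT(LEFT([IP LatLon], FIND([IP LatLon], ",") - 1))

// [Longitude] - everything after the comma
FLOAT(MID([IP LatLon], FIND([IP LatLon], ",") + 1))

Don't forget to assign the Latitude and Longitude geographic roles to the two parsed fields so Tableau will put them on a map.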
…and you’re done!
You can download samples from here: http://1drv.ms/PUO69Q
As an ex-blue badge, I have a love for SQL Server that’s baked into my genes. I. Love. SQL. Server. Clear?
So, I was excited to load up SQL Server 2014 for the first time since it RTM’d and take it for a spin.
I thought it would be fun to run a little test: namely, run the same dashboard against “on disk” SQL Server, a Tableau Extract, and In-Memory OLTP on SQL Server. Let’s see what comes back fastest, shall we?
(What?! You say you don’t know what In-Memory OLTP is? Luddite! Read about it here).
The Dashboard and Data
Anyway, here is the blue dashboard I came up with for my ex-blue badge testing. No, it doesn’t follow dataviz best practices, but it’s blue and I like blue today:
The key thing I wanted to do was introduce several levels of grouping into multiple vizzes to create a meat-and-potatoes dashboard representative of what a lot of people do with Tableau.
The dataset I used was pretty small - just under 4M rows in a star schema. Here’s the ERD:
Only basic indexing is in play with the on-disk database - I have clustered indexes created on each of the PK fields on the dimension tables.
Using SQL Server 2014
Upgrading my SQL Server 2012 to 2014 was like buttah - very straight forward and pain-free. This made me happy.
One must store in-memory tables in a new type of SQL Server filegroup used for this purpose, so I added one of these suckers to my database and then used the Table Memory Optimization Advisor to make in-memory copies of my original tables. The TMOA is pretty much like a wizard that tells you if you’re doing anything stupid, gives you some basic advice and then creates a copy of the table you’re working with (with or without data) inside the Memory Optimized Data filegroup.
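As a rough sketch of what that involves (database name, file paths, column sizes, and bucket count below are all made up for illustration), the filegroup gets added like this:

ALTER DATABASE BlueDemo ADD FILEGROUP MemOptFG CONTAINS MEMORY_OPTIMIZED_DATA;

ALTER DATABASE BlueDemo ADD FILE (NAME = 'MemOptData', FILENAME = 'C:\Data\MemOptData') TO FILEGROUP MemOptFG;

…and the memory-optimized copy of a dimension table that the TMOA emits looks roughly like this - note the hash primary key and the MEMORY_OPTIMIZED setting:

CREATE TABLE dbo.DimClient_InMem (
    Client_Key INT NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1048576),
    Client_Name NVARCHAR(100) NOT NULL
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);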
I no longer consider myself a SQL Server Pro - more like a week-end warrior…and the fact that I could turn this part of the project around so quickly should be seen as a feather in Microsoft’s cap.
I performed my testing as follows, timing each dashboard execution:
- Cycled SQL Server to clear cache (or for Tableau extract testing, I restarted the OS - this is the only way to know the extract file isn’t already in memory from a previous run). I called this an “un-cached run”.
- Ran the blue dashboard and closed Tableau
- Re-opened Tableau and ran the dashboard again (a “cached” run)
- Cycled SQL Server and/or restarted the OS once more and ran my blue dashboard, this time using Tableau Performance Recorder (another “un-cached run”)
…and things don’t look so good for In-Memory OLTP, do they?
Let’s look at the Performance Recorder output for Tableau extract based runs:
Geocoding took the most time, followed by the 920-millisecond execution of the query which supports that process.
Here’s the recorder when I ran the dashboard against “on disk” SQL Server:
The query which supports the running sum area chart at the top of my blue dashboard takes the longest here, followed by the same Geocoding task & geocode support queries.
And now, the surprisingly slow In-Memory OLTP source:
It took 29 seconds to run the query which powers the box and whisker plot viz. Insane! The others weren’t particularly light on their feet either, but 29 seconds is just ugly.
Here’s the SQL for the query in question:
SELECT DATEPART(month,[DimDate].[Full_SQL_Date]) AS [mn:Full_SQL_Date:ok],
  [DimClient].[Client_Name] AS [none:Client_Name:nk],
  DATEPART(year,[DimDate].[Full_SQL_Date]) AS [yr:Full_SQL_Date:ok],
  SUM(CAST(CAST([FactDailyResponse].[Impressions] as bigint) as BIGINT)) AS [sum:Impressions:ok]
FROM [dbo].[FactDailyResponse] [FactDailyResponse]
  INNER JOIN [dbo].[DimClient] [DimClient] ON ([FactDailyResponse].[Client_Key] = [DimClient].[Client_Key])
  INNER JOIN [dbo].[DimDate] [DimDate] ON ([FactDailyResponse].[Date_Key] = [DimDate].[Date_Key])
GROUP BY DATEPART(month,[DimDate].[Full_SQL_Date]),
  [DimClient].[Client_Name],
  DATEPART(year,[DimDate].[Full_SQL_Date])
I ran this sucker in Query Analyzer and got more or less the same result, then checked out the execution plan and saw several alerts that statistics on my table indexes were out of date.
…and that was the problem.
This help topic says that statistics are created automatically but not updated automatically on memory-optimized tables. When I updated the statistics manually on the dimDate table, query execution immediately dropped to ~3 seconds in Query Analyzer.
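For reference, the manual update looks like this (table name from my schema) - note that on memory-optimized tables in SQL Server 2014, UPDATE STATISTICS needs the FULLSCAN and NORECOMPUTE options, since sampled and auto-updating statistics aren't supported there:

UPDATE STATISTICS dbo.DimDate WITH FULLSCAN, NORECOMPUTE;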
I updated statistics on all of my tables and then tried again:
That’s better - but still slower than on-disk SQL. Surprising.
- Don’t forget to update your statistics!
- Since everything is “cached” in an in-memory OLTP scenario, you’ll get about the same perf each time you run a query
- Using In-Memory OLTP isn’t “fire and forget” like I’ve gotten used to with SQL Server over the years - looks like I’m going to need to spend some time adding & tuning indexes for best effect. This will make free-form analysis clumsy as each user may focus on different fields (which may or may not be indexed)
- Extracts still offer better performance, since all tables are “pre-joined” inside them.
More to come!
(Edit #1: It’s about 2 hours later)
I’ve started adding NONCLUSTERED indexes in an attempt to improve execution time:
- Index on the Date column in my Dimension Table: No Change
- Index on the Client_Key field in the Fact table used to join to Clients dimension - gained ~4 seconds: Average run time now at ~12.5 seconds from ~17: Equal to disk storage, at least
- Index on the Date_Key field in the Fact table which joins to the Date dimension: No Change
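One wrinkle worth calling out: in SQL Server 2014 you can't ALTER a memory-optimized table to add an index after the fact (that arrived in SQL Server 2016), so "adding" an index really means dropping and re-creating the table with the index declared inline. A sketch, with the column list abbreviated and Response_Key invented as a surrogate key for illustration:

CREATE TABLE dbo.FactDailyResponse_InMem (
    Response_Key INT NOT NULL PRIMARY KEY NONCLUSTERED,
    Client_Key INT NOT NULL,
    Date_Key INT NOT NULL,
    Impressions BIGINT NOT NULL,
    INDEX IX_Client NONCLUSTERED (Client_Key),
    INDEX IX_Date NONCLUSTERED (Date_Key)
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);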
If Tableau gets great performance by creating a single in-memory table as the result of joining other tables, why can’t SQL Server? Let’s try it!
I took the same query that represents joining my fact table to all dimensions and just made a single materialized version of said query as a table on disk. Good so far. However, when I attempt to turn this into an In-Memory OLTP table, no joy:
6:30 AM here, and I just added NONCLUSTERED indexes to each and every column on the Fact table that I join with. That got me slightly more than 1 second of additional improvement, so I’m now averaging about 11.1 seconds on this dashboard compared to 10.6 seconds against its on-disk cousin. Based on the fact my stopwatch thumb can be fast or slow depending on how much caffeine is coursing through my veins, I’ll call this “even”.
If I really wanted to go all out, I see an opportunity to create a multi-column covering index on three fields in my fact table (two of which are measures). I suspect that would improve perf on the visualization that is taking “longest” at about 3.5 seconds. However, that seems to be overkill to me, so I’ll leave well enough alone for now.
You’ve invested the time to create a great viz for your team, boss, or client…now, make sure you get fully paid - with money, adoration, or whatever gets you through the night.
And you only get one chance to make a good first impression, right?
Before you deliver your workbook, put your best foot forward by choosing your best-looking dashboard and putting it into Presentation Mode before you save.
Example - which “on open” experience looks and feels better to you?
We’re talking about a trivial, 1-second exercise that really makes a difference when it comes to “presentation time”.
- Choose your favorite viz
- Hit F7 to go into Presentation Mode
- Hit CTRL-S to save
…and you’re done. Send your work out into the cold, hard world and wait for the kudos to come flooding in.
You probably already know “Kovner” (the man with no first name). Kovner co-presented at a number of TCC sessions with me, and is one of the scary-smart young guns at Tableau - he’s also the guy who created the Tables in the East game.
Now he’s gone off and created TabChat, a proof of concept that allows two or more users to collaboratively explore, filter, and chat about a viz being displayed. It’s pretty self-explanatory, but you can click “How to use this” if you get lost.
User 1 logs in:
User 2 Logs in:
So, jump out here, play with TabChat, then bookmark Kovner’s blog and return often.
Over the last few months I’ve seen a few folks having problems connecting Tableau and PowerPivot data sources.
Occasionally, the person wasn’t able to connect to a workbook on their local machine. But more often than not, the challenge was connecting to a workbook saved in the PowerPivot Gallery.
Being a masochist, I decided it might be fun to install the Microsoft BI stack on a VM in my domain: It’s been a few years since I’ve felt this particular type of pain, and I was itching for a good thrashing. In the immortal words of Eric Stratton:
“…In this case, I think we have to go all out. I think that this situation absolutely requires a really futile and stupid gesture be done on somebody’s part”
So, three hours later I have a single node of SharePoint 2013 up and running with Excel Services and the SQL Server PowerPivot System Service chugging along. Hurts so good!
I also went ahead and updated Office (x86) from 2010 to 2013 - I had been running some old add-ins that finally got updated and now work in 2013, so why not go all in.
Off to the Tableau Website where this handy-dandy article told me which version of the add-in to download and install with Tableau 8.1:
Since I’m running 8.1 and like to run x86 Office, here’s what I need:
I dutifully download the correct Add-in from the Tableau site:
And now, I’m ready to test…
First, I find my PowerPivot file:
Next, connect to it:
Voila, one data model ready for consumption:
But that was supposed to be easy. How about connecting to the same workbook saved directly out in SharePoint?
Yay! An error:
"The path you specified is invalid"
"Analysis Services database error 0x80004005: Either a connection cannot be made to the SharePoint server, or Analysis Services is not running on the computer specified"
Aha! Look what those chumps did to us and what I did to myself with the cute file name!
Note the spaces in the name of the default Power Pivot document library that was created for me: “PowerPivot Gallery”.
You can even see that the space is being encoded with %20 in the browser.
Likewise, my trying-too-hard-to-be-funny file name has tons of spaces.
I need to reference this file with the spaces encoded, like so:
http://SharePoint2013/PowerPivot%20Gallery/I%20want%20my%20morning%20back.xlsx
…and it worked.
Now, the funny thing here is that I JUST tried the same thing again without encoding my spaces:
http://SharePoint2013/PowerPivot Gallery/I want my morning back.xlsx
And it continued to work. In fact, I can’t get it to fail now.
So, you may need to encode your spaces. Or not, I’m not sure. But I have no problems accessing this stuff from the local filesystem or via SharePoint itself.
Enough pain for one day, time to watch the Jersey Shore marathon.
Tableau can easily connect to #SQLAzure, but there’s one tiny catch - it appears we (Tableau) may need to read something in SQL’s master database during the connection process. Quite often your friendly DBA may not have explicitly added YOUR login as a user in that database.
For example, this blog entry shows how to add users to a database in SQL Azure:
…note how the user in question is only being added to the database they should have access to? Makes sense, as we don’t expect to have normal users spelunking around in master.
However, if you don’t also add the user to master, the following error will occur (For Google: “Invalid username and password”):
While your friends and loved ones may be highly amused as you enter a CORRECT username and password over and over again, begin turning red, and then start dropping F-bombs, here’s the solution - get your admin to add you as a user in master. To be clear, you don’t even need to be added to a SQL Server security role (like db_datareader) which enables you to see stuff - you just need to be a user:
CREATE USER Tableau FROM LOGIN Tableau;
I’ve started writing my Great American Novel. Here are the first few passages:
"Do SAML and Trusted Tickets work together in Tableau 8.1+?" asked Timmy, the curiously mottled grasshopper.
"Yes." responded David, as he stretched his legs luxuriously under the table.
Cindy, seizing the opportunity, grabbed the last piece of bacon and continued to page through the New York Times.
Then, the zombies came.