Mind the Data Gap: Amending your Data for New Insights - Inspire 2017

Better insights begin with comprehensive datasets. Learn how to amend your datasets with our high-quality data that's updated regularly to discover new insight opportunities and gain a deeper understanding of your customers' characteristics. Hear use cases and learn how to combine internal data with third-party data from leading providers, such as Experian, Dun & Bradstreet and TomTom, to augment your analytic insights.



Video Transcription


Speaker:
Good afternoon, everyone. We're going to get started here in just a second. A bit of housekeeping if you guys don't mind, when we get to the Q&A portion, if you would kindly wait for me to run over to you with the microphone so everyone can enjoy the question in the room, that would be great. The other thing is we encourage you strongly to take the survey at the end of the session. Your feedback is important to us as we use it to shape our Inspire tech tracks for coming years. We want to make this for you, about you, so tell us what you need and we will make sure to provide that for you.

All right, with that I'd like to introduce Wendy.

Wendy Chow:
Thank you, thank you. Sorry, can you hear me? Okay.

Thanks everyone. So if you've attended other track sessions this slide should be familiar. So regarding about forward looking statements, making a decision point in terms of purchasing the data, so if you have any questions you can follow up with us. So let's continue. Okay.

So the session that we're talking about is Mind the Data Gap: Amending Your Data for New Insights. My name is Wendy Chow, I am the senior manager of data here at Alteryx. I've been with Alteryx for actually 12 years, and I look after the team that's responsible for delivering all of the data, spatial as well as the US and Canadian data with our designer license. So if you have any questions regarding about the data we have here, or any questions in general about any geographic, demographic, anything of that, feel free to kind of come over and talk to me.

So what we're going to be covering is essentially... Let me just take a quick poll; who is currently licensed or using the spatial data? Yay, okay. Data in general? US or, okay, smaller group, okay. Cool. So what we're going to be talking about is basically a little bit more in depth overview in terms of what's included in the US spatial, the US data. And the reason why I pick on US is that it's the most kind of comprehensive in terms of data. But where there is differences I'll certainly point them out.

We're going to talk about region coverage. If you attended my tech track session last year, which hopefully not a lot of you guys did, we mostly focus on US and Canada, but since then we've actually added a great number of countries that I wanted to touch upon. So something for you to think about or in the future. The reference material as well as how we're currently delivering our installs.

So I like to keep my sessions fairly informal so please don't hesitate to ask any questions along the way, don't wait until the very end. So it's kind of like, very informal.

So, let's get started. So what we're going to talk about the first one is the US spatial data. So this is really quickly starting over in the far left hand corner in terms of what's included in it is a geocoder. So I'm not going to read verbatim because obviously you guys can read it. But what you see here is the dataset on the far left hand side followed by how often that the data is updated as well as a short description. So those who are currently licensed for spatial as well as data, this should look familiar to you. So we have a geocoder.

What's new that we hadn't released last year was actually a reverse geocoder. So if you have a dataset with latitude longitude fields you can actually insert it. We have a macro and it can take care of that for you. In addition, specifically for the US, we have a Zip+4 coder, so for greater granularity. As well as address points, which is often referred to as rooftop or parcel centroids so if you're looking for greater accuracy, by all means please implement that particular dataset when you're geocoding.

Lastly, CASS, which is for address standardization. All of our datasets except for the reverse geocoder is a service, as well as CASS which is sent as an email and is bimonthly.

On the far right hand side, what you see here is... where's the pointer... it just basically illustrates the differences in the accuracy of geocoding. So the point that's in the middle, highlighted yellow, that's offset from the street is actually kind of indicating about parcel level. Whereas the one on the far right, extreme far right, highlighted yellow is a street base. Below that is just simply a table showing the output within Alteryx. And then on the bottom left are the tools that implement these particular datasets. So what's new for this particular release is the reverse geocoder, which is the fourth one from the left.

Okay. Rounding out additionally in the US spatial is Alteryx maps, Drivetime, and Digital Globe which is satellite imagery. What's changed from last year is that we are releasing Alteryx maps and Drivetime on a quarterly cadence when before it used to be semi-annual. And you can see here on the right hand side just basically an example of a Drivetime that was created following the street network.

And here's basically an illustration of the satellite. Just to let you know what's changed from last year: we're always making enhancements to our product offering, so if you use satellite imagery or the Digital Globe, one thing that we had changed was the street opacity. So before it used to be fairly solid white, we actually changed it so that it's not as glaring. And if there's any suggestions or recommendations, we're always looking for feedback, so by all means shoot me an email or come up and make any recommendations, we're always considering them. And on the left hand side are all the different tools that use these particular datasets.

Lastly, we include in the US spatial is the US census SF1. It includes, obviously, the 2010 census which is obviously released every 10 years. But what's different is for the geographies we update every quarter. So it matches to the ones that we deliver with Experian CAPE data, which we'll talk about in the next few minutes.

Okay so the US data. So specifically about that, we obviously touched upon the US spatial, which is included in the US data. As well as that, we have demographics which we'll talk about: Experian CAPE, Mosaic, and the Simmons Syndicated Survey, the ConsumerView file, the Dun & Bradstreet, and the Kalibrate Traffic Counts.

So just really quickly, what exactly... do you guys use the Experian CAPE demographics quite a bit in your day to day? Okay. So you're probably familiar with... So the lowest level of geography in that particular data set, as you guys know, is at the Block Group level and we rolled up all the different geography levels, the census ones. In addition, we supplement it with ZIP codes, DMAs, as well as congressional districts, so any type of different geographies are included in it. So what is included in the file are current year estimates and five-year projections. There is a couple specialized datasets such as daytime population as well as seasonal. Obviously the 2010 census is included as well as consumer expenditure, retail demand, Mosaic, which is the segmentation system, and the Simmons Syndicated Survey Data. In the bottom are the tools that you would use, the demographic analysis tools to use within Alteryx.

Next is the Experian Consumerview file. So are there a lot of people that use the household file? Okay a few, okay. A lot of people are somewhat deterred in terms of using the Consumerview file. Basically it's a huge freaking dataset. It's about 120 gigs and it takes a while to install. But when you think of the fact that there's over 220 million individuals and 134 million households, so that kind of explains the reason why. But it's very effective in terms of that if you have a customer file obviously, and you want to append additional information to it, that is the value of it. And we'll kind of touch upon Mosaic, which I feel is one of the key stars in this particular dataset.

So the household fields include Mosaic Household and ZIP+4 which I mentioned. There is Mail order buyer preferences, Mortgage/home purchase, as well as Median housing value. There's about 25 individual fields, which you can see there includes those particular ones. Often times this file is used when you want to match and append to a customer list. So hopefully you would have some address information, name information, so you can run it through using the Consumerview matching macro, which is the macro to the far right, the little purple one with the house, and actually append any of the attributes that are licensed within the US data license.
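As an illustration of the match-and-append idea described above (not part of the talk, and not how the Consumerview matching macro itself works), here is a minimal Python sketch. It assumes a simple exact match on a normalized name, address, and five-digit ZIP key; the records, field names, and the `match_key` and `append_attributes` helpers are all hypothetical.

```python
import re

def match_key(name, address, zip_code):
    """Build a crude match key: strip punctuation, collapse spaces, uppercase."""
    def norm(text):
        cleaned = re.sub(r"[^A-Za-z0-9 ]", "", text)
        return re.sub(r"\s+", " ", cleaned).strip().upper()
    return (norm(name), norm(address), zip_code[:5])

# Hypothetical rows standing in for a licensed household reference file.
reference = [
    {"name": "Pat Lee", "address": "12 Oak St.", "zip": "92612", "mosaic_household": "A01"},
    {"name": "Sam Roy", "address": "40 Elm Ave", "zip": "80301", "mosaic_household": "E19"},
]

# Hypothetical customer list with messier formatting.
customers = [
    {"name": "pat  lee", "address": "12 Oak St", "zip": "92612-1234"},
    {"name": "Ana Diaz", "address": "7 Pine Rd", "zip": "10001"},
]

# Index the reference file by match key for constant-time lookups.
lookup = {match_key(r["name"], r["address"], r["zip"]): r for r in reference}

def append_attributes(customer, fields=("mosaic_household",)):
    """Return the customer row plus the requested fields (None when unmatched)."""
    hit = lookup.get(match_key(customer["name"], customer["address"], customer["zip"]))
    extra = {field: (hit[field] if hit else None) for field in fields}
    return {**customer, **extra}

appended = [append_attributes(c) for c in customers]
for row in appended:
    print(row["name"], "->", row["mosaic_household"])
```

A real matching process would typically be more tolerant (address standardization, fuzzy name matching); the exact-key join here is only meant to show what "append" means for matched and unmatched rows.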

If you had the opportunity to look through your variable list, you may see that there are other fields that are actually marked as "No." Those are fields that actually are included in the actual file, but requires a special license to access it. So if you are interested, by all means come up to us. You can speak to someone in solution center or actually even over at Experian to kind of discuss about them. So there's quite some cool and specialized data sets.

In addition to using a Consumerview matching macro, obviously this is a Calgary file, so what you can do is actually report out a count or list of households with certain characteristics. So say that you want to know the number of households that are, for example, A01 or B01 or something of that scope.

So one of the things I want to touch upon is actually the Experian Mosaic system. So what this is is a lifestyle segmentation. It's a classification way of identifying neighborhoods and it kind of goes along that theory that birds of a feather flock together. So people who, or households that have very common buying behaviors, demographics, households, characteristics are most likely going to be concentrated in one area.

So in this particular case, Experian has 71 unique segments that are basically grouped into 19. The Experian Mosaic is available in both Experian CAPE data, which is at block group level, as well as the Consumerview, only in the household file. It is not in the individual file.

In addition to the Experian CAPE, what you see is what I refer to as a residential. Experian actually has developed a really cool segmentation system which, by far, I have not seen elsewhere, it's called Mosaic Workplace. It's based on where people work as opposed to where people live. So if you're interested in daytime trends, or traffic, or anything that is looking at daytime population, highly recommend to use Mosaic Workplace. What's even better about Mosaic Workplace, it uses the exact same segment names and group descriptions. So you don't actually have to memorize or be familiar with a whole new set of different groups or descriptions.

One item that I wanted to touch upon is that if you are an organization that's looking outside of the US and you're looking for a segmentation system finding like, targeting certain households, Experian actually has a product called Global Mosaic. So you can see there, it classifies over 2,000,000 people worldwide. And there's 10 groups, it doesn't go down to segment level. But what's interesting about it is that it translates across all of the countries that Experian supports Mosaic, which is about 25 different countries. So that's something to kind of consider if you're looking for characteristics of a certain target group.

Crowd question:
[inaudible 00:12:39]

Wendy Chow:
Yes. It's actually in the Consumerview file. So if you're interested that's something definitely that... So if anyone is interested in evaluating certain fields or looking at what other datasets that they are. You obviously know that you guys get a license key sent from Fulfillment, so what we would do is through working with Experian, we would actually issue out the new license key. You just need to remove, disregard your old one and then install the new one. And you would have access to those fields within the file. So it's very easy peasy, we're not shipping you a new drive or anything like that.

So quite simply, Experian is, excuse me, Mosaic is very effective for targeting, acquiring, and managing relationships in improving business resources.

What this slide basically identifies, and which I showed previously is that... So starting from the far left and moving to the right, we're kind of talking about targeting and how effective Mosaic can be used for that. It's persistent, as I mentioned, in both CAPE and Consumerview file. So in the two blue boxes, we have block group dominant and block group household distribution, which are both found in the CAPE dataset. The difference between the two, and you can see that highlighted in the green box, is that every block group, starting in the far left, every block group, every household has been identified or flagged with the same cluster code, in this case is E19. What you see below that are little bullet points in terms of when you would potentially use that particular dataset, when it's appropriate.

Moving down next to it is what we have the block group household distribution. Again, found in the CAPE data, the lowest geography at the block group level. Instead of assigning every household with that same cluster code, you'll see a distribution of that particular block group. So in this example we see that there's a little over 200 households that are flagged as E19 and an assortment of other different cluster codes. So it's great in terms of knowing that, okay this is better, I want to see a mix of different cluster codes. Maybe I've identified five or six different clusters and I want to rank, like ZIP codes or any other geographies and see what kind of bubbles up in your top three. Which is great, but what it doesn't tell you is where exactly are they on the ground. So you want to do mailing, you want to kind of define a trade area, where are they.

Mosaic household, which is in the Consumerview file, is the dataset that you would use. Very similar to the block group household distribution that... sorry the block group dominant in terms of assigning one cluster code, but instead of obviously at the block group level, this is at the household level. So it's fantastic in terms of obviously profiling your customers, because you certainly don't expect all your customers to behave the same way. As well as doing market potential. And if you want to do a mailing or anything like that.
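To make the three Mosaic views concrete (household, block group household distribution, block group dominant), here is a minimal Python sketch that is not part of the talk. It assumes the dominant code is simply the most frequent code in a block group, which the talk does not state; the block group IDs and household records are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical household-level records: (block group ID, Mosaic code).
households = [
    ("060590626101", "E19"), ("060590626101", "E19"), ("060590626101", "A01"),
    ("060590626102", "B07"), ("060590626102", "B07"), ("060590626102", "E19"),
    ("060590626102", "B07"),
]

def distribution_by_block_group(records):
    """Count households per Mosaic code within each block group."""
    dist = defaultdict(Counter)
    for block_group, code in records:
        dist[block_group][code] += 1
    return dist

def dominant_by_block_group(dist):
    """Assign each block group its single most frequent code (an assumption)."""
    return {bg: counts.most_common(1)[0][0] for bg, counts in dist.items()}

distribution = distribution_by_block_group(households)
dominant = dominant_by_block_group(distribution)

for bg in distribution:
    print(bg, dict(distribution[bg]), "dominant:", dominant[bg])
```

The sketch shows why the household view matters for mailings: the distribution says how many E19 households a block group holds, but only the household-level records say which ones they are.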

So on the right, so as I mentioned before, the two on the left use the demographic analysis tools, and Mosaic is in the Consumerview file, which uses the Calgary input tools. And the green is simply identifying when you want to do any targeting. Obviously, as you move closer on your right hand side it's more about micro-targeting.

So moving on to the Simmons Syndicated Survey Data. This is coming from the Simmons National Consumer study, and it covers over 60,000 data variables, including psychographics, lifestyle, attitudes, and opinions. And the data is available using, again, the demographic analysis tools. We have it available at adult population as well as household totals and percentages, as well as using the behavior analysis tools. So that can be used in conjunction with Mosaic if you want to do any target marketing to know what's the best effective way to reach out to your audience. Is it through certain magazines, certain programs, radio, their buying behaviors.

Where it's really useful for is that if you are an organization that has kind of thought about, or have not had the energy or the resources to collect customer data, so you can use it as a proxy for that. As well as you can see there are new business developments and communication plans for brand targets.

So rounding out the data, we have the Dun & Bradstreet business location file. So we have it in both US as well as Canada. You can see there the different list of variables we have, there's over 100 of them. We collect the business name, the SIC codes and the NAICS codes, Experian collects up to six, excuse me, numbers for each particular SIC and NAICS codes, their sales volume, and obviously a latitude and longitude. What is included in the file but not being returned, but if you have this particular data in your own file, is the DUNS number, contact names, and the street address, and phone numbers.

So complementing the Dun & Bradstreet business location file, we've developed a business summary file, which is basically a geosummarized D and B file, so we've taken all the D and B and we've aggregated at the block group level. So we provide totals and percents for employees and establishments. So highlighted in those two bullet points are examples of the different categories that we have for the business summary data.
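The geosummarization described above can be sketched in a few lines of Python. This is an illustration only, not part of the talk and not the actual business summary build: the business rows, the `sic_division` category field, and the output field names are hypothetical.

```python
from collections import defaultdict

# Hypothetical point-level business records, each already assigned to a block group.
businesses = [
    {"block_group": "060590626101", "sic_division": "Retail", "employees": 12},
    {"block_group": "060590626101", "sic_division": "Retail", "employees": 8},
    {"block_group": "060590626101", "sic_division": "Services", "employees": 20},
    {"block_group": "060590626102", "sic_division": "Services", "employees": 5},
]

def summarize(rows):
    """Roll records up to block group level: totals and percents per category."""
    totals = defaultdict(lambda: defaultdict(lambda: {"establishments": 0, "employees": 0}))
    for row in rows:
        cell = totals[row["block_group"]][row["sic_division"]]
        cell["establishments"] += 1
        cell["employees"] += row["employees"]

    summary = {}
    for block_group, categories in totals.items():
        n_est = sum(c["establishments"] for c in categories.values())
        n_emp = sum(c["employees"] for c in categories.values())
        summary[block_group] = {
            category: {
                **c,
                "pct_establishments": 100.0 * c["establishments"] / n_est,
                # Guard against block groups whose businesses report zero employees.
                "pct_employees": 100.0 * c["employees"] / n_emp if n_emp else 0.0,
            }
            for category, c in categories.items()
        }
    return summary

summary = summarize(businesses)
for block_group, categories in summary.items():
    print(block_group, categories)
```

Percents here are shares within each block group, which is one reasonable reading of "totals and percents"; the shipped file may define them differently.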

And likewise, like the Consumerview data where it can be used, we developed a business matching macro, again on the top right hand corner. So if you have a list of business and you want to append certain information, particular SIC and NAICS codes, sales volume, et cetera, you would use that particular macro for that.

Lastly, we have the Kalibrate Technologies Traffic Counts. What we provide on the desktop is a sample, 15 percent sample, however this is used as a sample to kind of develop your workflows. Then what you can do is push them up to the Gallery where the national file is based there. You can see the different applications: site selection as well as market analysis.

Okay now region coverage. So sorry, any questions about spatial or data? No? Okay.

So what we offer for spatial are these countries. So obviously we talked about the US in great depth. There's Canada, so on the top, UK, and the Republic of Ireland. Starting on the bottom left hand side is Australia and New Zealand, Brazil, and Europe. So Europe has about 25 countries and they're mostly focused on Western and Central Europe. So persistent through all of the, with the exception of CASS, there is a geocoder, reverse geocoder, Drivetime, Alteryx maps, and obviously Digital Globe for satellite imagery.

On the data side of things, obviously we talked about the US, and we also offer a Canada data offering. So you can see there that includes spatial plus those elements in that datasets.

Okay so what we have put together, and I don't know if a lot of people are aware of this, but in terms of the reference information, often times people have questions about where the source is of the consumer expenditure. Is it the Bureau of Labor statistics? How often is it updated? What's the time frame as to what it is? What vintage am I on? Is it like, Q1 or whatever? So I really point towards the reference material that we actually ship in the documentation folder. Included is obviously, you see the list of items that we include in each quarter. The ones I kind of really want to highlight is the top bullet point, which are the release notes, the variable list, and the change log. Unfortunately changes do take place. Fields get deprecated or added, so please take a look at that change log. As well as, we have information specific to certain datasets there.

In addition to putting the data onto the hard drive or the FTP, we also provide this information on the Data Products Knowledge Base. Is anyone using the Data Products Knowledge Base, or Community? Seriously? No, thanks. Okay I highly recommend that you guys go to the Community. It's a fantastic resource obviously if you use designer. But also where we actually keep the data on the Data Products Knowledge Base is where we keep actually the release notes. So if you misplace your external hard drive, or you shipped it, or you have no idea where it is, the Data Products Knowledge Base keeps it current as well as the previous, as well as any additional information specific to any of the datasets. Please note that it's only available to data subscribers, so you will be required to sign in with a username and password, which I believe is the same as the gallery. If you don't have access to that then you just need to ship off a note to the Community and they can set you up with one.

Lastly, what's really kind of changed... Not much has changed from last year, but what I just want to highlight is that we obviously ship out on an external hard drive, the US and Canada. But what we offer now is being able to download the data via Amazon S3. So if your organization is interested in that option, by all means send a note to Fulfillment at alteryx.com with the details and they can set you up with that immediately.

All of the spatial installs are available via FTP, we no longer send them out on thumb drives. And CASS is the exception that, due to its update frequency, we send it out as a download link, and it comes from an email from Alteryx Products. If you aren't receiving that email, again send a note off to Fulfillment at alteryx.com and they can set you up with that.

So that concludes that. Are there any questions? Sorry, Lisa's just coming down.

Crowd question:
How are any of these vendors or data sources going to be doing compliance for GDPR next year?

Wendy Chow:
Good question. I can't speak for on behalf... No, okay. I'll have to kind of follow up with that. So if you could kind of come up and actually then I can get back to you with that. Thank you.

Crowd question:
On the D and B matches, we've noticed we have a difficult time getting suite numbers, address two field.

Wendy Chow:
Okay.

Crowd question:
Is that available or is that blocked for a particular reason?

Wendy Chow:
It shouldn't be blocked for a particular reason. But if you can come up and I can take a look at your work flow, and then we could sort it out, then that would be great.

Crowd question:
Hi. My question is regarding the region. I saw that [Bresue 00:25:00] had the spatial available for Brazil.

Wendy Chow:
Yes.

Crowd question:
In terms of data, you know, consumer behavior that kind of thing, I know it's not supported as of right now-

Wendy Chow:
Correct.

Crowd question:
Current. Are there alternatives that we can go to have that kind of customer behavior data for Brazil?

Wendy Chow:
Absolutely. It's not uncommon for people who are licensing mostly US, maybe some Canadian data that are looking to round out the datasets because they're familiar with certain things. We definitely work with Experian, that has actually demographics as well as segmentation data and consumer expenditure data. So I recommend kind of coming up and we can kind of put you with the right person to see what kind of data is available. Okay.

Crowd question:
Do you foresee in the future offering any type of commodity pricing data?

Wendy Chow:
Can you explain that a little bit further?

Crowd question:
So the price of a barrel of oil, historical prices so we-

Wendy Chow:
Oh, I see what you mean. Honestly, at this time I don't, but if you have a request like that, I usually have a number of resources that usually we'll reach out and see if they have that data, and I just simply make the introductions. So if you want to come or send me an email, then I can kind of find that for you.

Crowd question:
Commodities you can get through Quandl and they have a [JSON 00:26:23] rate you can just use the JSON connector to put it in and get live data.

Wendy Chow:
Thank you.

Crowd question:
Is historical data available?

Wendy Chow:
For those particular datasets or for-

Crowd question:
For any of the Mosaic data? You know we subscribed about a year ago, we'd like to do some longitudinal looks at data. Can you back order datasets?

Wendy Chow:
It actually has kind of come up so if you'd like to come up and talk to me about it, we can definitely discuss it.

Crowd question:
Okay thanks.

Wendy Chow:
No worries.

Speaker:
Any other questions? Okay great. Just a reminder please take your surveys, we do value your feedback. We want to make this as meaningful as possible when you do visit us at Inspire. With that, that's the end of the track, so we will see you all tonight at Brooklyn Bowl.

Wendy Chow:
Thank you.
