Predictive Wrappers Delight - Inspire 2017

Alteryx offers a little-known secret when it comes to using code for advanced analytic models and tasks. This session is for data scientists who want to incorporate their own predictive code into their use of Alteryx.



Video Transcription


Neil Ryan:
Welcome to Predictive Wrappers Delight. We're going to talk to you about coding in Alteryx, especially for data science. I'm Neil Ryan, I'm a product manager at Alteryx, responsible for our advanced analytics functionality. I've been with Alteryx for about three years now, and before that spent about a decade building models for banks, insurance companies, and government agencies.

I'm excited to introduce Colin Ristig. He is a brand new colleague of mine coming over from Yhat, which you heard about this morning. Tell us a little bit about yourself.

Colin Ristig:
Hey all. I'm part of the Yhat team that's now part of the Alteryx family. I worked on the product that you saw briefly demoed this morning by Greg. I've been there about three years. We'll be covering that in more depth today, so hopefully you'll know more about what we do by the end of this presentation.

Neil Ryan:
Cool, thanks Colin. Okay, so before I get into it, this is the slide the lawyers stick in at the last minute. What you hear today may contain, actually will contain, forward-looking statements, subject to risks and uncertainty. Please don't make any purchasing decisions based on what you hear today.

Getting into it: if you use Alteryx, you know that Alteryx can be code free. It's a nice user interface. You can drag and drop tools, and point and click to configure them. You can get analytics without having to write a single line of code, and that's great. That's one of the things people love about Alteryx. But what Alteryx is less known for is that it's actually quite code friendly. Ever since really the beginning of Alteryx, ten years ago, we've had our Alteryx SDK, where you can write new tools for Alteryx in C++ and C#, just the way our development team does.

And then a few years ago, four or five years ago, we embedded R in Alteryx, and you could take the arbitrary R scripts you're using, your custom R code, and run it seamlessly embedded in an Alteryx workflow. So we've been code friendly since the beginning really, but we're getting code friendlier. You heard about Python this morning, and you heard about Spark this morning, so you can submit Python or Scala or R code and have it execute on a Spark cluster. We're getting friendlier all the time.

Now, I mentioned we are code free in a sense, so what are the use cases? Why would you want to put your code into Alteryx? Well, first of all, if you have an installed predictive tool that's based on R, for instance our linear regression tool, maybe it doesn't act quite like you want it to. Maybe you want to customize it a little bit. With our predictive tools you can always open them up, see the R code underneath, tweak it, and save a customized version of that tool to suit your needs.

What if we just don't have a feature that you need, and we're not shipping an algorithm that you need to get the job done? Well, you can drop an R tool into your workflow and extend the functionality of Alteryx to your heart's content. So that's extending a workflow. Then you can turn that embedded R code in your workflow into a tool. You can wrap it in an Alteryx macro and share it with the rest of your organization, so that what you've done to extend your workflow is accessible to everybody else at your organization. You can extend Alteryx yourself.

Now, there might be some hardcore data scientists in the crowd here. Maybe you're wondering: all right, I love RStudio. I'm in it all day every day. I'm comfortable there. I don't need drag and drop. Why would I need to put my code into Alteryx? Well, especially on the data prep and blending side, the Alteryx engine is really, really fast for joining data and parsing data. If you can do it in Alteryx, you probably should be using our native C++ tools, because they'll often be faster than R or Python.

And when you do embed your code into Alteryx workflows, an Alteryx workflow in and of itself is essentially a repeatable process. You can change the input data to the latest data and just rerun the workflow. Everything is up to date; you just automated it. Take it a step further with Alteryx server: you can publish your workflows to the Alteryx server and schedule them to run on a repeated basis. Take advantage of all that Alteryx has to offer in terms of automation with your code.

And finally, it's about using your code and spreading it throughout your organization, so that as many people in your business can take advantage of it as possible. So I'm talking about disseminating your skills and your knowledge throughout your organization, and by that I mean a couple of things. First there's dissemination within the analytics team. This is a typical org chart we see with a lot of our customers. You saw in Dean's keynote yesterday, Alan Jacobson from Ford came up. He works with the CEO at Ford, and he runs a huge analytics team comprised of data scientists and BI experts, and they work together seamlessly.

So the BI experts are using Alteryx to get data prepped and blended together and build dashboards in Tableau and Qlik. It's kind of funny that they use both Tableau and Qlik. And they're sharing those Alteryx workflows with their data scientists. The data scientists are building predictive models in R, and Python, and Spark, and they're sharing those models through Alteryx apps with their BI experts. And then across the organization as well: that central analytics team is building best-of-breed workflows and dashboards and sharing them with other departments throughout the organization. Operations, marketing, finance, HR.

These departments use Alteryx as well; they're just not the hardcore super users. So Ford has over a thousand people using Alteryx throughout the organization, not just the central analytics team. And the same thing goes for the data scientists. This is where, when you get your code into these Alteryx workflows, you can share it. People in marketing aren't necessarily going to install RStudio, but the people in marketing who have Alteryx can now benefit from these models that you built, these custom predictive models that you coded up and put into Alteryx workflows. So that's what I mean when I talk about dissemination.

Today we're going to talk a little bit about R: how to get your R code into Alteryx. Then I'm going to show you a sneak preview of the new Python SDK functionality and a quick look at our new custom Spark tool, and then I'm going to turn it over to Colin. He's going to talk about deploying and managing models.

Starting with R. There are two ways to get started with getting your R code into Alteryx. The first is starting with one of the tools that we've already built. With almost all of the predictive tools, you can just right click on them; right clicking on the tool brings up this dialog, and you can open the macro. When you open the macro you can see it's just a simple Alteryx workflow that we've built, and a lot of them look like this: at the top you've got your interface tools that build up the user interface used to configure the tool in Alteryx. So if you want to customize this tool's interface, these interface tools at the top of the workflow are what you want to modify.

This is where the data comes out. So if you want to add another output anchor, so you can get another table that we're not producing, you can add another macro output tool and modify the underlying workflow to push out the data you're interested in. And there's always an R tool somewhere in the middle of our predictive tools. When you click on the R tool, the code editor pops up on the left side of the screen, and you can edit it to your heart's content. Customize it to suit your needs.

Just as an example, I found somebody on the community doing exactly this. Somebody posted, "Hey, the linear regression tool has the ANOVA metrics, but they're not exactly the metrics I'm looking for. Can anybody help me out with this?" Somebody nicely replied and posted a screenshot, zooming into the exact line of code with a nice red box around it, and said, "This is what you need to edit to customize this R script, to customize the linear regression tool, to suit what you want."

The second way to get your R code into an Alteryx workflow is starting from scratch. You just drag down the R tool from the developer category, and you get the code editor in the configuration panel. Simply copy and paste your code in there. We provide some R functions specific to interacting with Alteryx, and they're available in a little drop-down within the tool. For instance, we have a function to pull data in as a data frame from the Alteryx workflow, so whatever data you have flowing into your R tool is available as a data frame for you to manipulate, and we have another function to call in R to push the data back out into Alteryx.
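
To make that concrete, a minimal R tool script looks something like this. read.Alteryx and write.Alteryx are the helpers from that drop-down; the column transformation here is just a placeholder:

```r
# Pull the records flowing into the R tool's first input anchor
# in as a standard R data frame
df <- read.Alteryx("#1", mode = "data.frame")

# Manipulate the data with ordinary R code (placeholder transformation)
df$score <- df$value * 2

# Push the result back out to the tool's first output anchor
write.Alteryx(df, 1)
```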

You can also write messages to the logs, which is great if you're eventually going to wrap this up into a macro and create your own tool; you can make it send error messages to the log if somebody's misconfigured it, things like that. And you can also create plots and graphs in R, push them out the other side, and manipulate them with our reporting tools in Alteryx.

So here's a really simple example of somebody doing this from scratch, again from the community. Somebody was trying to deal with SPSS files in Alteryx, and we can read SPSS data files, .sav files, with our input data tool. But SPSS has a concept called value labels: say you have survey data and the responses are encoded as A, B, C, or D, while the actual answers are stored as value labels. Our input data tool wasn't pulling in those value labels the way this person wanted, so somebody used the R tool, found an R package called foreign, and with two additional lines of code besides importing the foreign package, they were able to read in these SPSS data files with the value labels. So they essentially extended Alteryx's functionality with three lines of R code, posted that back to the community, and they were off. And that gave us time to fix our input data tool and add that functionality.
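
The fix plausibly looked something like this (a sketch; the file path is a placeholder, and read.spss comes from the foreign package):

```r
library(foreign)

# Read the .sav file, converting SPSS value labels to factor levels
df <- read.spss("survey.sav", to.data.frame = TRUE, use.value.labels = TRUE)

# Hand the labeled data back to the Alteryx workflow
write.Alteryx(df, 1)
```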

So that's how you get started. Here are some additional resources if you're so inclined. There are a couple of open source packages that we put together and put on GitHub that you might find useful. Jeeves is a general utility for taking scripts that you've produced outside of Alteryx, say in RStudio, and bootstrapping them to quickly get them into an Alteryx workflow.

Another one is Flight Deck, which helps produce beautiful interactive visualizations. In fact, this is the one we used in our 11.0 refresh of the linear regression, logistic regression, and decision tree tools, where we added nice interactive reports coming out of those tools. We used Flight Deck, and then we open sourced the Flight Deck package so you can use it yourself.

Finally, there's a series of knowledge base posts on the community called Guide to Creating Your Own R-Based Macros. Dr. Dan [Putler 00:13:31] created this knowledge base series, and it takes you step by step through a use case: you've got some R code, how do you get it into Alteryx and turn it into an actual Alteryx tool? I encourage you to check that out. It talks about what I've talked about today, but in much more detail. It's a nice tutorial to get you started.

Python. Who here uses Python? All right, about half of you. Cool, so this should be exciting for you. I think we've already heard this in a couple of keynotes, so I'm not going to dwell on it, but Python is really just as popular as R. In that recent KDnuggets poll it just passed R in popularity for the first time, so we figured it was time to support both of them. Why make you choose between one or the other when you should be able to do both in Alteryx?

So we're tackling Python a little bit differently than we tackled R. With R, as you know, there's an actual tool that you drop onto the canvas, you put your R code in, and it runs straight through the canvas in your workflow. With Python we're starting differently: we're going the SDK route. To start, there's not going to be a Python tool that you just drop in. You're going to create a new tool all in the back end, all through code, and when this comes out later this year there's going to be good documentation telling you exactly how to do that. The reason we're doing it as an SDK to start is that it's going to be much more performant that way.

You as a Python developer will have complete control over the records as they flow in and out of the tool that you're building for Alteryx. You'll be able to interact at a much lower level with the Alteryx engine, and it's going to be super fast that way. Basically, the way you go about doing this is you create a new folder within the Alteryx install, you name the folder what you want your tool name to be, and then there are four files in that folder that you're expected to populate: an icon file, which is what the tool is going to render as in Alteryx; an engine file, which is your Python script; a GUI file, the nice HTML interface; and the configuration file to tie all these things together.
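
Concretely, the folder ends up looking something like this (the tool and file names here are illustrative; the exact naming convention may differ in the released SDK):

```
MyTool/
    MyToolIcon.png      # icon rendered on the Alteryx canvas
    MyToolEngine.py     # the engine file: your Python script
    MyToolGui.html      # the HTML configuration interface
    MyToolConfig.xml    # configuration tying the pieces together
```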

The engine file is your Python script, and it expects you to write certain functions. These functions get called by the Alteryx engine at various points in the workflow lifecycle. One function gets called, for instance, when you click on the tool for the first time. Another function gets called when you click run, et cetera. So let me... Well, I'll show you one more slide and then I'll show you a quick demo.
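
As a rough sketch of that lifecycle shape, the engine file defines a plugin class whose methods the engine invokes at each stage. This skeleton follows the pi_* callback convention the Python SDK later documented, but treat the details as illustrative rather than the final API:

```python
import AlteryxPythonSDK as Sdk  # the engine-side SDK module


class AyxPlugin:
    def __init__(self, n_tool_id, alteryx_engine, output_anchor_mgr):
        # Called when the tool is instantiated on the canvas
        self.n_tool_id = n_tool_id
        self.alteryx_engine = alteryx_engine
        self.output_anchor_mgr = output_anchor_mgr

    def pi_init(self, str_xml):
        # Called with the tool's XML configuration whenever it changes
        pass

    def pi_add_incoming_connection(self, str_type, str_name):
        # Called once per incoming connection; returns the object that
        # will receive the records as they stream in
        return self

    def pi_push_all_records(self, n_record_limit):
        # Called at run time for tools with no incoming connection
        return True

    def pi_close(self, b_has_errors):
        # Called when the run is finished, for cleanup
        pass
```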

For what you saw during the demo this morning, Gary mentioned we used the spaCy package, a really awesome text analytics package. We built this up just in the last couple of weeks for this demo, but because we relied so heavily on that open source technology, we really didn't have to do much work ourselves to get it into Alteryx. We were able to do the subject-verb-object triple extraction that Gary demoed, and what he didn't demo but is already in there: key phrase extraction, language detection, and general text cleansing.

Here's just a quick screenshot of the language detection. You can see we've got records in all different languages in there, you run it, and it tells you what language each one is. Let me break out and show you a couple of things that Gary didn't show you. First of all, as I mentioned, you create a new folder in the Alteryx install. In this case we're creating a folder called PyDate; this is a tool that my colleague JP put together in a couple of hours because he just wanted something really simple to show.

So if we open up the Python script, I actually already have it open and zoomed in. This is the whole Python script, about 142 lines of code, but most of it is boilerplate. When we officially release this with documentation, we're going to give you this boilerplate code, and you just need to fill in certain spots. The part that you really need to fill in here is this one line. This is a tool JP built to do date cleansing, and he used a Python package called dateparser. With this dateparser package, all he had to do was this one line where he calls its parse function. And in Alteryx it's pretty cool: it's a really simple tool where you just pick the field that you want to parse, you run it, it only takes a couple of seconds, and then you've got this nice clean date. And we did it in one line of code.
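
A sketch of what that one working line is doing (dateparser is the open source package on PyPI; the messy input strings here are placeholders, and the surrounding record loop is SDK boilerplate):

```python
import dateparser

# dateparser.parse() turns messy, inconsistently formatted strings
# into datetime objects
raw_values = ["March 5th, 2017", "2017-06-05", "5/6/17"]
cleaned = [dateparser.parse(v) for v in raw_values]
```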

So I'm pretty excited about the new Python functionality. Okay, let's move on to some other exciting stuff. Forward, forward, forward, forward, forward, forward. So many slides. What comes after Python? Spark, Spark. Okay, so most of you have probably heard of Spark. It's basically a very modern data processing engine that usually runs on top of [Hadoop 00:19:35], so it's optimized for cluster computing on massively scalable clusters of commodity hardware, and it's fast. People have been using Hadoop for a while now, and with Hadoop you traditionally had to use a technology called MapReduce to do your data processing. With this tiny, tiny graph over here you can see that Spark is just way faster. So people don't really use MapReduce anymore; it's all about Spark on the Hadoop cluster when it comes to data processing and advanced analytics.

For a little while now we've actually supported Spark with our in-database tools for the basic data prep and blending. This is a workflow that would run today with the Alteryx you have installed. It's reaching out to a data frame on Spark and doing filters, formulas, joins, summarizations, all code free. What we've been working on, and what Gary demoed today, is the custom Spark tool. Like the R tool, you put arbitrary code into a code editor, but instead of running it in memory in R or Python on your desktop or server, it takes that Python, R, or Scala code and submits it as a job to the Spark cluster, and it works seamlessly with all our other data blending tools. So now you don't just prep and blend your data with our in-database tools, you can also train ML models.

The way you go about doing this is when you set up your in-database connection, there's going to be a new option: instead of Spark ODBC, which is what we had before, we now have Spark Direct. Then you get to choose the connection mode, Scala, R, or Python, you drop in your custom Spark tool, and you can write your code in Python, Scala, or R. You can leverage open source libraries like MLlib or H2O Sparkling Water to do your machine learning. This is an example one of our data scientists whipped up when we were first creating the tool. The tool looks a lot different now, but she trained a linear regression model using the MLlib package in eight lines of code, and it's fast.
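
In PySpark, an MLlib linear regression of roughly that size looks like this (a sketch; the incoming DataFrame df and the column names are placeholders):

```python
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

# df is the Spark DataFrame flowing in from the upstream in-database tools
assembler = VectorAssembler(inputCols=["x1", "x2", "x3"], outputCol="features")
train = assembler.transform(df)

lr = LinearRegression(featuresCol="features", labelCol="label", maxIter=10)
model = lr.fit(train)

scored = model.transform(train)  # predictions flow back out of the tool
```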

So we did some analysis. We trained a model just using our traditional R-based tools: we used our boosted model tool on a decently sized data set, 670 MB. Our data scientist's machine was pretty beefy; she had 32 gigs of RAM and eight processors. Doing this in Alteryx with the R-based tools took two hours. Then she did the same thing on our very modest Spark cluster, probably the saddest Spark cluster you'll ever see. It's got four nodes, and each node has 16 gigs of RAM. Even with that modest Spark cluster, using a comparable package from H2O Sparkling Water that also does boosted models, it ran in three minutes. So it's fast.

I can quickly show you what this tool looks like in real life. Gary already showed it a little bit, but basically this is that workflow that ran on the Spark cluster in three minutes, and you can see in this case, instead of MLlib, we're using the H2O Sparkling Water package, and it's training a boosting estimator model. And it just fits in seamlessly. She had to do some data prep in Alteryx to get the data ready for training a machine learning algorithm, but not too many tools, not too many lines of code. Pretty easy to use. So we're excited that this is going to be coming out soon as well. Colin.

Colin Ristig:
Thanks Neil. So, transitioning over to a new topic, I'm going to be talking about model deployment. Again, to pull from the keynote this morning, a lot of this comes from the Yhat acquisition that was announced earlier this week. Before we talk about model deployment, I want to get into the theme of operationalizing models. Many of you are familiar with taking a workflow and delivering it to an external party, whether it's someone who works internally on your team or on another team. Within that concept of taking a deliverable and giving it to another party, we break it down into three separate groups.

First we have reports, which, again, from the keynote we saw the ability to create a static report or a dashboard, something that you can generate as a PDF or a document and hand over to someone on a different team, something that incorporates insights, that can pull from databases, et cetera. The second major topic is batch processing. Alteryx server lets you do things on a schedule, whether you want a job to run every morning or every single night. If you're doing lead scoring for a marketing team and they need to see which of the leads from the previous week are the ones they should be reaching out to, you can schedule that workflow to occur within Alteryx server.

What we're going to focus on now is a new topic, which is real-time scoring. Let's conceptualize the idea: a person arrives at an application, and you want to embed a machine learning algorithm's decision within their user experience, in what that user sees as they go through your process. To ground that, think about Amazon's "people who bought this also bought that." They're using a real-time machine learning algorithm for every single user on their website to determine what content to render when you arrive at your checkout page.

So again, what is real-time scoring? What's really important is to draw the distinction between real time and batch. Up on the screen here you can see I have a relatively straightforward logistic regression training workflow. We pull in from a database, do some filtering initially, and in this example we score the model against existing data and write it out to a file. A relatively straightforward workflow. But how do we go about putting that logistic regression into an actual application? Maybe you've got a website and you want to predict whether a user is going to check out or not; we have a Boolean decision. We need to embed this logistic regression into a site that can do some decision-making in real time.

The way we do that is with the Yhat module. Again, looking forward, we're going to be developing a macro that you're able to drag into your workspace to take that logistic regression that we trained previously and deploy it to ScienceOps, our Yhat product. From there we give you a dedicated REST API endpoint that can be exposed to any sort of application that anyone in your organization can interface with, whether it's a CRM system, a live dedicated website, or a back-end service that, say, an underwriting team uses to assess the quality of loan applicants, something like that.

To ground this a little bit, we're going to look at a specific example where we deploy a credit risk model to Yhat, and then I'll show you how we can process loan applicant data in real time for people who are applying for our hypothetical loan. Without getting too deep into the code here, you can see the snippet that I've included on the left. In the model.predict function we're basically declaring the logic that we want invoked every single time our API is called. So, going back to our example, when someone arrives at our website and says, "I want to apply for a loan," the logic that's going to be invoked to decide whether we give them a loan or not is that model.predict function. That's what gets sent over to Yhat.
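
In Yhat's R client, the deployable piece looked roughly like this (a sketch; the fitted model fit, the credentials, and the model name are all placeholders):

```r
library(yhatr)  # Yhat's R deployment client

# The logic invoked on every API call: score the incoming applicant data
model.predict <- function(df) {
  data.frame(risk = predict(fit, newdata = df, type = "response"))
}

# Placeholder credentials for the ScienceOps server
yhat.config <- c(
  username = "colin@example.com",
  apikey   = "YOUR_API_KEY",
  env      = "https://scienceops.example.com"
)

yhat.deploy("CreditRiskModel")  # stands the function up as a REST endpoint
```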

That code will execute, and then we'll get a result in the user interface. So with that we'll transition over. The code that I just showed you, I've already deployed it to Yhat using an Alteryx workflow, so that function is now stood up as an API endpoint and can be integrated into any application. In this case I've integrated it into the web app that we're looking at right here. You could imagine a somewhat realistic scenario: as a credit underwriter, if I'm working at a company that's responsible for issuing credit, I can input the fields of someone who's applying for a loan and take advantage of the machine learning model that we've deployed and integrated into this application.

To illustrate what that looks like in real time: when I click this predict credit risk button, we're sending this information in a POST request to our trained algorithm that's hosted on Yhat. The response is returned, and here you can see we've scored the riskiness of the applicant, and all of that information is contained within our model result.
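
Under the hood, the button is just making an HTTP call, something like the following (the endpoint URL shape, field names, and credentials are hypothetical):

```python
import requests

# Hypothetical ScienceOps endpoint and applicant payload
url = "https://scienceops.example.com/colin/models/CreditRiskModel/"
payload = {"income": 52000, "loan_amount": 15000, "credit_history": "good"}

resp = requests.post(url, json=payload,
                     auth=("colin@example.com", "YOUR_API_KEY"))
print(resp.json())  # e.g. {"result": {"risk": [0.27]}}
```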

Also, because we're in Vegas, I've trained a recommender algorithm to predict beer recommendations. So if y'all have a beer that you want a recommendation for, I'm happy to see if it's in our list. We'll start with 90 Minute IPA; this is one of my go-tos. Again, I've deployed a predictive model using that same logic: we have an R function, there's a model that's hosted, and we have a dedicated API endpoint that's up. When I click go, we're sending this 90 Minute IPA as a POST request to our model. It returns in real time, and here you can see we've got ten other drinks that we should consider purchasing the next time we're at a bar.

That's just another use case. Thinking more holistically, why does real-time scoring actually matter? If you think about a company like Netflix, their movie recommender is heralded as one of the distinguishing features of the company; they estimate its worth in the millions of dollars. Why is that? It's because users who arrive at their page are directly receiving the benefits of all of the machine learning and analytics that exist within Netflix's organization.

The same thing goes for Amazon with your shopping cart experience, and there's a whole host of other examples, through to supply chain optimization. People who are able to make machine learning algorithms have the most impact in their organization do so because those algorithms are closest to the people who are making decisions with the results of those models. So, just wrapping up here: Neil showed you that Alteryx now supports running custom code, whether it's R or Python, in the platform.

Spark is now supported as well, so you can do Hadoop-style distributed data aggregations and write custom code within that too. I briefly showed you the deployment of predictive models that can be embedded in real-time applications. And then model management is also part of that: how do we update our models that are integrated into production applications? All of that is encompassed within the model deployment that I showed you a moment ago. The objective here is that we now allow analysts, citizen data scientists, and data scientists who are capable of writing incredibly complex models to take those algorithms and expose them, whether it's through a report, a batch process, or ultimately a real-time model, and distribute and disseminate that knowledge out to the rest of your organization.

So with that, I'm happy to open it up for questions. I think Lisa is going to be bringing microphones around, so both Neil and I can answer questions if you have any.

Crowd question:
Hi. On the development of the Python connectivity, are you guys going to build in specific ways to do some pre and post steps? Like, you've got some overhead where you have to instantiate some objects, so that you don't have to run those for every single row of incoming data?

Neil Ryan:
Yeah, that's a good question. That's one of the reasons we're doing this as an SDK: as a developer you have the option. I mentioned you can do it row by row, but you don't have to; it gives you that fine-grained control, so you're basically operating at the engine level with the SDK. So you will have control over that. I didn't mention that, longer term, we do want to bring in Python as a tool, the same way we have R as a tool, because that has its own benefits.

While it might not be quite as performant, it's a little easier to use. You can just drop it in and write a couple of lines of Python code without having to worry about the boilerplate and the setup of the tool. So we want to get there too, but we're starting with the SDK in this case.

Crowd question:
Is the Yhat web page part of our Alteryx subscription, or is there an extra cost associated with putting models up on the Yhat website?

Neil Ryan:
That's to be determined.

Crowd question:
My question is kind of... I'm thinking more from a manager perspective. Alteryx, respectfully, is meant for business people, right? Drag and drop, that's the whole thing. But now we are moving up the stack, and you're still saying citizen data scientists and people should be able to code. You just said "as a developer." So the diagram that you showed about how the org chart is laid out, your CDO and your business analysts and data scientists. Are you starting to think, as a company, and maybe you're not the right person to ask, don't you think that requires more IT skills now?

Neil Ryan:
More IT skills. To address the first part of your question, you're right, we're never going to forget about the analyst. We're never going to push the analysts to code. This session was for the data scientist who wants to bring their knowledge to the rest of their organization, to the analysts, because we are seeing more and more of these combined teams of analysts and data scientists, and they want to work more closely together to make their back and forth faster.

Now, in terms of IT support: IT is never going to go away, but I think their role is starting to change, and I might be stealing a line from Dean here, from protecting the data to liberating the data. We're never going to forget about IT. In my experience with, for instance, Spark clusters, at a lot of the companies I've talked to the Spark clusters are actually not managed by IT; they're managed by the analytics organization. IT has been hanging on to the legacy databases, SQL Server, Oracle, stuff like that, but Spark clusters are often kind of on their own from what I've seen.

Crowd question:
A couple of questions, and they're a little bit related. One, with regard to implementing things from R, do you know if anybody has made some efforts or had some success, and I know this is on the more complicated side, implementing modeling with caret in Alteryx? And the second, related question: are there plans or time frames to build and provide graphical tools for building models within the Spark system, or is that something where, after it's released, we're going to have to pick our favorite models and build those in that same way?

Neil Ryan:
Sure. So, your first question, caret. For those who don't know, and correct me if I'm wrong, caret is an R package that gives you the same code interface for interacting with a lot of other machine learning packages in R, right?

Crowd question:
[inaudible 00:37:54].

Neil Ryan:
Yeah, so we have actually taken a look at caret; it's a really cool package. We had a data scientist who started to look at caret and liked it a lot, and then he saw something similar, I think it's called ML something, maybe find me later and I'll figure out what it was, but he actually liked that one a little more. So there are a couple of options there. We are looking at technology like that and considering building our own, in terms of having a common interface for building a lot of different types of algorithms instead of having to choose by dragging down linear regression, decision tree, or forest model. We are looking into that.

Your second question, a GUI for Spark modeling, right? Yeah, so what you saw today with the custom Spark tool is the beginning. Just like with the R tool, you can drag an R tool down, put custom code in, and then wrap that in an Alteryx macro, which essentially turns it into an easier-to-use modeling algorithm for the analyst and the citizen data scientist. That's how we plan to use this custom Spark tool: we can put our own custom code in there to build various types of models, and then wrap it up into an Alteryx macro so it's easier to use for people who don't code in Python or Scala for Spark. Does that answer your question?

But the community, you, will be able to do that as well.

Crowd question:
I think some of these developments are really exciting. When will we be able to get our hands on some of these tools, especially the Spark in-database and Python pieces?

Neil Ryan:
You can come find me after this. We're... Soon. Later this year we're going to be doing what we're calling a technical preview of the custom Spark tool functionality, where we have some select customers alpha testing it. So if you're interested in helping us out with that, I'd love to get your contact information.

Crowd question:
Hey there. I'm an Anaconda user, and I know, and correct me if I'm wrong here, but with the R tools you're dealing with a whole separate install of the R engine, and you have to maintain packages separately and everything else. Is that also the model with Python going down the road, or would you use your own native kind of default engine?

Neil Ryan:
We don't want it to work that way. We want it to be embedded with Alteryx Designer, so it's easier. We think we can do that; we haven't fleshed out all the details yet. One of the reasons we do it that way with R is R's GPL license: you're not really allowed to embed it, and that's why we have it as that separate free install. Python doesn't have that restriction. Anaconda has a BSD license, which is really permissive, so we think we're going to be able to embed it. In the install that I showed you earlier today, we have Miniconda embedded in Alteryx Designer. That's what we have now; I'm not sure exactly what we're going to release to market, but that's the direction we're headed. Do you have a question, sir?

Crowd question:
[inaudible 00:41:26]. This morning they mentioned .NET, and the .NET environment for building stuff is fairly complicated, and there are a lot of tools, [inaudible 00:41:42] Studio and whatnot. This may relate a bit to R and Python also, but is it really just a sort of linear text data [inaudible 00:41:55], or is there more? And in a .NET kind of environment do you have to go set up a bunch of libraries specially and whatnot? I mean, it's much more complex.

Neil Ryan:
So, like I said in the presentation, we've had a C# .NET and a C++ SDK since pretty much when the Alteryx engine was built ten years ago, but you're right, those are harder-to-use languages. You need Visual Studio or something like that, you need to compile your code, and then you have to put the DLLs that you've compiled into the Alteryx install to get all that working. One thing that we're really excited about with the Python SDK, even beyond just data science, is that it's a lot easier to use than C#, .NET, and C++, and you don't have to compile the code. It's as simple as what I showed you today: you have a folder with four required files, you drop your Python script into the folder, and it'll work. So we're excited about how easy to use this SDK is going to be.

Crowd question:
So right now we deploy our models onto Alteryx server, and I don't have to write the code for getting the inputs. I was wondering if you might be able to compare and contrast how easy it's going to be to do it in Yhat versus doing it now in Alteryx server.

Neil Ryan:
Yeah.

Colin Ristig:
That's a good question. One of the capabilities of the integration with Yhat is the ability to send a POST request, which is effectively a language-agnostic way to make a network request, the goal being to take a single predictive routine, or whatever it is you've decided to do, and have it spread throughout your organization. That would likely take an engineering integration at some level. There are likely plans to have it integrated into other Alteryx tools, so you could have interfaces and things like that where you wouldn't necessarily need to write the POST request yourself.

Neil Ryan:
Yeah, and I'll just add that Alteryx server already comes with a pretty easy-to-use REST API, so with a little work you could get similar functionality with Alteryx server right now. The things that I'm really excited about with Yhat are that, A, it works. It's language agnostic: you can do this with R and Python, and you can do it completely outside of Alteryx. You don't actually need Alteryx Designer to deploy these models. The other thing is just that it's a beast. If you're going to deploy an Alteryx predictive model using R with Alteryx server right now, the way our R tool is architected, when you run a workflow, I don't know if you've ever noticed this, but when you get to the R tool it actually takes a couple of seconds to kick off that R process. So if you're talking about using an R tool to score, you're not going to get the real-time performance you're after with Alteryx server, because it's going to take at least two seconds, and that's not going to cut it for powering websites where you're expecting things to load instantaneously.

So I'm really excited about what Yhat brings to the table in terms of this real, real, real-time capability.

Speaker:
Thank you guys, we are at time. If you do have questions you may find these guys up in the Solution Center or you can come up and ask a few questions as we're changing rooms.

Neil Ryan:
Oh, and don't forget to do the survey.

Speaker:
Yes, please do the survey. Inspire's about you and for you, so we want to make sure it resonates for you.
