
By Alan Jacobson & Max Kanter


We are incredibly excited to welcome the Feature Labs team to Alteryx and want to share a bit about how we arrived here and what’s in store for you next.

 

What is Feature Engineering?

 

First, before we go too far into all the reasons why we are excited, let’s talk a bit about what Feature Engineering is all about. Dr. Jason Brownlee, an expert in the artificial intelligence (AI) and machine learning (ML) world, is quoted as saying, “Feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy on unseen data.” This process includes things like determining whether a date falls on a weekend or a weekday, or computing the peak value across a set of values. These features may be the key element in predicting something like when a customer will buy a product. In addition, data augmentation can be used as part of feature engineering to further enhance the feature set. For example, if you have the zip codes of customers and can bring in census data, you can see whether the average income in that zip code is a strong indicator of who a customer might be.
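To make this concrete, here is a minimal sketch in plain Python of the three kinds of features just mentioned: a weekend flag, a peak value, and a zip-code lookup. The purchase records, the field names, and the income figures are all hypothetical placeholders invented for illustration (they are not real census data), and this is not how Featuretools or Alteryx implements feature engineering.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw purchase records: (customer_id, zip_code, purchase_date, amount)
purchases = [
    ("c1", "02139", date(2019, 10, 5), 40.0),
    ("c1", "02139", date(2019, 10, 7), 15.0),
    ("c2", "92612", date(2019, 10, 6), 120.0),
    ("c2", "92612", date(2019, 10, 9), 80.0),
]

# Placeholder lookup standing in for an external census table (made-up values).
income_by_zip = {"02139": 50000, "92612": 60000}


def is_weekend(day):
    """Return True for Saturday or Sunday (weekday() is 5 or 6)."""
    return day.weekday() >= 5


def build_customer_features(purchases, income_by_zip):
    """Turn raw purchase rows into one feature record per customer."""
    rows_by_customer = defaultdict(list)
    for customer_id, zip_code, day, amount in purchases:
        rows_by_customer[customer_id].append((zip_code, day, amount))

    features = {}
    for customer_id, rows in rows_by_customer.items():
        amounts = [amount for _, _, amount in rows]
        weekend_count = sum(1 for _, day, _ in rows if is_weekend(day))
        features[customer_id] = {
            # Peak value across a set of values.
            "peak_purchase": max(amounts),
            # Share of purchases made on a weekend.
            "weekend_share": weekend_count / len(rows),
            # Data augmentation: join in an external attribute by zip code.
            "zip_income": income_by_zip.get(rows[0][0]),
        }
    return features


features = build_customer_features(purchases, income_by_zip)
for customer_id, record in sorted(features.items()):
    print(customer_id, record)
```

Each customer ends up with a single row of derived values that a model can consume directly, which is the kind of table automated feature engineering tools aim to produce without the hand-written loops.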

 

Why is Feature Engineering Important?

 

The art of feature engineering is incredibly important both in understanding data sets and in creating models. Why are sales fluctuating on different dates? Is it because some days are holidays or weekends, or is some other factor at work? So how important is feature engineering to analysis and, ultimately, to getting positive outcomes with ML? Surveys of AI and ML experts suggest it is the MOST important factor in successful outcomes, as shown in this survey from Kaggle.

 

[Figure: Kaggle survey ranking the importance of various parameters on the outcome of machine learning models]

 

Data scientists can create these features manually in Alteryx, but automatically exploring complex data sets and adding these features quickly does more than save time. It can ultimately be the difference between seeing the pattern in the data and failing to get the desired business outcome.

 

Andrew Ng described the challenge of feature engineering in this way: “Coming up with features is difficult, time-consuming and requires expert knowledge. ‘Applied machine learning’ is basically feature engineering.” Ng is the co-founder of Google Brain, former chief scientist at Baidu, a professor at Stanford University and director of its AI Lab, and co-founder of deeplearning.ai, so I believe his point of view comes from a great body of experience.

 

Automating the Feature Engineering task will not only speed up the process and increase success but will help the modern-day data worker upskill and become more capable at machine learning.  Putting tools like this in the hands of the workforce will ultimately accelerate the digital transformation of the enterprise.

 

Who is Feature Labs?

 

Born out of the Massachusetts Institute of Technology’s (MIT) Computer Science and AI Lab, Feature Labs quickly developed world-class automated feature engineering capabilities that eliminate otherwise time-consuming manual processes for data scientists. Alteryx plans to leverage these technologies going forward to expand our best-in-class code-free and code-friendly modeling and assisted modeling capabilities.

 

Feedback on the Feature Labs work has been strong, with great success from a broad array of users, and featuretools is especially popular in hackathons and ML competitions in the data science world.  Here is a window into what users say:

 

“Machine learning is rapidly moving from manually designed models to automated data pipelines using tools such as auto-sklearn, MLbox, and TPOT. In using these automated tools, the aim is to simplify the model selection process and come up with the best data set features for our model.

 

Even with all the resources of a great machine learning expert, most of the gains come from the great features, not the algorithms used. Data scientists spend about 80% of their time on data cleaning and feature engineering, which is time consuming and requires domain knowledge and mathematical computation.

 

Featuretools is incredibly powerful in that it can create a feature matrix for any entity in our data. All we have to do is change the target-entity.”

 

― Brian Mwangi (Medium) - Popular Data Scientist & Blogger

“Anyone who has participated in machine learning hackathons and competitions can attest to how crucial feature engineering can be. It is often the difference between getting into the top 10 of the leaderboard and finishing outside the top 50!

 

I have been a huge advocate of feature engineering but it can be a slow and arduous process when done manually… I have to spend time brainstorming over what features to come up with, and analyze their usability from different angles.

 

The featuretools package is truly a game-changer in machine learning. It has quickly become ultra-popular in hackathons and ML competitions. The amount of time it saves, and the usefulness of the features it generates, has truly won me over.”

 

― Analytics Vidhya - Popular Data Science Community Blog

“Feature Engineering for Machine Learning is critical however it can be a long arduous process… but with a python package called featuretools, a good chunk of time can be saved by automatically creating features for you.

 

I enjoyed the package and it helped me get into a better understanding of how to do much of my pre-processing from a conceptual basis...

 

If you know how to use featuretools effectively and have domain knowledge on how to ask it the right questions, you’ll get a lot out of it.”

 

― Robert R.F. DeFilippi (Medium) - Popular Data Scientist & Blogger

 

Why Alteryx?

 

The goal of Feature Labs has always been to make data science easier for data scientists and business analysts alike, by providing automation and guidance through the process of creating machine learning models.  The Alteryx Platform fits this mission perfectly, with an aligned goal to help the modern-day data worker reach more successful outcomes. Moreover, the Alteryx Platform is already enabling enormous digital transformations at some of the world’s largest companies and gives Feature Labs an incredible mechanism to launch modern-day feature engineering and automated modeling capabilities. The combination of these technologies has the potential to unleash incredible capabilities for current and future customers.

 

What Does this Mean for Data Workers?

 

Our goal is to leverage the talents and skills of the existing team and to grow a data science innovation center that builds on the MIT roots and Boston location of Feature Labs to turbo-charge our efforts in this space. We plan to use this to expand our AI and ML efforts both in the open source data science community and for line-of-business analysts who want code-free tools that can guide them through the complex process of successfully applying AI and ML techniques with their domain knowledge.

 

We are excited to welcome the Feature Labs team to our incredible Alteryx team, and look forward to providing these expanded automated machine learning capabilities to our award-winning data science platform.

 

Let’s get the conversation started. So, what do you think?  How does feature engineering help you develop models and gain insights?  Have you leveraged feature engineering in your data science and analytic work? We look forward to hearing your thoughts!

 

Max Kanter
CEO and co-founder of Feature Labs

Max is the CEO and co-founder of Feature Labs where he is responsible for product vision and development. Under his leadership, usage of Feature Labs software has grown from 0 to over 50,000 downloads per month. He is also a co-principal investigator on a multiyear DARPA program aimed at automating the application of machine learning. Before starting Feature Labs, he was a researcher in MIT’s Computer Science and Artificial Intelligence Lab (CSAIL) and held engineering roles at Twitter, The New York Times, Fitbit, and Hewlett Packard.

Alan Jacobson
Chief Data and Analytics Officer

Alan Jacobson is the chief data and analytics officer (CDAO) of Alteryx, driving key data initiatives and accelerating digital business transformation for the Alteryx global customer base.