We’re all searching for those “aha!” moments. Maybe it comes when you’re trying on that perfect outfit from an online personal shopping service or driving this morning’s GPS-optimized route through rush-hour traffic. Something comes together, a bunch of dots get connected, and, like magic, it all works out! You find the perfect match or, as if by instinct, navigate around traffic on a route you’ve never driven before.
The common thread between those two different moments? You’re using machine learning. Like an invisible web, these human-written algorithms impact the world around you in both obvious and obscure ways. As a data analyst, you might want to add the magic of machine learning to your analytic arsenal. You, too, could be answering new, interesting, and future-facing questions.
Machine learning is the iterative process a computer follows when it is asked by a human to identify patterns in a dataset given specific constraints.
In fact, machine learning can benefit every department in your company. By recognizing patterns in vast assortments of data, it can predict likely outcomes, giving leaders a basis to plan and act. Pattern recognition opens the door to predictive use cases such as customer response modeling, demand and inventory forecasting, and many other applications that drive business performance.
Machine learning is part of a new employment dynamic, creating jobs centered on analytical work augmented by artificial intelligence (AI). Gartner predicted that AI would create 2.3 million new jobs by 2020 while displacing 1.8 million “old tech” jobs, a net gain of about 500,000.
A study released by the World Economic Forum shows that data-related jobs will be the most in demand within the next four to five years, along with AI and machine learning specialists. The job categories that will be the most in demand include data analysts and data scientists; AI and machine learning specialists; software and applications developers and analysts; and big data specialists, although it's likely these people will have other titles in the coming years.
Your company has a wealth of data that could be used for machine learning initiatives. But how do you get started? Let’s break it down.
Machine Learning Emerges
Machine learning is an increasingly popular way to estimate future outcomes in business. The consensus estimate is that it can outperform basic extrapolation methods by at least 20 percent.
The technique most commonly used in machine learning is “supervised learning,” according to Tom Davenport, Co-Founder of the International Institute for Analytics. At its most basic, supervised learning involves training an algorithm on known outcomes in past data, so it can then predict outcomes on future data. For example, detecting bank fraud could involve a model trained on data in which illicit activity has been clearly established. The model would then be used to identify potential illicit activity in new data.
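To make the idea concrete, here is a minimal supervised learning sketch in plain Python. It is not how any particular bank or product detects fraud: the two features (transaction amount and number of transfers in the past 24 hours), the labeled history, and the nearest-centroid method are all hypothetical choices for illustration. The model learns from past transactions whose outcomes are known, then labels new ones.

```python
from math import dist

# Hypothetical labeled history: (amount in $ thousands, transfers in past 24 hours).
# Known outcomes: 1 = confirmed illicit activity, 0 = legitimate.
past_features = [(0.2, 1), (0.5, 2), (0.1, 1), (9.5, 14), (8.0, 11), (7.2, 15)]
past_labels = [0, 0, 0, 1, 1, 1]


def train_nearest_centroid(features, labels):
    """Learn one average feature vector (centroid) per known outcome."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        totals = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            totals[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(t / counts[y] for t in totals) for y, totals in sums.items()}


def predict(centroids, x):
    """Assign a new, unlabeled example the outcome whose centroid is closest."""
    return min(centroids, key=lambda label: dist(centroids[label], x))


model = train_nearest_centroid(past_features, past_labels)
new_transactions = [(0.3, 2), (8.8, 12)]
print([predict(model, x) for x in new_transactions])  # [0, 1]
```

Real projects would scale the features so that one does not dominate the distance, and would use far more data and a more capable algorithm; the train-on-known-outcomes, then-predict shape stays the same.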
A more recent trend, given the well-documented shortage of highly skilled data scientists, is the mainstreaming of automated model-building software that helps analysts construct useful machine learning models without high-level data science skills. Think of websites in the early internet years: only someone who knew programming languages could build one. Now anyone can build a website through a visual drag-and-drop interface. You still need to understand the fundamentals of web design, user experience, and copywriting, but not knowing how to code is no longer an obstacle.
With barriers to advanced analytics dropping thanks to similar visual, drag-and-drop user interfaces, machine learning is rapidly gaining traction across industries worldwide. It is a big part of today’s trend toward artificial intelligence, which Gartner projects will unlock business value of $3.9 trillion by 2022.
Preparing to Launch
To pave the way for your future in machine learning, you need to carefully select your first projects, then plan for the entire lifecycle of each project. You’ll need to be thoughtful at every stage, from collecting and preparing data to selecting, training, and operationalizing your machine learning model. Here are eight steps to your future in machine learning:
1. Strategic Planning
Start with the end in mind. Rather than focusing on a specific technology or dataset, focus first on solving a business problem. Then draw your strategic roadmap. It should cover goals, resourcing, and organizational requirements, and it should also answer critical questions about your data, deployment, and model management.
2. Smart Project Selection
Choosing your first projects often comes down to “moonshots” versus “low-hanging fruit,” according to Tom Davenport. In his research, executives are nearly evenly split about which to pursue.
Davenport’s own advice is to begin with low-hanging fruit, deploying one or more smaller machine learning pilots that can nevertheless have a significant impact. In a recent webcast, he described a large bank’s ambitious robo-advisor “moonshot,” which failed to launch, while its less ambitious effort for ATM management led to a significant reduction in cash outages.
3. Proactive Communications
For any project, preparation is not just about getting yourself ready to take on a stretch goal. You need to prepare your organization, making the case for predictive analytics to your leadership and business stakeholders. And you must socialize your project ideas to gain essential buy-in and collaboration.
Bring stakeholders with you into the future of machine learning by communicating the potential impact on their most cherished goals in risk assessment, customer insight, or operational efficiency. From the very start, you should know that addressing cultural and organizational questions will be just as critical to your project’s success as the technical aspects of your initiative.
You also need to understand how to drive adoption of new processes and tools within departments and teams. Well-defined use cases with actionable implementation plans will encourage executive buy-in and ensure everyone recognizes the potential value. This is about understanding how the end user will not only consume but actually use the insights in day-to-day business.
4. Technology + Tools Resourcing
Do you have adequate computing resources for this sometimes compute-intensive activity? Some machine learning approaches require far more processing capacity than others: some projects need only a desktop, while others need a room full of servers or cloud computing resources. For larger projects, cloud and on-premises servers are nearly equivalent computationally, but on-premises servers provide greater control, security, and data privacy, which may be a consideration for your organization.
In tandem, think about the analytics tools and platforms that will help develop, deploy, and manage your model. The latest offerings enable you to create machine learning models without having to learn to code in R or Python. Get to know the best-in-class solutions, which can also help avoid recoding in deployment and mitigate risks for your model in production.
5. A Strong Data Foundation
Machine learning algorithms learn from training data. You’re the trainer, and how you do this is as much an art as it is a science.
Data Form + Function
You need to supply raw, detailed source data that is accurately representative and large enough to reliably answer your specific business question. The most important first step in training your machine learning model is to get an intimate knowledge of your data. You might ask:
- Is the data relevant?
- How many types of data do you have, and from which sources?
- How’s the quality of the data?
- Do you have enough?
- If not, how can you get more?
No strict rules apply since choices will be driven by knowledge of your business and the question you are trying to answer. But there are rules of thumb.
For instance, the old computing cliché, “garbage in, garbage out,” takes on a new life in the world of predictive analytics. Confidence in the quality and provenance of your data sets is essential, as is attention to such simple matters as whether you’re comparing apples to apples (not “hourly” to “weekly,” or dollars to euros).
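Here is a minimal sketch of that apples-to-apples step, assuming pay records that arrive as hourly or weekly figures in dollars or euros. The 40-hour week and the exchange rate are placeholder assumptions; in practice you would use your organization's definitions and the rate for each record's date.

```python
# Placeholder conversion factors (assumptions for illustration only).
HOURS_PER_WEEK = 40   # assumes a standard 40-hour work week
USD_PER_EUR = 1.10    # illustrative rate; real rates change daily

records = [
    {"employee": "A", "pay": 25.0, "period": "hourly", "currency": "USD"},
    {"employee": "B", "pay": 1200.0, "period": "weekly", "currency": "EUR"},
]


def to_weekly_usd(record):
    """Express a pay figure in one common unit (per week) and currency (USD)."""
    pay = record["pay"]
    if record["period"] == "hourly":
        pay *= HOURS_PER_WEEK
    elif record["period"] != "weekly":
        raise ValueError(f"unsupported period: {record['period']}")
    if record["currency"] == "EUR":
        pay *= USD_PER_EUR
    elif record["currency"] != "USD":
        raise ValueError(f"unsupported currency: {record['currency']}")
    return round(pay, 2)


print({r["employee"]: to_weekly_usd(r) for r in records})  # {'A': 1000.0, 'B': 1320.0}
```

Raising an error on an unrecognized unit, rather than passing the value through, keeps a mislabeled record from silently slipping into the training data.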
Big Data? Yes, Please!
“Less is more” certainly does not apply. You want to be almost greedy about the amount of data you use.
For a categorical question (e.g., projecting the up-or-down vote on a local bond issue), Google suggests having, for each category (Democrats, Republicans, or Independents?), at least 10 times as many examples as you have attributes (e.g., whether they voted in the last election). A numerical prediction might require a factor of 50. You can almost visualize it: the more columns of data you want to use, the more rows of data you need to have.
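The arithmetic behind that rule of thumb is simple enough to sketch. This helper is hypothetical and reads the “factor of 50” for numerical prediction as 50 rows per attribute, which is our interpretation rather than a quoted formula.

```python
def min_training_rows(num_attributes, task="classification", num_categories=2):
    """Rough lower bound on training rows under the rule of thumb above.

    classification: at least 10 examples per attribute for *each* category
    regression (numerical prediction): assumed 50 rows per attribute
    """
    if num_attributes < 1:
        raise ValueError("need at least one attribute")
    if task == "classification":
        per_category = 10 * num_attributes
        return per_category * num_categories
    if task == "regression":
        return 50 * num_attributes
    raise ValueError("task must be 'classification' or 'regression'")


# Bond-issue example: 3 categories (Democrat, Republican, Independent), 5 attributes.
print(min_training_rows(5, "classification", num_categories=3))  # 150
print(min_training_rows(5, "regression"))  # 250
```

Treat the result as a floor, not a target: if one category is rare, you still need enough examples of that category on its own.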
At the start of a project, data sources may be scattered and disorganized, and it may be unclear how to support multiple architectures and platforms. This is where your IT team could be your ally if you treat them right. Get IT on board, sharing your strategic plan, your buy-in from business users, and your goal of harmonizing data sources on a secure, central platform so they can help you gather the data you need.
Next, make sure your training data includes every outcome, or “target,” you want the model to predict. For instance, in the bank fraud analysis described above, you could train on examples of both legitimate transactions and money laundering transactions, so the model can discern between good customers and bad actors.
Don’t Make These Mistakes
One common machine learning issue is bias: being “wrong” because the training data wasn’t accurately representative. Bias can also lead to far worse outcomes, such as discrimination against entire groups of people. In the bank example, if we feed the model data only on money launderers, it will have no examples of a good customer to learn from and might reject an applicant unfairly.
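A quick check before training can catch the most obvious form of this problem: an outcome that is missing or barely present in your labels. This sketch assumes labels are plain strings, and the 5 percent minimum share is an arbitrary placeholder, not an industry standard. Passing it does not prove the data is representative; it only rules out a glaring gap like the one in the bank example.

```python
from collections import Counter


def check_representation(labels, expected_outcomes, min_share=0.05):
    """Warn about expected outcomes that are missing or rare in the training labels."""
    counts = Counter(labels)
    total = len(labels)
    warnings = []
    for outcome in expected_outcomes:
        share = counts[outcome] / total if total else 0.0
        if share == 0.0:
            warnings.append(f"{outcome}: no examples; the model cannot learn this outcome")
        elif share < min_share:
            warnings.append(f"{outcome}: only {share:.1%} of examples")
    return warnings


# Hypothetical training labels drawn only from known money launderers.
labels = ["laundering"] * 200
print(check_representation(labels, ["legitimate", "laundering"]))
# ['legitimate: no examples; the model cannot learn this outcome']
```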
Another common mistake is “overfitting” your model: tuning it so closely to the training data that it performs poorly on new datasets. The rule of thumb here is that “perfect is the enemy of good.” Establish thresholds within which your model’s performance is “good enough” so it can succeed across multiple datasets. (Read: You are not going to be right 100 percent of the time, so you might want to prepare your boss.)
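One common way to spot overfitting is to hold back part of your data, train only on the rest, and compare performance on both. In this hypothetical sketch, a “memorizer” that replays every training label scores perfectly on training data but poorly on the held-out records, while a simpler rule scores a little lower on training data and just as well on data it has never seen. The thresholds (at least 80 percent holdout accuracy, at most a 5-point gap) are placeholders for whatever “good enough” means for your project.

```python
# Hypothetical data: the true rule is "label 1 when x >= 50", plus four mislabeled records.
NOISY = {7, 23, 61, 88}
rows = [(x, int(x >= 50) ^ (x in NOISY)) for x in range(100)]

# Deterministic split: every fourth record is held out and never used for training.
train_rows = [r for r in rows if r[0] % 4 != 0]
holdout_rows = [r for r in rows if r[0] % 4 == 0]


def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)


def overfit_check(model, max_gap=0.05, min_holdout=0.80):
    """Compare training vs. holdout accuracy against 'good enough' thresholds."""
    train_acc = accuracy(model, train_rows)
    holdout_acc = accuracy(model, holdout_rows)
    ok = holdout_acc >= min_holdout and train_acc - holdout_acc <= max_gap
    return train_acc, holdout_acc, ok


lookup = dict(train_rows)


def memorizer(x):
    # "Perfect" on training data: it replays every training label, noise included,
    # and falls back to 0 for anything it has not seen.
    return lookup.get(x, 0)


def simple_rule(x):
    # Deliberately simple: it misses the mislabeled records but generalizes.
    return int(x >= 50)


print(overfit_check(memorizer))    # (1.0, 0.56, False)
print(overfit_check(simple_rule))  # (0.96, 0.96, True)
```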
6. Create and Communicate a Trail
You need to be able to explain your process, telling a human being why the variables you chose are meaningful and how your model arrived at a given decision. At a minimum, explainability is critical to sustaining the support of your leadership and business stakeholders. And in some industries, such as financial services, it is critical for regulatory as well as business reasons. For example, being unable to explain why a home loan was denied could invite claims of consumer harm, resulting in lawsuits, regulatory enforcement, fines, and reputational damage.
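For simple linear models, one way to produce that explanation is to break a score into per-feature contributions (weight times value) and rank them. The feature names, weights, and threshold below are hypothetical, and this decomposition is exact only for linear models; more complex models typically need model-agnostic techniques such as SHAP or permutation importance.

```python
# Hypothetical linear loan-scoring model; weights are illustrative, not calibrated.
WEIGHTS = {"debt_to_income": -4.0, "years_employed": 0.3, "missed_payments": -1.5}
BIAS = 2.0
APPROVAL_THRESHOLD = 0.0


def decide_with_reasons(applicant):
    """Return the decision plus each feature's contribution, largest effect first."""
    contributions = {name: w * applicant[name] for name, w in WEIGHTS.items()}
    score = BIAS + sum(contributions.values())
    decision = "approve" if score >= APPROVAL_THRESHOLD else "deny"
    reasons = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return decision, reasons


decision, reasons = decide_with_reasons(
    {"debt_to_income": 0.6, "years_employed": 2, "missed_payments": 1}
)
print(decision)                       # deny
print([name for name, _ in reasons])  # ['debt_to_income', 'missed_payments', 'years_employed']
```

The ranked list is what a loan officer could read back to an applicant: the denial was driven mostly by debt-to-income ratio, then by a missed payment.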
7. Model Management
Machine learning models are not a “one and done” exercise. Model management, which can involve monitoring, revisiting, and retraining, is a fundamental part of their life cycle.
How you put your model into production will determine how easy it is to manage. Yet even getting into production can be challenging. In fact, this step was the highest hurdle cited in Davenport’s machine learning research, with 47 percent of executives saying that it has been difficult to integrate machine learning projects with existing processes and systems.
Machine learning models are often built in different languages than most commonly used business applications, so translating them is one production issue. Translate carefully, or the model could lose performance due to mistranslations. Translation also often makes it harder to trace the production model back to the original version.
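One safeguard is a parity test: score the same inputs with the original model and the translated version, and flag any disagreement beyond a small tolerance. In this sketch, both models and their coefficients are hypothetical, and the “translation” contains a deliberately rounded coefficient to show what the test catches.

```python
import math


def original_model(x1, x2):
    # Logistic score as built by the data science team (hypothetical coefficients).
    return 1 / (1 + math.exp(-(0.8 * x1 - 1.25 * x2 + 0.1)))


def translated_model(x1, x2):
    # Hypothetical port to another system in which one coefficient was rounded.
    return 1 / (1 + math.exp(-(0.8 * x1 - 1.2 * x2 + 0.1)))


def parity_report(reference, candidate, test_inputs, tolerance=1e-6):
    """List every input where the two versions disagree by more than the tolerance."""
    mismatches = []
    for args in test_inputs:
        expected, actual = reference(*args), candidate(*args)
        if abs(expected - actual) > tolerance:
            mismatches.append((args, expected, actual))
    return mismatches


test_inputs = [(0.0, 0.0), (1.0, 2.0), (3.0, 1.0)]
problems = parity_report(original_model, translated_model, test_inputs)
print(f"{len(problems)} of {len(test_inputs)} inputs disagree")  # 2 of 3 inputs disagree
```

Note that the first input passes even though the translation is wrong, which is why the test set should be a representative sample of production-like inputs rather than a handful of convenient ones.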
Models can also degrade while in production because of skewed learning over time, as the result of overrides or other user interactions, or as new facts on the ground overtake older data. Monitoring and periodic retraining are essential.
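Monitoring can start as simply as tracking accuracy on recent predictions once their true outcomes are known, and flagging the model for retraining when accuracy falls too far below the level measured at deployment. The class below is a minimal sketch; the window size, baseline, and 5-point tolerance are hypothetical settings, and real monitoring often also tracks shifts in the input data itself.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over the most recent predictions whose true outcomes are known."""

    def __init__(self, baseline_accuracy, window=100, max_drop=0.05):
        self.baseline = baseline_accuracy
        self.max_drop = max_drop
        self.recent = deque(maxlen=window)  # oldest results fall off automatically

    def record(self, predicted, actual):
        self.recent.append(predicted == actual)

    def rolling_accuracy(self):
        return sum(self.recent) / len(self.recent) if self.recent else None

    def needs_retraining(self):
        # Only judge once the window is full, so a few early outcomes can't trigger it.
        if len(self.recent) < self.recent.maxlen:
            return False
        return self.baseline - self.rolling_accuracy() > self.max_drop


monitor = AccuracyMonitor(baseline_accuracy=0.92, window=10)
outcomes = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]  # hypothetical actual outcomes
for actual in outcomes:
    monitor.record(predicted=1, actual=actual)
print(monitor.rolling_accuracy(), monitor.needs_retraining())  # 0.7 True
```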
If your model management isn’t up to par, the downsides could be grim. Your users could end up frustrated by downtime as you make ongoing adjustments. Worse, they could get wrong answers to their questions, or even a recommendation for the wrong course of action for your company.
Help is available in today’s market in the form of innovative model production systems that can ease what is otherwise an error-prone, labor-intensive, multi-step process. Some tools and platforms eliminate the need to recode models, enable testing without disrupting users, and provide version control and tracking.
8. Change Management
Everyone loves progress, but nobody likes change. When it comes time to roll out your machine learning model, employ a deliberate communications and change management strategy, involving the stakeholders you engaged at the start of your project. Then socialize your users’ successes to sustain momentum over time.
Your Predictive Analytics Future Looks Bright
The impact of machine learning is growing in our lives and our work, not to put you out of a job but to enhance your work and enable you to do more powerful analyses with Big Data and machine learning models. This evolution in analytics presents great opportunities and challenges for you, as a data analyst, and for your company. Be strategic to make sure your initiatives gain traction and deliver the business value you’ve built into them. Manage your launch into the future of machine learning with all the care it deserves. Your future, and your company’s future, will be all the brighter.