Big Data can foster big solutions, but it often comes with its own big headaches. Seemingly small issues that hide in the crevices of your workday are annoying time-suckers — and worse, you get so used to them you forget how painful they really are.
Job dissatisfaction often shows itself indirectly: Do you wish you had more time? Hoping to win the lottery primarily to escape the workday? Are you tired of not being able to do more with your data and score that big promotion? Wondering why you got into this field in the first place?
If you’ve answered “yes” to any of these questions, you’re not alone. Many analysts are used to doing data preparation in spreadsheets and finishing a report just in the nick of time, leaving zero energy to tackle tough problems. The good news is that knowing the problem is the first step to kicking it to the curb. It’s time to do more than just survive at work. You deserve to thrive at work.
Here’s a list of the five most common challenges data analysts report in their day-to-day work.
Which of these are all too familiar?
1. “If my boss only knew how long data prep really takes.”
Every minute that passes is another step closer to an outrageous deadline you are frantically trying to meet. Your boss asked for 15 different charts to present at 10 a.m. like you can wave a magic wand and make answers appear. Why don’t executives understand that’s not how it works?
According to “The 80/20 Data Science Dilemma” in InfoWorld, 80% of analysts’ time is devoted to prep and blend, leaving just 20% for actual analysis. Data prep and blend is the critical first step in answering those tough questions. Get it right, and the downstream effect is accurate insights. Get it wrong, and the downstream effect is, well, less-than-accurate insights.
Clearly, data preparation is where time (and dreams) go to die, but why does it have to take so long? Manual processes are the culprit. Most analysts spend the bulk of their time wrangling data with various forms of copy/paste, formulas, and macros. This is painful, time-consuming, and the reason you have dark thoughts about throwing your mouse at the cubicle wall and never opening a spreadsheet again.
Even if you are a spreadsheet ninja or could code in your sleep, data preparation can still be slow-going with traditional data-cleansing methods.
Here are common data preparation hurdles that are most likely to slow you down:
- Records that contain unwanted characters like %, &, and other symbols or punctuation
- Null values that send your advanced analytics into a tailspin
- Non-identical duplicates, like “Maria Seelos” and “M. Seelos”
- Unit conversions, such as pounds to kilos or feet to yards
- Currency conversions
- White spaces
- Unrecognizable characters imported from different languages
- True/false records that need to become yes/no records (and vice versa)
Can anyone say “tedious” and “time-consuming”?
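To make a few of those hurdles concrete, here is a minimal sketch in plain Python that handles unwanted characters, white space, null values, a pounds-to-kilos conversion, true/false flags, and one non-identical duplicate. The records, field names, and the `duplicate_key` helper are all invented for illustration; real data will need rules of its own, and the last-name-plus-initial match is a deliberately crude assumption.

```python
import re

# Hypothetical raw records; field names and values are invented for illustration.
raw_records = [
    {"name": "  Maria Seelos ", "weight_lb": "150", "active": "true", "note": "VIP%&"},
    {"name": "M. Seelos", "weight_lb": "", "active": "FALSE", "note": None},
]

LB_TO_KG = 0.45359237  # definition of the international pound in kilograms


def strip_symbols(text):
    """Drop everything except word characters, whitespace, periods, and hyphens."""
    return re.sub(r"[^\w\s.\-]", "", text)


def clean_record(record):
    """Return a cleaned copy of one record, leaving the input untouched."""
    # Remove stray symbols, then collapse leading, trailing, and repeated white space.
    name = " ".join(strip_symbols(record["name"] or "").split())
    # Treat an empty weight as a null instead of letting float() fail on it.
    weight = record["weight_lb"]
    weight_kg = round(float(weight) * LB_TO_KG, 2) if weight else None
    # Map true/false to yes/no; missing or unrecognized flags become None.
    flag = (record["active"] or "").strip().lower()
    active = {"true": "yes", "false": "no"}.get(flag)
    note = strip_symbols(record["note"] or "")
    return {"name": name, "weight_kg": weight_kg, "active": active, "note": note}


def duplicate_key(name):
    """Crude match key (an assumption): last name plus first initial."""
    parts = name.replace(".", "").lower().split()
    return (parts[-1], parts[0][0])


cleaned = [clean_record(r) for r in raw_records]
# Both cleaned names produce the same key, so they can be flagged for review.
possible_duplicate = duplicate_key(cleaned[0]["name"]) == duplicate_key(cleaned[1]["name"])
```

Even this short script covers only six of the hurdles for two records, which gives a sense of how quickly hand-written cleanup rules pile up.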
2. “Data comes from everywhere; I wish it was easier to use all of it.”
According to Forbes, we generate 2.5 quintillion bytes of data a day — and more data = more problems! Each data source comes in a different format and creates unique challenges in bringing them together to analyze.
Joining data shouldn’t be a marriage of inconvenience. To answer more complex questions, you likely have to join multiple sources of data. That gets tough when the data lives in many different file types and locations: SQL databases, CSV, XML, and Excel (XLSX) files, cloud storage such as AWS, and more. While each source represents a piece of the data puzzle, the manual processes you currently use to bring all this data together are highly inefficient.
Reports often require multiple programming languages and approaches to achieve your goal. From R to Python to SQL, from dplyr to sqldf to data.table, exploring and applying these solutions eats up time. SQL, R, and Python approaches can limit your flexibility when you want to use a single solution to:
- Join data from multiple sources in different formats
- Find and replace data without modifying the original source
- Group records based on two input keys from your data stream
- Produce a dataset that contains every combination of two or more tables
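For a sense of what those four tasks look like when done by hand, here is a small sketch using only the Python standard library. The two sources, their column names, and the values are hypothetical stand-ins for real files or query results, and this is one possible approach rather than a recommended one.

```python
import csv
import io
import json
from collections import defaultdict
from itertools import product

# Two hypothetical sources in different formats (in practice: files or query results).
orders_csv = io.StringIO(
    "order_id,customer_id,region,amount\n"
    "1,c1,West,120\n"
    "2,c2,East,80\n"
    "3,c1,West,40\n"
)
customers_json = '[{"customer_id": "c1", "name": "Acme"}, {"customer_id": "c2", "name": "Globex"}]'

orders = list(csv.DictReader(orders_csv))
customers = {c["customer_id"]: c for c in json.loads(customers_json)}

# Join: attach the customer name to each order (inner join on customer_id).
joined = [
    {**order, "name": customers[order["customer_id"]]["name"]}
    for order in orders
    if order["customer_id"] in customers
]

# Find and replace on copies of the rows, so the joined data stays unchanged.
renamed = [{**row, "region": row["region"].replace("West", "Pacific")} for row in joined]

# Group on two keys (customer name and region) and total the amounts.
totals = defaultdict(float)
for row in renamed:
    totals[(row["name"], row["region"])] += float(row["amount"])

# Every combination of two small tables (a cross join).
regions = ["Pacific", "East"]
quarters = ["Q1", "Q2"]
combinations = list(product(regions, quarters))
```

Each step is short on its own, but every new source format or join key means another block like this to write, test, and maintain.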
If you’re asking, “Why is it so hard?” you are definitely not alone. They don’t call it Big Data for nothing, right? Still, having more data to overlay your reports and create more interesting insights should be exciting, not overwhelming.
3. “Depending on others for data is a drag.”
According to a recent IDC survey referenced in “ETL is slowing down real-time data analytics,” nearly two-thirds of data going through traditional prep and blend is at least five days old by the time it reaches an analytics database. Why? Usually, getting the data to the right location depends on someone else. Tracking down data is a pain. It might be locked in the IT department and take a few days to access because IT has many priorities in front of your request. Or, your data might be buried in a spreadsheet that’s shuttled back and forth over email or tucked away in a custom database managed by a single user.
These scenarios leave you dependent on the timelines of others, while your own project schedule stutters or stalls completely. By 2021, 62% of data analysts will have to depend on others within their organization to perform at least some steps in the analytics process. This means you’re stressed about missing deadlines and making excuses to the boss.
You know what they say, “Old data is better than no data.” But you deserve better. When internal processes are slow and reports take days to generate, you’re always behind and not providing the top-notch insights you know you can deliver.
4. “I need to go deeper with insights, but I can’t get there with the solutions and data I have today.”
With 66% of executives ranking location intelligence as critical (Forbes), being able to answer basic “where” and “who” questions is crucial for companies to outperform their competitors.
Unfortunately, antiquated approaches aren’t built for advanced capabilities. Once data is prepped, you’ll want to enrich it to extract as much value as possible. For example, you may capture a company’s name and address, but it’s better to augment that information with deeper business information like industry, size, and revenue. Having this bit of extra info helps teams like Sales prioritize follow-up according to how each lead aligns with their target industries. The possibilities are endless.
The same holds true for location intelligence. With spatial data, you can pinpoint where your target customers are located to design better marketing campaigns, find new retail locations, or optimize your supply chain logistics. When it comes to customers and prospects, it’s important to know details like age and income. It’s even better to gain deeper knowledge, including the types of technology, food, and household products they purchase, to increase your understanding of them and inspire new segmentation approaches.
You simply can’t get very far determining these deeper insights with outdated methods like spreadsheets. Often, you’ll need the help of a specialist and a bit of manual coding to enrich your data, and that takes time and expertise you don’t have. Which brings us to…
5. “I wish I could do advanced analytics on my own.”
Many analysts are still focused on descriptive analytics that explain what already happened. They would like to move toward more forecasting and informed scenario-building, but aren’t sure how. Yet more than ever, data analysts are expected to present advanced analytics such as predictive and prescriptive models, including creating decision trees, running A/B tests and logistic regressions, and performing market basket analysis. At one time, predictive and prescriptive models needed to be built by data scientists, but that’s no longer the case.
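As one illustration of what sits behind an A/B test, here is a minimal sketch of a two-proportion z-test written with the Python standard library. The function name and the conversion counts are hypothetical, and the test itself is a common textbook choice for comparing two conversion rates, not a method the sources above prescribe.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare two conversion rates; return the z statistic and two-sided p-value."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Invented numbers: variant A converts 100 of 1,000 visitors, variant B 130 of 1,000.
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
```

A small p-value suggests the difference between the two variants is unlikely to be chance alone; how small is small enough is a judgment call you should make before running the test.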
“By 2021, 66% of analytics processes will no longer simply discover what happened and why; instead, they will also prescribe what should be done.” (Ventana Research Assertions)
If your advanced analytics are still dependent on others to implement, it’s important to know you have other options. Now, modern, self-service analytics technology can empower your debut into advanced analytics, no coding skills required.
Crush These Five Problems. You Were Born to Solve.
You probably already feel better knowing that analysts around the world face the same challenges as you. But commiseration won’t revive your career. Open up the possibilities by asking more of your data. Whether you want to be the badass analyst of your office with your amazing insights or simply want to love your job again, it’s time to take control with self-service analytics to crush your data problems and transform them into opportunities.