Data comes in many shapes, sizes, and formats. It comes from everywhere, ranging from corporate databases that are decades old to information generated on smartphones during the time it took to read this sentence. With Big Data come great opportunities.
According to the IDC info brief “The State of Data Science and Analytics,” the average analytics or data science activity draws on six data sources and produces seven target outputs.
A study by researchers at MIT and Wharton found that across all industries, firms that take a data-driven approach to decisions get results that are 5-6% better than their industry norm, and some sectors achieve truly prodigious results. McKinsey agrees, estimating that the effective use of Big Data can increase profits in the retail sector by as much as 60%.
What’s the first step to harnessing the power of Big Data when there are many ways to begin the journey? If you crave a little order amid the chaos of Big Data, keep reading. We have a few good definitions for you.
Let’s start with the condition of your data, aka its format. If you’re working with multiple spreadsheets, you may well be working with all three formats. But more data shouldn’t mean more problems.
Three Formats of Data
Structured data: Traditional data stored in a neat record format with well-defined data types, such as fixed-field numeric and alphanumeric characters. Structured data is the basis for most existing and legacy databases and is relatively easy to store and manage.
Semi-structured data: Unformatted or loosely formatted numbers or characters that sit inside a field but have little or no structure within that field. A social media post, such as a tweet, is an example of semi-structured data. Semi-structured data is more complex to store and process than structured data.
Unstructured data: Data that is not text-based, such as pictures, images, or sound files generated by devices or posted on social media. Unstructured data is a challenge to manage because it is large, difficult to catalog and index, and problematic to store in databases.
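To make the three formats concrete, here is a minimal Python sketch. The sample values (customer IDs, the post text, the user name) are made up for illustration, and the four bytes simply stand in for a real image file.

```python
import csv
import io
import json

# Structured: fixed columns with well-defined types, here a tiny CSV table.
structured_csv = "customer_id,amount\n101,19.99\n102,5.00\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))
total = sum(float(row["amount"]) for row in rows)

# Semi-structured: a made-up social media post. The record has named
# fields, but the "text" field itself has no internal structure.
post_json = '{"user": "example_user", "text": "Loving the new flavor! #soda"}'
post = json.loads(post_json)
hashtags = [word for word in post["text"].split() if word.startswith("#")]

# Unstructured: raw bytes standing in for an image or sound file.
# Without specialized decoding, about all we can measure is the size.
image_bytes = bytes([0x89, 0x50, 0x4E, 0x47])  # first four bytes of a PNG file
size_in_bytes = len(image_bytes)

print(round(total, 2), hashtags, size_in_bytes)
```

Notice how the work shifts: the structured table can be summed directly, the post needs some text handling to pull out a hashtag, and the raw bytes tell us almost nothing on their own.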
So you’ve nailed down the different ways data can look. Understanding the three categories of data will pay off as you start to analyze your data and scrutinize its usefulness in your reports and projects. Let’s take a look.
Three Categories of Data
Traditional data: Data that lives in existing or legacy databases and is often in a well-structured format. Rows in an Excel spreadsheet, records in an accounting database table, or account information in an insurance mainframe database are examples of traditional data. More modern examples include information in data warehouses and cloud applications.
Enrichment data: Data that is industry-specific or special purpose and used to supplement (or enrich) existing data. For example, spatial grid coordinates identifying where customers like to shop would enrich sales information, or demographic information about customers could help a retailer choose new product lines.
Emerging data: “Big Data,” which is simply large, complex data sets, as well as other sources such as social media or marketing automation data, are common examples of emerging data. This category is newer and is often the most difficult data to identify and leverage, but it can add significant value when applied to strategic business decisions.
Bottling Data at Coca-Cola
Coca-Cola, for instance, uses a self-service analytics platform to provide more than 600 restaurant owners with personalized reports on their sales and beverage usage. This lets them optimize inventories, avoid stock replenishment delays, and increase profit margins, says Jay Caplan, a senior business analytics manager at the beverage company.
Caplan supports one of Coca-Cola’s largest accounts, composed of thousands of franchisees that sell close to 1 billion beverages a year. Trying to make sense of the bottling data, Caplan ran into issues with Big Data. “The massive datasets I was pulling in from our data repository kept blowing up Access and Excel,” he explains. But running the data through their self-service analytics platform made it usable and valuable: “The fact that I could process over 4.5 million rows of data from separate data sets without writing a single line of code was just incredible.”
Bringing data together from different sources is how data adds value to the decision-making process. In the real world, the data you need usually won’t sit neatly in a predefined database inside a fully prepared data table just waiting for you to access it. In most cases, data must be obtained from different sources to add the depth and wide scope necessary for the best possible analysis and decision making.
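As a rough illustration of bringing sources together, the sketch below joins a small list of sales records to a separate demographics lookup on a shared customer ID. All field names, IDs, and values are hypothetical, and the `blend` helper is just one simple way to do a left join in plain Python, not how any particular analytics platform works.

```python
# Source 1: sales records, as they might come from a database table.
sales = [
    {"customer_id": 101, "amount": 19.99},
    {"customer_id": 102, "amount": 5.00},
    {"customer_id": 103, "amount": 12.50},
]

# Source 2: demographic data from a separate, hypothetical provider.
demographics = {
    101: {"age_group": "25-34"},
    102: {"age_group": "35-44"},
}


def blend(sales_rows, demo_by_id):
    """Left-join sales rows to demographics on customer_id.

    Every sales row is kept; rows with no match get age_group=None.
    """
    blended = []
    for row in sales_rows:
        extra = demo_by_id.get(row["customer_id"], {"age_group": None})
        blended.append({**row, **extra})
    return blended


blended = blend(sales, demographics)
for row in blended:
    print(row)
```

Customer 103 has no demographic match, so it comes through with an empty age group. Gaps like this are exactly what you should expect when sources were never designed to fit together.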
Finally, here are the most common types of data you will find when performing everything from basic to complex data analysis.
Five Types of Data
A string represents alphanumeric data and can include letters, numbers, spaces, or other types of characters. A string can also be thought of as plain text. All the characters in a string are considered text even if the characters are digits.
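Here is a small Python sketch of why it matters that digits in a string are still text. The ZIP code and numbers are arbitrary sample values.

```python
# A ZIP code stored as text keeps its leading zero; as a number it does not.
zip_as_text = "02134"
zip_as_number = int(zip_as_text)  # 2134

# "+" joins strings but adds numbers.
concatenated = "10" + "5"      # "105"
added = int("10") + int("5")   # 15

# Text sorts character by character, so "10" comes before "2".
text_order = sorted(["10", "9", "2"])  # ['10', '2', '9']
number_order = sorted([10, 9, 2])      # [2, 9, 10]

print(zip_as_text, zip_as_number, concatenated, added, text_order, number_order)
```

If a column of IDs or codes suddenly loses leading zeros or sorts strangely, a string-versus-number mix-up is a likely culprit.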
There are several different numeric data types, including integers, decimals, floats, and doubles. Numeric data types do not have adjustable lengths except for Fixed Decimal.
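The differences between numeric types show up quickly in practice. The sketch below uses Python's built-in `int` and `float`, plus `decimal.Decimal` as a stand-in for a fixed decimal type. That mapping is an assumption for illustration; the exact types and their names vary from tool to tool.

```python
from decimal import Decimal

count = 7  # integer: whole numbers only

# Floats are binary approximations, so some sums are slightly off.
float_sum = 0.1 + 0.2
float_is_exact = float_sum == 0.3  # False

# Decimal keeps exact base-10 digits, which suits currency.
decimal_sum = Decimal("0.10") + Decimal("0.20")  # Decimal('0.30')

# Fixing the number of decimal places rounds to that scale.
price = Decimal("19.999").quantize(Decimal("0.01"))  # Decimal('20.00')

print(count, float_is_exact, decimal_sum, price)
```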
Date and time data is what it sounds like, though its format can look a bit different. You may have a 10-character String in “yyyy-mm-dd” format for a date or an 8-character String in “hh:mm:ss” format for time. Or, you may have DateTime information that looks something like a 19-character String in “yyyy-mm-dd hh:mm:ss” format.
Example: December 2, 2005 = 2005-12-02; 2:47 and 53 seconds a.m. = 02:47:53; December 2, 2005 at 2:47 and 53 seconds p.m. = 2005-12-02 14:47:53
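If you want to see those formats in action, the following sketch parses the same three example strings with Python's standard `datetime` module. The format codes (`%Y`, `%m`, `%d`, `%H`, `%M`, `%S`) are Python's own; other tools spell these patterns differently.

```python
from datetime import datetime

date_text = "2005-12-02"               # 10 characters
time_text = "02:47:53"                 # 8 characters
datetime_text = "2005-12-02 14:47:53"  # 19 characters

# Parse each string into a real date/time value.
date_value = datetime.strptime(date_text, "%Y-%m-%d").date()
time_value = datetime.strptime(time_text, "%H:%M:%S").time()
stamp = datetime.strptime(datetime_text, "%Y-%m-%d %H:%M:%S")

# Once parsed, the parts are available as numbers.
print(date_value.year, date_value.month, date_value.day)
print(time_value.hour, time_value.minute, time_value.second)
print(stamp.hour)  # 14, i.e. 2 p.m. on a 24-hour clock
```

Parsing matters because a date stored as plain text cannot be compared, sorted, or subtracted reliably until it becomes a true date value.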
Bool, short for Boolean, is a data type with only two possible values.
Example: True or False where False equals 0 and True equals non-zero.
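The zero and non-zero rule is easy to check in Python, which happens to follow the same convention for numbers. The `to_bool` helper below is a hypothetical name used only for this sketch.

```python
def to_bool(value):
    """Map a number to a Boolean: 0 is False, any non-zero value is True."""
    return value != 0


print(to_bool(0))   # False
print(to_bool(1))   # True
print(to_bool(-3))  # True: any non-zero value counts, including negatives

# Python's built-in bool() applies the same rule to numbers.
print(bool(0), bool(42))  # False True
```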
A spatial object holds the geographic shape associated with a data record. A table can contain multiple spatial object fields.
Example: A spatial object can consist of a point, line, polyline, or polygon.
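One simple way to picture these shapes is as lists of coordinates. The sketch below is an assumption for illustration: it uses plain (x, y) tuples on a flat grid rather than a real spatial library or real latitude/longitude values, and the helper names are made up.

```python
import math

# Each shape is just a set of (x, y) coordinates on a flat grid.
point = (0.0, 0.0)
line = [(0.0, 0.0), (3.0, 4.0)]                  # two endpoints
polyline = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)]  # connected segments
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]  # closed shape


def path_length(points):
    """Sum the straight-line distances between consecutive points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def polygon_area(vertices):
    """Area of a simple polygon using the shoelace formula."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to the first.
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


print(path_length(line))      # 5.0
print(path_length(polyline))  # 9.0
print(polygon_area(polygon))  # 12.0
```

Real spatial data adds map projections and coordinate systems on top of this, but the idea is the same: a shape is stored alongside the record so you can measure it and relate it to other shapes.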
The simple fact is that the best decisions will be made only when all the relevant data is available for analysis. The key data almost always comes from multiple data sources and often comes in different formats. Knowing the formats, categories, and types of data is one of the first steps in diving into analysis. Find an analytics platform that allows you to blend any format of data from anywhere so that you can solve any problem, any time.