Understand the Variables: Any data analysis begins with an understanding of the variables. A quick read of the column names is a good place to start. A closer look at data catalogues, field descriptions, and metadata can offer insight into what each field represents and help uncover missing or incomplete data.
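As a minimal sketch of that first pass, assuming the data is loaded into a pandas DataFrame, the snippet below lists each column with its type, its count of missing values, and its number of distinct values. The `df` table, its column names, and the `summarize_variables` helper are all made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical store data; the columns and values are illustrative only.
df = pd.DataFrame({
    "store_id": [1, 2, 3, 4],
    "city": ["Austin", "Boise", None, "Denver"],
    "sales": [120.5, np.nan, 98.0, 143.2],
})


def summarize_variables(frame):
    """Return one row per column: dtype, missing count, and distinct values."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "missing": frame.isna().sum(),
        "unique": frame.nunique(),  # distinct non-missing values
    })


summary = summarize_variables(df)
print(summary)
```

A table like this does not replace the data catalogue, but it quickly shows which fields have gaps and which ones deserve a closer read of their descriptions.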
Detect Any Outliers: Outliers or anomalies can derail an analysis and distort the picture a dataset gives, so it’s important to identify them early. Data visualization, numerical methods, interquartile ranges, and hypothesis testing are the most common ways of detecting them. A boxplot, histogram, or scatterplot, for example, makes it easy to spot points far outside the typical range, while a z-score measures how many standard deviations a data point lies from the mean. Once outliers are found, an analyst can investigate, adjust, omit, or ignore them. Whatever the choice, the decision should be noted in the analysis.
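The two numerical checks mentioned above can be sketched as follows. This is an illustration under stated assumptions, not a prescribed method: the function names, the sample values, and the cutoffs (1.5 times the interquartile range, and a z-score of 2.5) are choices made for this example.

```python
import numpy as np


def zscore_outliers(values, threshold=3.0):
    """Flag points whose distance from the mean exceeds `threshold` standard deviations."""
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std()
    return np.abs(z) > threshold


def iqr_outliers(values, k=1.5):
    """Flag points more than k * IQR below the first or above the third quartile."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Made-up measurements with one obviously extreme value at the end.
data = [10, 12, 11, 13, 12, 11, 10, 12, 95]

z_mask = zscore_outliers(data, threshold=2.5)
iqr_mask = iqr_outliers(data)

print("z-score flags:", np.asarray(data)[z_mask])
print("IQR flags:", np.asarray(data)[iqr_mask])
```

With only nine points, a z-score cutoff of 3 would not flag anything here, which is why the example uses 2.5. The extreme value itself inflates the mean and the standard deviation, so on small samples the interquartile range check is often the more dependable of the two.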
Examine Patterns and Relationships: Plotting a dataset in a variety of ways makes it easier to identify and examine the patterns and relationships among variables. For example, a business exploring data from multiple stores may have information on location, population, temperature, and per capita income. To estimate sales for a new location, the business needs to decide which of these variables to include in its predictive model, and seeing how each one relates to sales in the existing stores informs that choice.
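One numerical companion to those plots is a correlation table. The sketch below, which assumes pandas and uses invented figures for five hypothetical stores, ranks the candidate variables by the strength of their linear correlation with sales.

```python
import pandas as pd

# Invented figures for five hypothetical stores; not real data.
stores = pd.DataFrame({
    "population": [50, 80, 120, 200, 310],
    "temperature": [18, 25, 15, 22, 19],
    "per_capita_income": [30, 32, 35, 41, 45],
    "sales": [110, 150, 210, 330, 480],
})

# Pearson correlation of each candidate variable with sales,
# ordered from strongest to weakest by absolute value.
correlations = (
    stores.corr()["sales"]
    .drop("sales")
    .sort_values(key=abs, ascending=False)
)
print(correlations)
```

In this made-up table, population and income track sales closely while temperature barely does, which would argue for leaving temperature out of a first model. Correlation only captures linear relationships, so it complements the scatterplots rather than replacing them.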