Joe Hellerstein

Joe is Trifacta’s Chief Strategy Officer and co-founder, and the Jim Gray Chair of Computer Science at UC Berkeley. His career in research and industry has focused on data-centric systems and the way they drive computing. In 2010, Fortune Magazine included him in its list of the 50 smartest people in technology, and MIT Technology Review included his Bloom language for cloud computing on its TR10 list of the 10 technologies “most likely to change our world”.

Back to SQL: Data Engineering

As part of growing our massive new Data Science program at Berkeley, it became clear that we needed a class aimed specifically at Data Engineering. The goals of Data Engineering differ from those of Software Engineering, so it was interesting to think through this curriculum and how we would teach it differently from our established database classes.

In this new approach, we ended up emphasizing four aspects of SQL for Data Engineering that are atypical of a traditional database class: data quality, data reshaping, “spreadsheet tasks,” and data pipeline testing.


Joe Hellerstein  •  September 7, 2021

Transformation: Next Level SQL

When we use SQL for Transformation (the “T” in ELT), the focus changes. Here, we’re taking many messy, disparate tables and manipulating them into a more usable, common form. To take our example from before, we may be extracting and loading sales data from 17 electronics chains that sold the phones, and our job in SQL is to write transformation queries that integrate that data.
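To make the integration step concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a warehouse. It assumes just two of the chains, with hypothetical table names (chain_a_sales, chain_b_sales) and invented column layouts: one reports prices in dollars and ISO dates, the other in cents and MM/DD/YYYY dates. A single transformation query maps both onto one common schema and stacks them with UNION ALL.

```python
import sqlite3

# Hypothetical raw tables from two chains, loaded as-is (the "E" and "L" of ELT).
# Column names, price units, and date formats differ between the sources.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE chain_a_sales (sku TEXT, sold_on TEXT, qty INTEGER, price_usd REAL);
CREATE TABLE chain_b_sales (product_code TEXT, sale_date TEXT, units INTEGER, price_cents INTEGER);

INSERT INTO chain_a_sales VALUES ('PHONE-1', '2021-08-01', 3, 499.00);
INSERT INTO chain_b_sales VALUES ('phone-1', '08/02/2021', 2, 47900);
""")

# The "T": map each source onto a common schema
# (chain, upper-case sku, ISO date, units, unit price in dollars) and stack them.
conn.executescript("""
CREATE TABLE unified_sales AS
SELECT 'A' AS chain, UPPER(sku) AS sku, sold_on AS sale_date,
       qty AS units, price_usd AS unit_price
FROM chain_a_sales
UNION ALL
SELECT 'B', UPPER(product_code),
       -- reorder MM/DD/YYYY into ISO YYYY-MM-DD
       substr(sale_date, 7, 4) || '-' || substr(sale_date, 1, 2) || '-' || substr(sale_date, 4, 2),
       units, price_cents / 100.0
FROM chain_b_sales;
""")

rows = conn.execute(
    "SELECT chain, sku, sale_date, units, unit_price FROM unified_sales ORDER BY chain"
).fetchall()
for row in rows:
    print(row)
```

With 17 real sources, each chain would contribute its own SELECT branch with its own mappings; the common output schema is what lets downstream queries treat all chains uniformly.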


Joe Hellerstein  •  August 30, 2021

SQL Pipelines and ELT

ELT is increasingly attractive these days. Modern data warehouses are flexible and increasingly cost-effective, allowing us to store large volumes of data, even messy data that includes text and images. In this environment, transformations occur in the data warehouse, where the native language is SQL.
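The ordering that gives ELT its name can be sketched in a few lines. This is an illustration under stated assumptions, not a production pipeline: sqlite3 stands in for the cloud warehouse, and the table names (raw_sales, clean_sales) and the sample records are hypothetical. Raw records are loaded unchanged, messy values included, and the cleanup happens afterwards as SQL inside the database.

```python
import sqlite3

# sqlite3 stands in for a cloud data warehouse; table names are hypothetical.
conn = sqlite3.connect(":memory:")

# Extract + Load: land the raw records unchanged, every field as text,
# messy values included. No cleaning happens before loading.
raw_records = [
    ("PHONE-1", " 3 ", "2021-08-01"),
    ("PHONE-2", "n/a", "2021-08-01"),
    ("PHONE-1", "5", "2021-08-02"),
]
conn.execute("CREATE TABLE raw_sales (sku TEXT, qty TEXT, sale_date TEXT)")
conn.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", raw_records)

# Transform: the cleanup runs inside the database, in SQL, after loading.
# Trim whitespace, keep only rows whose quantity is all digits, cast to INTEGER.
conn.execute("""
CREATE TABLE clean_sales AS
SELECT sku, CAST(TRIM(qty) AS INTEGER) AS qty, sale_date
FROM raw_sales
WHERE TRIM(qty) <> '' AND TRIM(qty) NOT GLOB '*[^0-9]*'
""")

result = conn.execute(
    "SELECT sku, qty, sale_date FROM clean_sales ORDER BY sale_date"
).fetchall()
print(result)
```

Because raw_sales is kept alongside clean_sales, the transformation can be revised and re-run later without extracting the data again.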


Joe Hellerstein  •  August 23, 2021

Summer of SQL: Why It’s Back

For the first decades of the millennium, it seemed like the Java-centric approach was the "hot new thing," but SQL has been roaring back. Today, SQL seems to be the focus of every data engineering conversation, and it is popping back up on billboards in Silicon Valley.

The comparison of the two "shops" inevitably leads to the question: which is better? There are pros and cons to emphasizing one or the other. 


Joe Hellerstein  •  August 16, 2021