Trifacta Legends recognizes a customer every month who is doing groundbreaking work with data using Trifacta.
We’re pleased to announce the Trifacta Legend for January 2022: Francoise Pickart, Sr. Epidemiologist, Systems Development, at the Washington State Department of Health.
Francoise Pickart is a Senior Epidemiologist at the Washington State Department of Health, where she and her team focus on designing and building modern data systems for public health outbreaks and surveillance. Previously, Ms. Pickart was the Director of Risk + Analytics for NYC Health. As part of her responsibilities there, she created the centralized data team responsible for providing data to decision-makers and led the City’s efforts to design data systems to respond to public health emergencies from Hurricane Sandy to COVID-19. Her team delivered the first citywide data warehouse for emergency response, the first citywide post-emergency canvassing application to support outreach to every household in NYC, and the adoption of a modern analytics stack at NYC Health. Ms. Pickart has a degree in chemistry from the University of Washington and a master’s degree in public health from Columbia University.
We talked to Francoise about her experience and insights from working with large datasets in public health. Francoise shared the challenges her team has faced and how she overcame those obstacles to enable her teams and build successful, scalable platforms using tools such as Trifacta.
Trifacta: First, thank you, Francoise, for being a valued customer of Trifacta. It has been a pleasure to work with you and we look forward to the continued partnership. We feel privileged to name you our Trifacta Legend for January 2022. It is well deserved. Let’s get started.
Trifacta: What was the primary motivator to use Trifacta?
Francoise: Public health response requires rapid data analyses to inform decision-making and public health action. Unfortunately, public health has been underfunded for the past 50 years and our data systems reflect that. The current analytic tools were not designed for scale and urgency. Analysts cannot clean or integrate data fast enough because the current tools are not modern or scalable, especially in the current times. Consequently, before epidemiologists can analyze data, they have to conduct long and painstaking work to clean, transform, standardize, and restructure it so that it can be queried. We needed intuitive tools like Trifacta that allow advanced analysts to process data as fast as they think, without intermediaries and in a transparent fashion so the next person can easily see what they have done.
Trifacta: Can you tell us how data was imported before using Trifacta?
Francoise: COVID-19 forced WA DoH to speed up efforts to move analytics to the cloud. The scale of data coming in overwhelmed our traditional processes, and virtual machines did not resolve the problem. The data coming from our legacy transactional systems was a nightmare. We have tables with more than 1,000 columns that contain a huge variety of data needed by different groups, with complex relationships that must be understood to properly link data. We also have many reference tables, such as hospitals, schools, and clinic sites, that are frequently updated and must be linked to collected data sets. We began by moving our immunization, case investigation, and contact tracing data to the cloud on Microsoft Azure. We use Databricks for our pipelines built in R, and Power BI for reporting. We brought in Trifacta to allow epidemiologists to rapidly explore, clean, standardize, and transform data in the cloud for analytics.
Trifacta: What are some of the benefits you have gained from using Trifacta?
Francoise: Trifacta sits within our CEDAR (Cloud Enterprise Data Analytics Repository) environment in Azure where data scientists can access the raw data and create analytics-friendly tables for program analysts. Program analysts can then access these “usable” data sets and explore, clean, standardize, and transform the data before analysis.
Trifacta has been intuitive for our analysts and they are thrilled that they can perform familiar functions much more easily in Trifacta than in R or SAS. Data quality epis love the easy standardization, the different clustering algorithms, and the ability to turn free text into categorical data quickly.
The biggest benefit so far is the ability to extract insight from data we were collecting but not analyzing. Trifacta has allowed us to build workflows that update tables so they are ready for complex analysis by several teams.
Creating one of our larger workflows might take someone on my team 2-3 days. That workflow then saves a week’s worth of data prep work that several analysts would otherwise do independently.
Trifacta: Thank you again, Francoise, for your time. I know this has been a very busy time for you and your team. Any final words for us?
Francoise: Trifacta has been the easiest tool to introduce to epidemiologists. The interface is intuitive, and the visual profiling allows us to dive into the data faster than ever before. The natural language recipes allow analysts to share their work with end-users without having to translate code, increasing transparency and trust in how they have handled the data.
Trifacta has helped us create a culture of self-service “data pipelines” that teams can create and manage themselves without IT and share centrally so data prep work is done once.
You can learn more about Francoise and the Washington State Department of Health here.