Datafold, a startup that automates workflows and maintains data quality, announced today that it has raised $20 million in a Series A funding round led by NEA (New Enterprise Associates). The company will use the investment, which also drew participation from Amplify Partners, Databricks and dbt Labs, to further develop its data reliability platform and expand its team.
For any data-driven organization, ensuring the quality of data pipelines on a daily basis is the key to having well-functioning dashboards, properly trained AI and ML models, and accurate analytics. But with an explosion in the diversity and volume of data, as well as increasing demands to deliver data products faster, data engineers who rely on manual methods of testing, monitoring and quality assurance often struggle to keep up with the complexity.
A solution for high-quality data pipelines
Founded in 2020, Datafold strives to address these challenges and prevent data disasters with its end-to-end reliability platform. The solution automates several tedious workflows in the process of developing data products, from finding high-quality data, to testing changes and fixes before they are deployed to production, to monitoring data pipelines that are already in production.
“Datafold basically provides a single data catalog that allows data developers to find relevant datasets from among thousands and instantly assess how they work, which means seeing distributions of data in each column, the quality measurements (whether a given column is filled in or mostly null) and the lineage of the dataset,” Gleb Mezhanskiy, founder and CEO of Datafold, told VentureBeat.
Companies like Bigeye and Monte Carlo also work on data reliability, though Mezhanskiy said most of these, along with solutions built internally by large organizations, focus on detecting problems once the data pipeline is in production. As a result, by the time the team learns about the corrupted data, the damage has already been done: managers have made decisions based on incorrect dashboard numbers, or ML models have been trained with bias.
Datafold, on the other hand, focuses on proactively identifying data anomalies before they reach production and do damage. The solution’s flagship feature, Data Diff, automates data testing in the change management workflow and integrates it into the CI/CD process and code repositories. It shows data practitioners how a change in the data processing code will affect the resulting data and downstream products, such as BI dashboards, allowing them to catch issues that could result from a hotfix or other change before the code reaches production and the data is computed.
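To make the idea of a data diff concrete, here is a minimal sketch in Python. It is not Datafold's implementation, which the article does not describe; the function name `diff_tables`, the sample tables and the `id` key are all hypothetical. It compares the output of a transformation as it runs in production with the output produced by a changed version of the code, and reports which rows were added, removed or altered.

```python
from typing import Any

Row = dict[str, Any]


def diff_tables(before: list[Row], after: list[Row], key: str) -> dict[str, list]:
    """Compare two versions of a table whose rows are identified by `key`.

    Hypothetical helper for illustration only, not a Datafold API.
    """
    old = {row[key]: row for row in before}
    new = {row[key]: row for row in after}

    added = sorted(k for k in new if k not in old)
    removed = sorted(k for k in old if k not in new)

    changed = []
    for k in sorted(old.keys() & new.keys()):
        # Columns whose values differ between the two versions of this row.
        cols = sorted(
            c for c in old[k].keys() | new[k].keys()
            if old[k].get(c) != new[k].get(c)
        )
        if cols:
            changed.append((k, cols))

    return {"added": added, "removed": removed, "changed": changed}


# Made-up sample data: the production table vs. the table built from a code change.
prod = [{"id": 1, "revenue": 100}, {"id": 2, "revenue": 250}, {"id": 3, "revenue": 75}]
branch = [{"id": 1, "revenue": 100}, {"id": 2, "revenue": 205}, {"id": 4, "revenue": 60}]

report = diff_tables(prod, branch, key="id")
print(report)
# {'added': [4], 'removed': [3], 'changed': [(2, ['revenue'])]}
```

Run as part of a CI check, a report like this would surface the changed revenue figure and the missing row before the change is merged.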
“Before using Datafold, our customers’ teams would spend several hours [on the] same task. But with our tool, it takes them about five minutes. So it’s a massive, massive acceleration of testing,” Mezhanskiy stressed, noting that the company works with a “few dozen customers” and helps them ensure that 100% of their code is tested.
In addition, like its competitors, the company uses machine learning to monitor and detect errors in existing data products and pipelines that are already in production.
“We basically profile the data, calculate metrics, run it against our machine learning model and answer the question of whether the data behaves as expected. If it does not, we alert the customer over Slack or any other channel,” the CEO said.
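The profile, measure, check and alert loop the CEO describes can be sketched roughly as follows. The article does not say what model Datafold uses, so this example substitutes a simple z-score test against recent history as a stand-in for the machine learning step, and a plain callback in place of a real Slack integration. All names (`null_rate`, `is_anomalous`, `check_column`) and the sample values are hypothetical.

```python
from statistics import mean, stdev


def null_rate(values):
    """Profile step: fraction of missing (None) values in a column."""
    return sum(v is None for v in values) / len(values)


def is_anomalous(history, current, threshold=3.0):
    """Stand-in for the ML model: flag `current` when it lies more than
    `threshold` standard deviations from the mean of `history`."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > threshold


def check_column(name, values, history, notify):
    """Compute the metric, test it, and call `notify` if it looks wrong."""
    rate = null_rate(values)
    if is_anomalous(history, rate):
        notify(f"{name}: null rate {rate:.0%} deviates from recent history")
        return True
    return False


# Made-up data: null rates from the previous five days, then today's column.
history = [0.01, 0.02, 0.01, 0.03, 0.02]
todays_emails = ["a@x.com", None, None, "b@x.com", None]

alerts = []  # in practice `notify` might post to a chat channel instead
flagged = check_column("users.email", todays_emails, history, alerts.append)
print(flagged, alerts)
```

With this sample data, today's null rate of 60% is far outside the historical range of 1% to 3%, so the check flags the column and records one alert.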
Datafold’s prominent customers include Patreon, Thumbtack, Faire, Dutchie, Amino, Truebill and Vital.
The way forward for data reliability
Going forward, Datafold plans to further develop its product and expand its ability to automate more of the checks and tests performed by data engineers. The company believes that more than 80% of what data engineers do could be automated.
It also plans to launch a smart alerting feature that will prioritize data anomalies and help teams decide which issues are the most critical and need to be addressed first. The feature is currently being tested with a select few customers.
In the short term, Datafold expects these improvements to drive fivefold growth. The company will also expand its team to 40 or more people by the end of next year.