This post recaps a great tech talk at the Databricks Data+AI Summit 2022 by Aakrati Talati and Avesh Singh. The talk presents the production ML problems most companies face and demonstrates how to use the Databricks Feature Store to get your ML project into production. This is part #1 of the series.
Original talk: Enable Production ML with Databricks Feature Store

At a glance
- AI/ML projects are hard and expensive. Even veterans have only about a 50% chance of eventually getting a project into production.
- Production Machine Learning = Production Software + Production Data
- Data challenges in ML: data silos, online-offline skew, client configuration, and data freshness.
- Databricks Feature Store, built on top of the data lakehouse, tackles the data challenges in ML.

Before you start: your project is likely to fail; it is a 50-50 shot even if you know the business
Since 2017, tens of thousands of ambitious AI/ML projects have been spun up to tackle real-world problems. However, most of these projects ended up stuck in “limbo” and never made it to production. According to a Gartner report from 2020, only about 53% of AI/ML projects make it from prototype to production. Another post, by Monte Carlo Data, puts the failure rate at nearly 90% for companies that are unprepared:
Gartner research shows only 53% of projects make it from artificial intelligence (AI) prototypes to production. CIOs and IT leaders find it hard to scale AI projects because they lack the tools to create and manage a production-grade AI pipeline.
Gartner Identifies the Top Strategic Technology Trends for 2021
For companies still working to develop a data-driven culture, that number is likely far higher, with some failure-rate estimates soaring to nearly 90%.
Why Production Machine Learning Fails — And How To Fix It
Why are AI/ML projects so hard (to ship)?
In short, the complexity of an AI/ML project comes from the complexities of its components.
The components that make up an AI/ML project are typically data science, data engineering, and software engineering (sometimes hardware too, if you think about CUDA/FPGA). Each of these is already a difficult problem on its own. Combine their complexities, multiply by four, and you get the complexity of an AI/ML project. Here is a recipe Avesh Singh shared:
Production Machine Learning = Production Software + Production Data
Assuming you have all the ingredients right (talented people, high-quality data, and a solid tech stack to process the data), there are still numerous ways your project can fail, such as data drift, biased models, testing, scaling, latency... you name it.
What are the challenges faced by AI/ML projects?
#1. Data silos

An AI/ML project usually starts off with some sampled raw data. Data scientists and engineers write code that aggregates and transforms the raw data and trains a model on it. Often, these artifacts are discarded after training, making it harder for others to build on the current results. Because of these data silos, teams end up with a lot of “spaghetti code”. (This is usually solved by adopting a data lakehouse.)
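To make the silo problem concrete, here is a minimal sketch of the opposite habit: registering feature logic under a shared name so that others can find and reuse it instead of rewriting it. The in-memory `CATALOG`, the `register` and `compute` helpers, and the `avg_basket_size` feature are all hypothetical names made up for illustration; this is not the Databricks Feature Store API.

```python
from typing import Callable

# Hypothetical in-memory catalog: feature name -> (description, compute function).
# A real team would back this with shared storage; this is only a sketch.
CATALOG: dict[str, tuple[str, Callable[[list[dict]], list[float]]]] = {}

def register(name: str, description: str, fn: Callable[[list[dict]], list[float]]) -> None:
    """Publish feature logic under a unique name so others can discover it."""
    if name in CATALOG:
        raise ValueError(f"feature already registered: {name}")
    CATALOG[name] = (description, fn)

def compute(name: str, rows: list[dict]) -> list[float]:
    """Reuse previously registered feature logic by name."""
    _, fn = CATALOG[name]
    return fn(rows)

def avg_basket_size(rows: list[dict]) -> list[float]:
    """Hypothetical feature: total spend divided by number of orders."""
    return [r["total_spend"] / r["order_count"] for r in rows]

register("avg_basket_size", "Total spend divided by number of orders", avg_basket_size)

# A second team reuses the feature by name instead of rewriting the logic.
rows = [
    {"total_spend": 100.0, "order_count": 4},
    {"total_spend": 30.0, "order_count": 3},
]
values = compute("avg_basket_size", rows)
```

The point of the sketch is only that the feature logic outlives the training run that first needed it.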
#2. Online-offline skew

Inference is a totally different business from model training. Deploying a machine learning model to a web service and serving requests mostly falls under software engineering. In most cases, the model requires its input to use the same feature set as the training data. However, the featurizer code used in offline training might not be performant enough for the online latency requirement, which is usually at the millisecond level, so it often gets reimplemented for serving. To contain variance in the end-to-end (E2E) pipeline, the featurizer logic used online and offline must be identical. Any mismatch between the two is also very difficult to track down.
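One way to picture "identical featurizer logic" is a single function that both the offline and the online path call. The sketch below assumes made-up raw fields (`signup_date`, `as_of`, `order_count`) and made-up features; it is an illustration of the idea, not code from the talk.

```python
from datetime import datetime

def featurize(raw: dict) -> dict:
    """Single source of truth for feature logic (hypothetical features)."""
    signup = datetime.fromisoformat(raw["signup_date"])
    as_of = datetime.fromisoformat(raw["as_of"])
    age_days = (as_of - signup).days
    return {
        "account_age_days": age_days,
        # Guard against division by zero for accounts created today.
        "orders_per_day": raw["order_count"] / max(age_days, 1),
    }

def build_training_rows(raw_rows: list[dict]) -> list[dict]:
    """Offline path: featurize a whole batch of historical records."""
    return [featurize(r) for r in raw_rows]

def handle_request(raw: dict) -> dict:
    """Online path: featurize one incoming request with the same function."""
    return featurize(raw)

record = {"signup_date": "2022-01-01", "as_of": "2022-01-11", "order_count": 5}
offline = build_training_rows([record])[0]
online = handle_request(record)

# Both paths share one featurizer, so they cannot drift apart.
assert offline == online
```

If the online path needs a faster implementation, a check like the final assertion, run over a sample of real records, is one cheap way to catch skew before it reaches production.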
#3. Client Configuration

While serving a model, there will often be different clients running different versions of the code, each requiring a specific version of the data, featurizer, and model. This causes version fragmentation, rollout coordination, and compatibility problems, which are subtle and very hard to track down.
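A common way to reason about this is to pin the data, featurizer, and model versions together as one bundle per client release. The sketch below is a toy illustration; the `Bundle` class, the `RELEASED` table, and every version string in it are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Bundle:
    """Versions that were trained and validated together (hypothetical names)."""
    data_version: str
    featurizer_version: str
    model_version: str

# Hypothetical table of bundles known to work together, keyed by client release.
RELEASED = {
    "2022-06": Bundle("data-v3", "feat-v2", "model-v7"),
    "2022-07": Bundle("data-v4", "feat-v3", "model-v8"),
}

def resolve(client_release: str) -> Bundle:
    """Return the pinned bundle for a client release, or fail loudly."""
    try:
        return RELEASED[client_release]
    except KeyError:
        raise ValueError(f"unknown client release: {client_release}") from None

def is_compatible(bundle: Bundle, featurizer_version: str, model_version: str) -> bool:
    """True only if the running featurizer and model match the pinned bundle."""
    return (
        bundle.featurizer_version == featurizer_version
        and bundle.model_version == model_version
    )

old_client = resolve("2022-06")
ok = is_compatible(old_client, "feat-v2", "model-v7")     # matching versions
mixed = is_compatible(old_client, "feat-v3", "model-v7")  # new featurizer, old model
```

Here `mixed` is the subtle failure case: nothing crashes, but the old model is fed features it was never trained on.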
#4. Data Freshness

Lastly, most AI/ML models deal with “active data”. For example, a user might change their profile data at any time or move to another region. This poses another set of challenges: keeping the data that flows into training and inference up to date.
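As a rough illustration of a freshness check, the sketch below serves cached feature values only while they are younger than a freshness budget and recomputes them otherwise. The one-hour `MAX_AGE`, the cache layout, and the `reload_profile` helper are assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable

MAX_AGE = timedelta(hours=1)  # assumed freshness budget, not a figure from the talk

def is_fresh(updated_at: datetime, now: datetime, max_age: timedelta = MAX_AGE) -> bool:
    """A feature value is fresh if it was updated within the budget."""
    return now - updated_at <= max_age

def get_features(cached: dict, now: datetime, recompute: Callable[[], dict]) -> dict:
    """Serve cached values while fresh; otherwise recompute from the source."""
    if is_fresh(cached["updated_at"], now):
        return cached["values"]
    return recompute()

now = datetime(2022, 7, 1, 12, 0, tzinfo=timezone.utc)
recent = {"updated_at": now - timedelta(minutes=30), "values": {"region": "US"}}
stale = {"updated_at": now - timedelta(hours=5), "values": {"region": "US"}}

def reload_profile() -> dict:
    """Hypothetical source of truth: the user has since moved to another region."""
    return {"region": "EU"}

fresh_result = get_features(recent, now, reload_profile)  # served from cache
stale_result = get_features(stale, now, reload_profile)   # recomputed
```

The right budget depends on the feature: a user's region can tolerate more staleness than, say, the contents of a shopping cart.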
Summary of Part #1
With all the data and engineering challenges an AI/ML project faces, its owner needs to be extremely careful when picking tools and infrastructure to maximize the chance of success. To prepare your team for an AI/ML project, check out my other post about how DoorDash cultivates a data culture and mindset for successful MLOps: MLOps at DoorDash (Data+A.I. summit 2022).

In my next post, I will talk about how Databricks Feature Store addresses the data challenges for AI/ML projects. Stay tuned!