Rapid Deep Learning: The Bridge We Forgot to Build

Son of Soil
Jan 3, 2022

As a typical software engineer, I have spent my years dealing with real-world problems and building large-scale distributed software systems around them. Until recently, we used deterministic algorithms to automate mundane tasks in computer code, only to execute them with speed and scale beyond human capability. Computers relieved us of those tasks, freeing us to focus on jobs where they could not emulate basic human intelligence. Thanks to Moore's Law, Artificial Intelligence (AI) is slowly and steadily making inroads into our society, challenging our intelligence on many fronts. We see such evidence in our daily lives now: a chatbot answering customer queries, a smart translator narrating English text in French, a cashier-less store sending the bill directly to your mobile. The benefits of artificial intelligence leave us with no choice but to embrace it and stay competitive.

In my observation, many teams within a sufficiently large organization have adopted Machine Learning (ML) in their problem domains. But they suffer huge inertia trying to move beyond that initial adoption. The likely reasons are:

  • While the initial cost of adopting ML was well justified by the savings in human cost and the superhuman speed achieved, a similar engineering cost for each incremental advancement was not. The lack of ML engineering and the high-touch nature of the existing pipeline prevent teams from deploying rapidly.
  • Even with a common business goal for the organization, teams end up creating siloed ML pipelines due to the lack of a common structure or common knowledge management. This lack of discipline prevents scientists and engineers from exploring, and then exploiting, the common knowledge built within the organization.

I argue in favor of three essential pillars within the organization that lay the foundation for rapid deployment of AI. Note that my definition of an organization here is a software-driven business vertical dealing with a common class of problems in a particular domain. Some examples of such domains are web search, conversational AI, e-commerce and personalization, where AI plays a big role.

Path To Production: Role of MLE

The path to production is fraught with challenges. The Deep Neural Network (DNN) models that emerged as winners in controlled lab experiments are now exposed to the world. Is the latency per inference request acceptable? How many parallel requests can one model instance serve to maximize throughput? Is it possible to distill a shallow student model from the teacher model without losing much accuracy? Is there a framework or accelerator that makes tensor operations faster? Do different graph representations and their associated runtimes make way for further efficiency gains? Does converting tensors to NumPy arrays give a computational advantage when running on CPU? Should the model architecture be represented in a single monolithic codebase, or can it take a more modular form, such as a model cascade or an ensemble?

Answering these questions requires a blend of knowledge in software engineering as well as deep learning science. The Machine Learning Engineer (MLE) solves those puzzles to turn the raw output of a lab into a finished product for the world. The role of the MLE in an AI-driven organization is critical to rapid deployment, bridging the large gap between ML science and software engineering. The absence of an MLE often yields suboptimal use of AI or a painfully slow ML pipeline, to say nothing of the friction between scientists and engineers. While organizations hire the best ML Scientists (MLS) and Software Development Engineers (SDE) to reap the benefits of AI, they often ignore the role of the MLE who can bridge the gap between the two.
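To ground the first two questions, here is a minimal benchmarking sketch an MLE might start from, assuming PyTorch; the model, input shape and batch sizes are hypothetical stand-ins, not a prescription.

```python
import time
import torch

# Hypothetical stand-in model; any trained torch.nn.Module would do.
model = torch.nn.Sequential(
    torch.nn.Linear(512, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 10),
)
model.eval()

def measure(batch_size, n_iters=100):
    """Return (latency per batch in ms, requests served per second)."""
    x = torch.randn(batch_size, 512)
    with torch.no_grad():
        model(x)  # warm-up run, excluded from timing
        start = time.perf_counter()
        for _ in range(n_iters):
            model(x)
        elapsed = time.perf_counter() - start
    latency_ms = elapsed / n_iters * 1000
    throughput = batch_size * n_iters / elapsed
    return latency_ms, throughput

# Sweeping the batch size exposes the latency/throughput tradeoff.
for bs in (1, 8, 32, 128):
    lat, thr = measure(bs)
    print(f"batch={bs:4d}  latency={lat:7.2f} ms  throughput={thr:9.1f} req/s")
```

Larger batches usually raise throughput at the cost of per-request latency; where to sit on that curve is as much a business decision as a technical one.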

Model Flow: Training to Retraining

Figure 1: Model Flow Pipeline

The journey of a DNN model starts with a robust training framework, where information and ground truths come together to form deep neural networks. Scientists use popular frameworks like PyTorch and TensorFlow to train a brand-new neural network or retrain a pre-existing one. This is the slowest stage in the entire flow, with high touch from scientists, and it involves enormous computational power depending on the depth and breadth of the network. Large DNN training often employs a distributed training framework such as DeepSpeed or Horovod.

After training, the DNN may need to go through several optimization techniques suited to real-world applications; a few such techniques are pruning, distillation and quantization. The MLE's engagement starts at this stage: together with the MLS, they work out the right sizing and the right cost-accuracy tradeoff.

The optimized DNN now carries a great amount of intelligence for many related tasks, and it needs to be fine-tuned for a specific one. Often the fine-tuning involves unpacking the last few layers of the DNN and rebuilding those layers with task-specific knowledge, morphing the DNN into a predictor or a generator. This form of Transfer Learning requires neither deep ML expertise nor distributed infrastructure, so it can easily be performed by an MLE. Other forms of transfer learning may need to rebuild the entire DNN with a new set of hyperparameters, and hence need MLS expertise.

We can treat a fine-tuned model as code in Software 2.0, which means it still needs to be compiled. This is where the MLE creates the bridge between MLS and SDE: they produce a compiled, accelerated DNN ready for deployment on a target runtime and processor architecture. A compiled and accelerated DNN yields a clean executable that exploits the underlying runtime and hardware for fast inference; there is no better point for the hand-off from MLE to SDE.

The SDE subsequently adds SDKs in different flavors and builds the packages for production. For example, they add a Python serving stack to run inference in sync mode, or a Spark library to run batch inference as a Spark job. We must not trivialize this stage: packaging requires full dependency closure, and the SDE needs to deal with dependency hell, errors and warnings. Following the packaging, the SDE chooses the right technology to deploy the packages for different access patterns, namely sync, async and batch inference. For each access pattern, the SDE chooses a combination of runtime and hardware to achieve the best performance. For sync inference, for example, a Triton server on GPU-accelerated hardware can serve a GPU-compiled model with optimal use of the underlying CPU and GPU. Similarly, for batch inference, Spark can distribute the data at the record level and achieve maximum parallelization.

While it is important to keep the model running in production without issue, it is equally important to watch for anomalous inferences, as they may incur severe negative business impact. The last step in the model flow is therefore setting up observability tools to monitor the model's performance in production. Data anomalies and model drift (anomalous inference) are generally encoded in statistical rules and evaluated on sample data and inferences collected from the production environment. The ML observability stage can detect model overfitting/underfitting, unknown unknowns and stale knowledge, and hence generates the vital trigger for deeper scientific analysis during the retraining phase. By relaying this feedback to the training phase, ML observability completes the ML flow loop.
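To give a flavor of such a statistical rule, here is a minimal drift check, assuming SciPy; the feature values, sample sizes and p-value threshold are hypothetical.

```python
import numpy as np
from scipy.stats import ks_2samp

DRIFT_P_VALUE = 0.01  # hypothetical threshold, tuned per model and feature

def drift_alert(train_sample: np.ndarray, prod_sample: np.ndarray) -> bool:
    """Flag drift when production inputs stop resembling the training data.

    A two-sample Kolmogorov-Smirnov test is one common rule; real pipelines
    usually layer several such rules per feature and per prediction.
    """
    result = ks_2samp(train_sample, prod_sample)
    return result.pvalue < DRIFT_P_VALUE

# Example: a feature whose production distribution has shifted.
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=5_000)  # baseline captured at training time
prod = rng.normal(0.6, 1.0, size=5_000)   # sampled from live inference traffic
print("drift detected:", drift_alert(train, prod))  # True for this shift
```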

Note the non-overlapping responsibilities of MLS, MLE and SDE in the ML flow loop. The level of automation increases as we move along the flow. Training, for example, is a high-touch stage where the MLS performs trial and error, whereas ML observability is almost zero-touch, as sample collection and rule setup can be fully automated. Better discipline, via setting an engineering standard and adhering to it, helps the organization push automation further upstream, even up to the fine-tuning stage.
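To make the fine-tuning stage concrete, here is a minimal PyTorch sketch of the "rebuild the last layers" step described above; the backbone, class count and optimizer settings are hypothetical choices, not the only way to do it.

```python
import torch
import torchvision

# Load a pre-trained backbone carrying general visual knowledge
# (requires torchvision >= 0.13 for the weights argument).
model = torchvision.models.resnet18(weights="IMAGENET1K_V1")

# Freeze the backbone to preserve its general knowledge.
for param in model.parameters():
    param.requires_grad = False

# Rebuild the final layer for a new task with, say, 5 target classes,
# morphing the DNN into a task-specific predictor.
model.fc = torch.nn.Linear(model.fc.in_features, 5)

# Only the new head's parameters are handed to the optimizer.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()

def fine_tune_step(batch: torch.Tensor, labels: torch.Tensor) -> float:
    """One training step over task-specific data."""
    optimizer.zero_grad()
    loss = loss_fn(model(batch), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because only the small head is trained, a step like this runs comfortably on a single machine, which is exactly why it fits the MLE's toolbox.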

Model Library: Transfer Learning in Action

“If only we knew what we know” — Carla O’Dell

An efficient knowledge management system is the secret sauce of rapid innovation. Since we invented writing, public library systems have played a vital role in preserving knowledge in the form of literature. The internet started with the vision of sharing knowledge across universities in digital form. Then came the Free Software movement to share code and software with the masses. Thanks to the OSS communities, we don't need to implement a bloom filter or a web server from scratch.

In Software 2.0, knowledge is woven into neural networks. If we build a DNN for sentence auto-completion in email, the same DNN, or part of it, can be used for auto-correction of an existing paragraph as well. Likewise, a DNN capable of identifying a celebrity's face in an image can be employed in many tasks, including abuse and copyright-infringement detection. This new form of knowledge, deeply embedded in neural networks, demands full manifestation, cataloging and easy accessibility across MLS, MLE and SDE. A model library has the following benefits:

  • General knowledge acquired through one set of tasks is preserved for other tasks. We apply transfer learning either by fine-tuning the general knowledge or by building cascades/ensembles of multiple models (see the sketch after this list), increasing the overall depth and breadth of the organization's knowledge.
  • It saves us the huge cost of pre-training general learning networks for each and every task within the organization. That cost includes MLS activities, compute infrastructure and the time needed to train the network.
  • With full manifestation and accessibility via the library, evaluating and deploying models for a task becomes faster, leading to rapid deployment of deep learning into the business pipeline.
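As a concrete picture of the cascade pattern mentioned in the first bullet, here is a minimal sketch; both model interfaces and the threshold are hypothetical.

```python
def cascade_predict(x, fast_model, accurate_model, threshold=0.9):
    """Route easy inputs to a cheap model, hard ones to a stronger one.

    Both models are assumed to expose predict(x) -> (label, confidence);
    the threshold is a hypothetical, task-tuned value.
    """
    label, confidence = fast_model.predict(x)
    if confidence >= threshold:
        return label  # the distilled/pruned model was confident enough
    # Fall back to the larger model for the hard tail of inputs.
    label, _ = accurate_model.predict(x)
    return label
```

The cheap model answers the bulk of the traffic, so the organization pays the big model's cost only where the extra knowledge is actually needed.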

While the model library acts as the brain of an organization, building one requires the full manifestation of a model. Without the right syntactic and semantic representation, it is hard to catalog the model and consume it across multiple scenarios. Three aspects represent a deep learning model:

Artifact

This is mostly the syntactic representation, consisting of the code, weights, hyperparameters and so on. This aspect also abstracts two essential functions of the model: fit() for fine-tuning and predict() for inference. The target runtime, hardware, latency, batching characteristics and a few other execution constraints are represented via the artifact as well.
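A hedged sketch of what such an artifact manifest could look like in Python; all field names and defaults are illustrative, not a real library's schema.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class ModelArtifact:
    name: str
    weights_uri: str                    # e.g. a blob-store path to checkpoints
    hyperparameters: dict = field(default_factory=dict)
    runtime: str = "onnxruntime"        # target runtime (hypothetical default)
    hardware: str = "cpu"               # target processor architecture
    max_latency_ms: float = 50.0        # execution constraint
    max_batch_size: int = 32            # batching characteristic

    def fit(self, task_data: Any) -> "ModelArtifact":
        """Fine-tune on task-specific data and return a new artifact."""
        raise NotImplementedError

    def predict(self, inputs: Any) -> Any:
        """Run inference under the declared runtime/hardware constraints."""
        raise NotImplementedError
```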

Intake

Figure 2: Model Input Hierarchy

This represents the domain and modality of the model. The domain is the universe from which the model acquired its knowledge: for example, an NLP model trained on the entire Wikipedia corpus, or a bird-species identification model trained on the eBird database. Knowing the domain helps consumers understand the overlap between task domains. The modality tells us what data is acceptable as input, and in what form. Figure 2 shows the data modality hierarchy used as input to a model. Knowing the modality helps consumers maintain data consistency between the training and inference stages of the model.
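Figure 2's exact hierarchy is not reproduced here, so the sketch below assumes a generic top-level split that most catalogs would include; the class names are illustrative.

```python
from dataclasses import dataclass
from enum import Enum

class Modality(Enum):
    TEXT = "text"
    IMAGE = "image"
    AUDIO = "audio"
    VIDEO = "video"
    TABULAR = "tabular"

@dataclass
class Intake:
    domain: str         # the universe the knowledge was acquired from
    modality: Modality  # what input the model accepts, and in what form

# The two examples from the text above.
wiki_nlp = Intake(domain="Wikipedia corpus", modality=Modality.TEXT)
bird_id = Intake(domain="eBird database", modality=Modality.IMAGE)
```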

Result

Figure 3: Model Result Hierarchy

The result is the defining aspect of a deep learning model, as it is the sole purpose behind the model's existence. This aspect forms the major part of the semantic abstraction and helps catalog the model in the model library. Figure 3 attempts to represent a DL model result hierarchy in a limited form. The precision and coverage characteristics of the model are captured in the result aspect as well.
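Continuing the same hedged sketch (Figure 3's exact hierarchy is likewise not reproduced, so the fields are illustrative), the result aspect is what makes semantic lookup in the library possible:

```python
from dataclasses import dataclass

@dataclass
class Result:
    task: str         # e.g. "classification/face-identification"
    output_type: str  # e.g. "label", "bounding-box", "generated-text"
    precision: float  # measured quality characteristic
    coverage: float   # share of the input space the model handles well

def find_models(catalog: list[Result], task: str, min_precision: float):
    """Semantic query: which cataloged models solve this task well enough?"""
    return [m for m in catalog if m.task == task and m.precision >= min_precision]
```

With all three aspects manifested, a consumer can discover a model by its result, check its intake for domain overlap, and pull its artifact for fine-tuning or inference.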

Conclusion

Organizations tend to put emphasis on science and engineering efforts in isolation, leaving the gap between them wide open. While an organization can delay the foundation of a model library until it gathers sufficient knowledge, the role of the MLE and a well-structured path to production are necessary from day one. I must acknowledge that a few important aspects were excluded from my oversimplified version of the ML flow, omitted simply to stay focused on my proposition. Data is a knowledge minefield, so data flow and data abstraction play an equally important role, if not more. But that is a topic for another day.
