Opinions

A Look at Machine Learning System Design

0

As ML applications are maturing over time and becoming an indispensable component of industries for making faster and accurate decisions for critical and high-value transactions.

For example,

1. Recommendation system to increase click-through rate for e-commerce.

2. Increasing engagement time users for social media apps.

3. Applications in the medical field and self-driving cars.

In all the above scenarios, the expected ML response is accurate, fast, and reliable. Scalability, maintainability, and adaptability also become critical as we move towards making ML one of the main components of enterprise-level applications. Hence, designing end to end system keeping requirements of ML becomes important.

 The process of defining an interface, algorithm, data infrastructure, and hardware for ML Learning systems to meet specific requirements of reliability, scalability, maintainability, and adaptability.

Reliability

The system should be able to perform the correct function at the desired level of performance under a specified environment. Testing for the reliability of ML systems where learning is done from data is tricky as its failure does not mean it will give an error, it may simply give garbage outputs i.e. it may produce some outputs even though it does not have seen that corresponding ground truth during training.

The normal system fails: you get an error message e.g. There is some technical issue, the team is working on, will come back soon…

ML system fails: It fails silently, e.g. sometimes in language translation system (English to Hindi or vice versa), even though the model has not seen some words, it still might give some translation, which does not make sense.

 

Scalability

As the system grows (in data volume, traffic volume, or complexity), there should be reasonable ways of dealing with that growth. There should be automatic provision to scale compute/storage capacity because for some critical applications even 1 hour of downtime/failure can cause in loss of millions of dollars/credibility of the app.

As an example, if for an e-commerce website on a prime day, if certain feature failure i.e. not working as expected, that may turn into revenue loss in order of millions.

Maintainability

Over the time period, data distribution might get change, and hence model performance. There should be a provision in the ML system to first identify if there is any model drift/data drift, and once the significant drift is observed, then how to re-train/re-fresh and enable updated ML model without disturbing the current operation of the ML system.

Adaptability

other changes that most frequently comes in the ML systems are the availability of new data with added features or changes in business objectives e.g. conversion rate vs engagement time from customers for e-commerce. Thus, the system should be flexible to quick updates without any service interruptions

Regulatory sandbox and DeFi boom: How Spain pushed crypto adoption despite the pandemic

Previous article

ECB In ‘Currency War’ Over Euro, Commerzbank Strategist Says

Next article

You may also like

Comments

Comments are closed.

More in Opinions