In Detail: The Difference Between Machine Learning and General Purpose Computing Infrastructure and Why Super Protocol is Doing Both
Most of us (not to be confused with the currently trending HBO series The Last of Us) are familiar with one of the key functions of cloud services: hosting web services and providing the computational power to process user requests to those services. We mean requests in a general sense: not just loading a static page, but sending the user’s data to the service, manipulating it, and returning some kind of response to the user.
Function defines form, and for a long time the key purpose of the cloud has been processing frontend and backend data for a variety of applications while helping them scale (you can find more about it here). Machine learning is fairly new to this game (the theory behind ML algorithms has been around for almost 60 years, yet we lacked the computational power and the massive amounts of data required to test them). Cloud providers such as AWS are only now adding the features required to spin up machine learning pipelines (pre-made personalization algorithms for e-commerce, or ML-tailored databases known as feature stores).
Let’s dive a bit deeper: what is required to run ML algorithms, and is it that different from serving a web page?
First, what an ML pipeline should do (we’ll use plain English to avoid getting lost in professional vocabulary):
- store and retrieve vast amounts of historical data — used to train the algorithm;
- receive and process events just as they happen (for example, user-generated actions like click, view, add to cart, etc.) — used to update results based on the new information (a popular scenario for e-commerce, where you get fresh recommendations based on how you browse the catalog);
- train ML models, some of which can be small while others (multi-layered neural networks) are quite heavy;
- be able to retrain models on the fly with new data input;
- apply filters to the results that the model provides (for example, the model gave us a list of goods, but we want to show only those currently in stock) — business rules is a common term;
- allow developers to test various ideas simultaneously (A/B test hypotheses) — to compare the results of different algorithms and/or business rules and find out which combination works better;
- help developers set up complex pipelines (these might include multiple steps: take data, train, apply business rules, load into production, retrain on new input, and also branch on some condition).
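The steps above can be sketched in a few lines of Python. This is a deliberately tiny, hypothetical popularity-based recommender (the class and method names are ours, not any real platform’s API): it trains on historical events, folds in fresh events on the fly, and applies a business rule (show only in-stock items) to the model’s output.

```python
from collections import Counter

class ToyRecommender:
    """A minimal popularity-based recommender (hypothetical names)."""

    def __init__(self):
        self.scores = Counter()  # item -> interaction count

    def train(self, historical_events):
        # Train on stored historical data.
        for event in historical_events:
            self.update(event)

    def update(self, event):
        # "Retrain" on the fly: fold a fresh event into the model.
        if event["action"] in ("view", "add_to_cart"):
            self.scores[event["item"]] += 1

    def recommend(self, in_stock, top_n=3):
        # Model output ranked by popularity, then a business rule:
        # keep only items currently in stock.
        ranked = [item for item, _ in self.scores.most_common()]
        return [item for item in ranked if item in in_stock][:top_n]

recommender = ToyRecommender()
recommender.train(
    [{"item": "tv", "action": "view"}] * 3
    + [{"item": "phone", "action": "view"}] * 2
    + [{"item": "laptop", "action": "view"}]
)
recommender.update({"item": "phone", "action": "add_to_cart"})  # fresh event
print(recommender.recommend(in_stock={"phone", "laptop"}))  # ['phone', 'laptop']
```

A real pipeline replaces the `Counter` with an actual model, but the shape stays the same: train, update, rank, then filter by business rules before anything reaches the user.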
As you can see, there are a number of specific tasks, tools, and functions a platform should be able to provide (or host a third-party tool that covers some of them).
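To give one concrete example from the list above: A/B testing is commonly implemented by bucketing users deterministically, hashing the user id together with the experiment name so the same user always lands in the same variant. A minimal sketch (the function name and parameters are our own, not any particular platform’s API):

```python
import hashlib

def assign_variant(user_id: str, experiment: str, variants: list) -> str:
    # Hash user id + experiment name so the assignment is stable per user
    # and independent across experiments; no assignment table is stored.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]
```

Because the split depends on the experiment name, a user in the "baseline" bucket of one experiment can land in any bucket of another, which keeps experiments statistically independent.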
While a typical application workflow might look similar:
- store and retrieve vast amounts of data (users’, catalogs, etc.);
- manipulate that data and process complex requests on the fly;
- scale up or down based on the current workload.
And indeed, while some ML tasks can be done with general-purpose computing, most of the developers’ effort would be wasted on adjusting and repurposing the infrastructure rather than on discovering new insights. We remember that the main purpose of cloud platforms is to help developers move fast, save resources, and build better products, right?
Last but not least, ML platforms require stricter security and data protection policies. If a typical website gets hacked, a handful of logins, emails, and passwords get leaked, but machine learning data may contain far more detailed personal information.
So when it comes to showing the results to the user, the ML algorithm’s output is just another list of items (recommended movies for the evening, or the best deals on electronics) that a website engine can take from its usual database. Generating that list and improving the results based on the user’s feedback is the hardest part, and it requires a specific set of tools, platform resources, and infrastructure.
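To make the serving side concrete, here is a sketch of that idea: the engine treats the model’s precomputed output as ordinary database rows, and the "in stock only" business rule becomes a plain JOIN condition. The schema and names are made up for illustration.

```python
import sqlite3

# Hypothetical schema: precomputed ML output stored like any other table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE recommendations (user_id TEXT, item TEXT, rank INTEGER);
    CREATE TABLE stock (item TEXT, quantity INTEGER);
""")
conn.executemany("INSERT INTO recommendations VALUES (?, ?, ?)",
                 [("u1", "tv", 1), ("u1", "phone", 2), ("u1", "laptop", 3)])
conn.executemany("INSERT INTO stock VALUES (?, ?)",
                 [("tv", 0), ("phone", 5), ("laptop", 2)])

def fetch_recommendations(user_id, limit=10):
    # The business rule (only in-stock items) is an ordinary JOIN condition;
    # the website engine never needs to know how the ranking was produced.
    rows = conn.execute("""
        SELECT r.item FROM recommendations r
        JOIN stock s ON s.item = r.item AND s.quantity > 0
        WHERE r.user_id = ? ORDER BY r.rank LIMIT ?
    """, (user_id, limit)).fetchall()
    return [item for (item,) in rows]

print(fetch_recommendations("u1"))  # ['phone', 'laptop'] -- the tv is out of stock
```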
In this regard, Super Protocol has several key advantages that make both tasks (ML and general computing) possible:
- security-first approach that ensures data protection in all its three states;
- a network of service and resource providers that covers different developers’ needs, some of which can be quite specific;
- a community and an open ecosystem that make a variety of tools ready to use on the platform without any additional hassle.
It is no problem to provide a great service capable of handling many tasks if you have the right foundation to build upon.