
ClearML: the open-source machine and deep learning solution


“I don’t give you my data, but I give you the value of it.” – Dror Bar-Lev

On 22 July, the Support Centre for Data Sharing was joined by Dror Bar-Lev, Sales & Business Development Director at Allegro AI, for a practice example interview. As a provider of operational tools, Allegro AI helps companies manage their machine and deep learning products and bring their solutions to market faster and more effectively.

One of the products of Allegro AI is ClearML, an open-source tool for deploying and maintaining machine learning models in production. ClearML is a unified platform that facilitates collaboration, joint experiment management, and easy model deployment.

What sparked the idea of ClearML?

The idea behind ClearML is to give data engineers, data scientists, product managers, IT staff, and practically everyone else working with a machine learning or deep learning model the tools to do so.

“We all know that data is key, but we need a way to produce value with that data.”

When ClearML was set up, the team realised that not everyone was willing to share data. The system was therefore designed for what Bar-Lev calls “zero data move”: the data stays where it is stored, remains secured, and its privacy is safeguarded. The models that data scientists build with that data are executed in a different space, which makes the design decentralised.

The fact that the data does not move, and thus remains in the hands of the provider, is particularly important for industries with sensitive data, such as health care, and for highly competitive industries, such as aviation, autonomous systems, and robotics. Though there are cases where data is shared directly, for some data it makes more sense to share only the insights.

Camera surveillance is another scenario where this is useful. In Israel or the UK, for example, there are many cameras throughout cities. What is useful is the information contained in this data, not the vast amount of data itself.

How does it work to share insights but not the data using ClearML?

ClearML offers three pieces of software:

  • Data scientist environment: where the data scientist, engineer, and product manager work. Here they have the rights to use certain data and to create Python code.

  • GPU machines: where the crunching happens, whether on GPU, CPU, cloud, edge, or a hybrid of these.

  • Experiment manager: where the management and monitoring take place. Here the data tracking is performed, metadata and hyperparameters can be accessed, and the results of the experiments are shown.

By separating these spaces, in most cases the person managing the experiments only has access to the metadata, the hyperparameters, and the model, not to the data itself. The benefit of this becomes clear in an example from oncology that Dror Bar-Lev mentions. One of an oncologist’s tasks is the interpretation of images. The images, i.e., the data, remain in the hospital, on a device or a local server, but the model is created outside that space. The monitoring software can be located on any server, because it does not need access to the data. That way you can work with the description of the data and the hyperparameters, where the true value of the data lies. You can also work on the models collaboratively with others who, for example, bring different expertise or have their own data.
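The separation described above can be illustrated with a minimal sketch. It does not use the ClearML API; the class names DataSite and ExperimentManager, the threshold hyperparameter, and the sample values are all hypothetical and chosen only to show the principle: computation runs where the data lives, and only hyperparameters, metadata, and metrics reach the monitoring side.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from statistics import mean


@dataclass
class DataSite:
    """Hypothetical data provider (e.g. a hospital). Raw records stay here."""
    _records: list[float]

    def run_experiment(self, hyperparameters: dict) -> dict:
        """Compute locally and return only derived values, never the records."""
        threshold = hyperparameters["threshold"]
        flagged = [r for r in self._records if r > threshold]
        return {
            "hyperparameters": hyperparameters,
            "metadata": {"num_records": len(self._records)},
            "metrics": {
                "flagged_fraction": len(flagged) / len(self._records),
                "mean_value": mean(self._records),
            },
        }


@dataclass
class ExperimentManager:
    """Hypothetical monitoring server. It stores reports, not raw data."""
    reports: list[dict] = field(default_factory=list)

    def log(self, report: dict) -> None:
        self.reports.append(report)

    def best(self, metric: str) -> dict:
        """Return the logged report with the highest value for the given metric."""
        return max(self.reports, key=lambda r: r["metrics"][metric])


# Illustrative values only: the site holds four records that never leave it.
site = DataSite([0.2, 0.9, 0.4, 0.7])
manager = ExperimentManager()

# Try three hyperparameter settings; only the summaries are sent to the manager.
for threshold in (0.3, 0.5, 0.8):
    manager.log(site.run_experiment({"threshold": threshold}))

print(manager.best("flagged_fraction")["hyperparameters"])
```

In this sketch the manager can compare experiments and pick a setting without ever holding the records, which mirrors the idea that the value lies in the description of the data and the hyperparameters.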

This eliminates the need to duplicate or transfer data. According to Bar-Lev, working with the data in this way is not only more efficient, but also a precondition for a relationship of trust.

How does ClearML distinguish itself from others?

ClearML supports deep learning and the use of unstructured data, though it also supports structured data. Most of the data the system processes consists of images, videos, audio or text files, including text used for natural language processing (NLP), and radar, lidar, MRI, or X-ray images. The company predominantly serves the health care sector but also has clients in computer vision, robotics, automotive, drones, autonomous systems, retail, media, and finance.

What is the difference between start-ups and large corporations using ClearML?

In the early years, ClearML mostly saw big corporations looking for machine learning solutions, as they had the most technological resources. But Bar-Lev believes start-ups can hop on the bandwagon just as well. Every start-up can use ClearML, and start-ups have the advantage of being able to develop at great speed, something corporations often lack.

“I definitely see that start-ups are going to be one of the biggest customers for the European data sharing program because they simply don’t have the money or the access, resources, or reputation to bring to that. This is a huge advantage to them. For huge organisations it will be the opposite, they need to find out how to move faster, with huge amounts of data.”

Which developments currently lead the field?

One of the most inspiring projects for Bar-Lev is Theator, an AI workflow automation project for surgeons. By using computer vision in the middle of a surgery, doctors have another set of eyes and can leverage data to make smarter decisions, sharpen their skills, and create better outcomes.

“At the crossroads of computer vision, surgery, and artificial intelligence, Theator’s data science team requires extensive computing resources to build, test, and refine their models to incorporate every permutation of each type of surgery they aim to improve.”1

When asked what Allegro AI’s aim for the next 5 years would be, Bar-Lev shared that the company hopes to keep enabling machine learning and deep learning at an entry level, to be more involved in federated use cases, and to be at the heart of data sharing developments.

From experience, Bar-Lev notices that the United States and Europe are both looking to set up projects and ecosystems to enable data sharing, but they are not aware of each other (yet).

“Will we have two systems, a European and an American, or can bridges be built, and can we reach some kind of synthesis? I’m very excited about the developments in Europe. Each and every one of us is generating data constantly, and there has to be a way in which we can participate in this value generation.” – Dror Bar-Lev

Listen to the full interview to get all the insights.

Name: ClearML

Sector: Technology

Region: Global

Countries: Israel, Canada

Time: 2006 - ongoing

URL: https://clear.ml/ | https://www.allegro.ai/

Business model: Operational tool provider for machine and deep learning solutions

Participants: Dror Bar-Lev

Type of organisation: Commercial

Data sharing model(s): Any

Core impact

ClearML provides software for companies to manage their machine and deep learning solutions in a decentralised environment, leaving the data in the hands of the provider, but enabling others to gain insights from the data.

Context

ClearML is an open-source tool for deploying and maintaining machine learning models in production. ClearML is a unified platform that facilitates collaboration, joint experiment management, and easy model deployment. Though mostly serving the health care sector, the company also has clients in computer vision, robotics, automotive, drones, autonomous systems, retail, media, and finance.

 


For questions and comments, please visit our forum on Futurium.