Altaroad is the winner of the Girls in Tech Amplify Paris 2018 competition. Altaroad is a big data startup that embeds tiny nanosensors in roads with the aim of making tomorrow’s roads as intelligent as today’s cars. It has been named by Challenges Magazine as one of the 100 startups to invest in in 2019. Girls in Tech Paris is happy to send Altaroad to the Big Data Paris conference to act as our eyes and ears on new developments in the field.

The following article was written by: Maëlle Buisson, Data science lead, Altaroad

Most big companies aim to run their Big Data and artificial intelligence projects the way a start-up would. At the Big Data Paris convention, some of these companies explained that building machine learning models represents 10% of their project activity, collecting and analyzing data 20%, and that the remaining 70% consists of integration and getting teams to accept the new product. I think it is interesting to note that the split is a bit different for our start-up, Altaroad. For us, the model also represents only 10% of the work. New graduates usually hope to work mainly on model optimization, so they should keep in mind that it is only about 10% of a machine learning project for most companies, regardless of company size. However, one of the advantages of working for a start-up is that product integration and team acceptance represent only 40% of our work. This is mainly because we don’t have to deal with an existing architecture, and our small teams adapt to change more easily. This leaves 50% of our work at Altaroad for collecting and analyzing data. I see this as a clear benefit of working there, as this is where we can use our intelligence and add value.

In this domain we face two main challenges: data volume and data quality. Data volume alone is a double challenge for us.

Challenge 1: during the passage of a vehicle, we collect a very large volume of data (over one million data points). We need to process this data in real time before sending it to our machine learning algorithm. In a talk on mass spectrometry during the convention, it was interesting to see that other companies face a similar challenge. One of their tricks for processing a very large amount of data in Spark in real time was to convert their images into text.

Challenge 2: the more data we feed as input to our machine learning algorithm, the more vehicle passages we need to train it. Getting enough data to train our algorithm is a real challenge. I discovered during the convention that some big companies use an army of employees in India to label their data. In a start-up, this is not possible, so we need to get creative to automate data labeling. At Altaroad, we installed our solution in the field in parallel with an existing measurement system, and we automated the labeling using a license plate recognition system.
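
To make the idea of automated labeling more concrete, here is a minimal sketch in Python. It is not Altaroad's actual pipeline: the `ReferenceRecord` type, the `label_passages` function, the `max_gap_s` tolerance and the nearest-timestamp matching rule are all assumptions made for illustration. It simply pairs each sensor passage with the closest record from a reference system that has already identified the vehicle by its license plate.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class ReferenceRecord:
    """One vehicle seen by the reference system (hypothetical structure)."""
    timestamp: float  # seconds since the start of the recording
    plate: str        # plate read by the license plate recognition system
    label: str        # ground truth supplied by the existing measurement system


def label_passages(passage_times, reference_records, max_gap_s=2.0):
    """Attach the nearest reference record to each sensor passage.

    reference_records must be sorted by timestamp. A passage with no
    reference record within max_gap_s seconds is left unlabeled.
    """
    ref_times = [r.timestamp for r in reference_records]
    labeled = []
    for t in passage_times:
        i = bisect_left(ref_times, t)
        # Only the records just before and just after t can be the nearest.
        candidates = reference_records[max(i - 1, 0):i + 1]
        if not candidates:
            continue
        best = min(candidates, key=lambda r: abs(r.timestamp - t))
        if abs(best.timestamp - t) <= max_gap_s:
            labeled.append((t, best))
    return labeled


if __name__ == "__main__":
    refs = [
        ReferenceRecord(10.0, "AA-123-BB", "truck"),
        ReferenceRecord(100.0, "CC-456-DD", "van"),
    ]
    for t, rec in label_passages([10.1, 50.0, 99.7], refs):
        print(t, rec.plate, rec.label)
```

In this toy run, the passage at 50.0 seconds stays unlabeled because no reference record falls within the tolerance. Dropping uncertain matches rather than guessing keeps wrong labels out of the training set, at the cost of a few lost examples.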

Finally, the quality of the collected data is critical for Altaroad. It was clear during the convention that data quality is one of the most important challenges for companies starting Big Data projects. Most of the talks were on this subject and on how to automate data cleaning. For our part, in order to have high-quality data, our sensor signals need to be repeatable over time. This means that our algorithms need to take the inevitable sensor ageing into account. For this reason, we performed fatigue testing on FABACS machines that simulated the passage of three hundred thousand wheels over our sensor solution in one week. This is equivalent to more than one year’s traffic on a construction site in real-life conditions.
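
As a quick sanity check on those fatigue-test figures, the short Python calculation below derives what they imply. The only inputs are the numbers quoted above (three hundred thousand wheels, one week, at least one year of traffic); the variable names and the choice of a 365-day year are assumptions made for illustration.

```python
WHEEL_PASSES_SIMULATED = 300_000  # wheels simulated during the fatigue test
TEST_DURATION_DAYS = 7            # the test ran for one week
DAYS_PER_YEAR = 365               # assumption: traffic spread over a full year

# If the test covers at least one year of real traffic, the site sees
# at most this many wheel passes per day on average.
max_daily_wheel_passes = WHEEL_PASSES_SIMULATED / DAYS_PER_YEAR

# How much faster than real life the test bench ages the sensors, at minimum.
min_acceleration = DAYS_PER_YEAR / TEST_DURATION_DAYS

print(round(max_daily_wheel_passes))  # 822
print(round(min_acceleration, 1))     # 52.1
```

In other words, the quoted equivalence holds for a site averaging up to roughly 822 wheel passes per day, and the test bench compresses at least 52 weeks of wear into one.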

In conclusion, the Big Data Paris convention made us realize that, small as we are, we have already cleared some of the biggest hurdles faced by most machine learning projects, and that the challenges we still face are the biggest ones in the industry, shared by the majority of the enterprises represented at the convention, big companies and start-ups alike. Walking between the stands, I also realized that Altaroad has already deployed the latest available technologies in terms of architecture and integration: cloud and containers, to give a very simplified summary. Finally, BCG presented a study during the conference showing that the more expertise companies have in Big Data and machine learning, the easier it is for them to use it. So the gap between companies able to use machine learning and the others is constantly growing. This is good news for Altaroad, which has been using machine learning since its creation!