José Luis Lisani Roca and Ignacio Catalán Alemany
To plan strategies for the sustainable management of marine resources, it is necessary to understand their operation. In the case of fish populations, we need to know, among other variables, their location, abundance and fluctuations.
At present the acquisition of data on these resources is carried out in different ways. One of them is the analysis of underwater images, which is usually based on the identification and manual counting of the species in thousands of images by specialized personnel, which implies the investment of an enormous amount of time and effort.
Automation of the data collection process would allow the massive extraction of information with considerable savings in human resources, allowing researchers to spend more time analyzing the results. In addition, the increase in the volume of available data would facilitate a more precise and statistically relevant analysis. This automation involves teaching computers to seeing the fish in the images.
Artificial intelligence to recognize objects
They say that Marvin Minsky ( MIT), one of the fathers of artificial intelligence proposed in 1966 to one of his students a summer project consisting of connecting a computer to a camera and getting him to describe what he saw. This project, planned for 3 months, has lasted more than 50 years. Only in the last 8 has significant progress been made.
Until 2012, the fact that computers could recognize the objects in a scene was more science fiction than a real possibility. Although some progress had been made, it was limited to very particular cases (eg face recognition) and for simple images. As of that year, with the appearance of machine learning techniques based on deep convolutional neural networks reality surpassed fiction.
An artificial neural network is an algorithm composed of several interconnected stages between yes and called neurons . This model of connection is inspired by the way in which neurons in the brain are related, hence its name.
Each artificial neuron implements a mathematical function that combines a series of simple operations (sums and products of the input values by factors or weights associated with the neuron) and a more complex operation that is applied to the output signal.
In a neural network, neurons are organized into layers, so that the outputs of the neurons of one layer are used as input to the neurons of the next layer.
The concatenation of many of these layers allows creating very complex functions that relate the input values of the network to the value (or values) to the departure. Using optimization techniques, the weights of the network can be adjusted (the algorithm learns) to obtain a result adapted to each input.
Although the theoretical basis of neural networks was established at In the middle of the last century, it was not until the beginning of the present century when computing power made it possible to process the large amount of data necessary to solve complex problems with this type of algorithm.
How artificial neurons are trained
The most common network model applied to image processing is called the convolutional neural network (or CNN, for its acronym in English). In this case, each neuron in the first layer of the network is connected to a small group of pixels in the input image.
One of the first applications of CNN was the classification of images according to their content. Given an input image, the network must decide, for example, whether it is the image of a person, a car, etc. To adjust the weights of the network ( train it ) in such a way that this objective is met, the following ingredients are necessary:
- A large number of images, training calls, containing the objects to be recognized and labeled by a human (images of people with the label "person", of cars with the label "car", etc.).
- A network that takes an image as input and outputs a label ("person", "car", etc.).
- A function (cost function) that compares the labels provided by the network with the labels assigned by the human and takes a minimum value when both match.  The pesos of the network are modified in the process. If the number of training images and the number of layers in the network are large enough, after a sufficient number of iterations the network is able to simulate the way humans have to label images.
In 2012, a CNN deep (made up of a large number of layers)
called AlexNet was capable of classifying 1 000 different objects with a much lower error than any previous technique. This fact definitely drove the use of this type of algorithm in the field of computer vision. Since 2015, CNN has been able to classify these 1 000 objects with an error lower than that made by humans.
Based on the principles stated above, increasingly complex networks have been applied since 2012 to the recognition of objects in images: the network had to not only distinguish one object from another, but also indicate where it was in the image. The most popular network model today for solving this type of problem was proposed in 2018 and is called Mask R-CNN .
Artificial intelligence to identify fish
Mask R-CNN has been used to detect a multitude of objects in everyday life, from cars and people to ties, chairs or toothbrushes. We used it in the DEEP-ECOMAR project with the aim of recognizing different species of fish in underwater images.
To achieve this, we will train the network with thousands of images previously tagged by experts in which the species of interest. Once
trained, the network will be able to identify these species automatically.
An important part of the project will be dedicated to manual labeling of images, for which software tools will be developed to speed up the task. Likewise, the effect of the application of techniques to improve the color and contrast of images on learning outcomes will be investigated. Finally, the parameters of the network cost function will be adjusted to obtain optimal results applicable to images obtained in marine environments other than the one used for training.
The DEEP-ECOMAR project was carried out jointly by IMEDEA researchers ( Mediterranean Institute for Advanced Studies, CSIC-UIB) and the University of the Balearic Islands (UIB). We will use the underwater video and the image bank of the Sub-Eye underwater observatory located in Andratx (Mallorca).
About the authors: José Luis Lisani Roca is a tenured professor of university in the area of applied mathematics at the Universitat de les Illes Balears and Ignacio Catalán Alemany is a researcher in fisheries oceanography at the Mediterranean Institute for Advanced Studies (IMEDEA – CSIC – UIB)
The article This is how we teach computers to identify fish species has been written in Cuaderno de Cultura Científica .