Pilot project 3 – Autonomous shipping technology supported by AI

CHALLENGE! To build a model for the automatic detection of small objects at sea and for estimating the height and direction of wave propagation, i.e. sea state estimation.

HOW? Computer Vision utilization

WHY? To provide the information needed to build a deep learning model for autonomous shipping and safe navigation.

FINAL RESULT→ CV models for small object detection and sea state description.

GOALS FOR INNO2MARE PROJECT: To assist in creating the decision support system (DSS) for autonomous ships.

Progress

Experiments and actions on pilot project 3 so far:

1. Development of Sea State Classification Models

The key objectives of our pilot project include construction of algorithms and adaptation of artificial intelligence models that will contribute to navigational safety, assist in decision-making during navigation and ultimately contribute to autonomous navigation capabilities.

The project focuses on two specific computer vision tasks:

– recognition of sea state conditions captured by a stationary camera

– detection of small objects at sea that are undetectable by radar or standard sensors.

For the development of computer vision models, collecting real training data was crucial, but we also synthesized data to increase the number of samples, especially for rougher sea states, which are more difficult to obtain.

Below are the key activities that we have carried out so far through the project for building a Sea State Recognition Model:

  • Data Collection and Analysis: We collected high-definition video data during three voyages on oceangoing vessels. This data covered multiple geographic locations and included related sensor signals such as wind speed, angle, heading, and speed over ground. Video data of different sea states was essential for training the sea state classification models.
  • Manual Estimation of Sea States: We manually estimated the sea state in the videos based on the Beaufort and Douglas scales, including wave heights and angles. These estimates were used to annotate the data needed for model training.
  • Safety and Operational Knowledge: We consulted a marine expert to provide key insights into the operational envelope of the vessels. This ensured that our research targeted the maximum sea state conditions that ships regularly encounter.
  • Development of Sea State Classification Models: We selected deep neural network architectures suitable for building a sea state recognition model. After testing various architectures with different hyperparameter settings on our test set, we chose the most appropriate ones. We then fine-tuned the models from pretrained weights and built a CV model based on expert input.

A maritime expert helped identify gaps in current maritime operations, particularly where traditional methods of sea state estimation are either infeasible or inaccurate. This feedback contributed to the development of a novel approach using computer vision (CV) for real-time sea state recognition.

  • Assessment of Model Performance: We evaluated the models for their accuracy and practical usability in real-world maritime environments.
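The manual Beaufort labels and the logged wind speed mentioned above could, in principle, be cross-checked against each other. The sketch below is only an illustration of that idea, not the procedure used in the project: it maps wind speed in knots to a Beaufort number using the standard Beaufort wind-speed bands and reports how often the manual labels agree exactly or within one class. The function names and the sample values are hypothetical.

```python
from bisect import bisect_left

# Upper wind-speed bound (knots) for Beaufort numbers 0..11; above 63 kn is 12.
BEAUFORT_UPPER_KNOTS = [0, 3, 6, 10, 16, 21, 27, 33, 40, 47, 55, 63]


def beaufort_from_wind(knots: float) -> int:
    """Map a wind speed in knots to a Beaufort number (0-12)."""
    if knots < 0:
        raise ValueError("wind speed cannot be negative")
    # Round to whole knots, then find the first band whose upper bound fits.
    return bisect_left(BEAUFORT_UPPER_KNOTS, round(knots))


def agreement(labels, references, tolerance=0):
    """Fraction of labels within `tolerance` classes of the reference."""
    if len(labels) != len(references) or not labels:
        raise ValueError("need two equally long, non-empty sequences")
    hits = sum(abs(a - b) <= tolerance for a, b in zip(labels, references))
    return hits / len(labels)


# Hypothetical values for illustration only (not project data).
manual_labels = [2, 3, 5, 6, 8]
wind_knots = [5.0, 12.0, 19.0, 25.0, 36.0]

wind_based = [beaufort_from_wind(k) for k in wind_knots]
exact = agreement(manual_labels, wind_based)
within_one = agreement(manual_labels, wind_based, tolerance=1)
print(wind_based, exact, within_one)
```

Because Beaufort classes are ordered, the within-one-class figure is often more informative than exact agreement: neighbouring sea states are hard to tell apart even for a human observer.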

2. Building synthetic images of different sea states

We developed the SeaStateSynth method for generating and fine-tuning adaptive maritime synthsets for sea state classification and built the accompanying pipeline with 7 interconnected modules:

1. Optical calibration – for matching the focal distance, crop position and area of synthesized and real images;
2. Waves generator – fine-tuned for generating 8 Beaufort classes (1-8), outputting wave spectra for each class;
3. Foam simulator – particle simulation based on wave spectra, outputting foam particles;
4. Reference object motion simulator – for adding swimmers, boats, and ships into the 3D scene and calculating their motion according to the waves, used as a visual size reference while fine-tuning the waves generator;
5. Lighting – cloudless physical sky, fine-tuned for different sun heights and solar azimuths, used as a light source for rendering;
6. XPU rendering – a combination of CPU and GPU parallelized pipelines able to render each synthesized image in 1.8 s;
7. Classification – integrated classifier trained on real data and used for evaluation of synthesized images.
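As an illustration of what the Lighting module has to parameterize, the sketch below converts a sun height (elevation) and solar azimuth into a unit direction vector and enumerates a small sweep of lighting conditions. The coordinate convention (x = east, y = north, z = up, azimuth clockwise from north), the function names, and the sampled angles are assumptions made for this example; the pipeline's actual conventions are not stated above.

```python
import math


def sun_direction(elevation_deg: float, azimuth_deg: float):
    """Unit vector pointing from the scene towards the sun.

    Assumed convention: x = east, y = north, z = up; azimuth is measured
    clockwise from north, elevation upwards from the horizon.
    """
    el = math.radians(elevation_deg)
    az = math.radians(azimuth_deg)
    return (
        math.cos(el) * math.sin(az),
        math.cos(el) * math.cos(az),
        math.sin(el),
    )


def lighting_grid(elevations_deg, azimuths_deg):
    """All (elevation, azimuth, direction) combinations for a render sweep."""
    return [
        (el, az, sun_direction(el, az))
        for el in elevations_deg
        for az in azimuths_deg
    ]


# Hypothetical sweep: three sun heights and three azimuths.
grid = lighting_grid([15, 45, 75], [90, 180, 270])
print(len(grid))
```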

By applying the described method, we generated 3 synthsets. Each one contains 51,840 images divided into 8 balanced classes and differs from the others by visual characteristics of the waves influenced by the distance from the shore (3, 30, and 300 km). The final synthset (UNIRI-SeaState-S) is composed of 150 randomly selected images, using 30 km images for lower sea states (1-3 Beaufort) and 300 km images for the higher ones (4-8 Beaufort), based on best-achieved classification results.
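The figures above imply some simple arithmetic, sketched below. The per-class count follows directly from 51,840 images in 8 balanced classes. The render-time estimate additionally assumes that images are rendered one after another on a single machine at the stated 1.8 s each, which may not match the actual setup. The helper `source_distance_km` is a hypothetical name that only encodes the selection rule described for UNIRI-SeaState-S.

```python
IMAGES_PER_SYNTHSET = 51_840      # from the text
NUM_CLASSES = 8                   # Beaufort 1-8, balanced
RENDER_SECONDS_PER_IMAGE = 1.8    # from the XPU rendering module
NUM_SYNTHSETS = 3                 # shore distances of 3, 30, and 300 km

images_per_class = IMAGES_PER_SYNTHSET // NUM_CLASSES
# Assumption: strictly sequential rendering on one machine.
render_hours_per_synthset = IMAGES_PER_SYNTHSET * RENDER_SECONDS_PER_IMAGE / 3600
render_hours_total = render_hours_per_synthset * NUM_SYNTHSETS


def source_distance_km(beaufort: int) -> int:
    """Shore distance of the synthset used for a given Beaufort class."""
    if not 1 <= beaufort <= 8:
        raise ValueError("the synthsets cover Beaufort classes 1-8 only")
    return 30 if beaufort <= 3 else 300


print(images_per_class, round(render_hours_per_synthset, 2))
print({b: source_distance_km(b) for b in range(1, 9)})
```

Under these assumptions, each class holds 6,480 images and one synthset takes roughly 26 hours to render.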

 

3. Building synthetic images for small object detection

Building upon the SeaStateSynth pipeline, we included 5 additional modules to synthesize richly annotated images containing small objects in the sea:

8. Voxelizer – for evaluating 3D geometry in 10-32 px resolution;
9. Mo-cap retarget – for animating swimmers using mo-cap data;
10. Shipyard – procedural generator of different sea vessels based on 92 parameters;
11. Instancer – for efficient positioning of small objects in the image;
12. Annotator – Cryptomatte masks and depth maps generator, outputting rich annotations.

We are currently fine-tuning the entire pipeline to produce a synthset for small object detection.
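Per-object masks such as those produced by the Annotator can be turned into detection labels. The sketch below shows one plausible way to derive a bounding box from a single object's coverage mask and to check its size. It is a simplified illustration, not the pipeline's code: the mask is a plain list of rows, the 0.5 threshold is arbitrary, and reading the 10-32 px range as the on-image size of a small object is an assumption.

```python
def bounding_box(mask, threshold=0.5):
    """Return (x_min, y_min, x_max, y_max) of pixels above `threshold`.

    `mask` is a list of rows with coverage values in [0, 1]; the result is
    None when no pixel is covered.
    """
    xs, ys = [], []
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return min(xs), min(ys), max(xs), max(ys)


def is_small_object(box, min_px=10, max_px=32):
    """True if the longer box side lies in the assumed small-object range."""
    x_min, y_min, x_max, y_max = box
    longest = max(x_max - x_min + 1, y_max - y_min + 1)
    return min_px <= longest <= max_px


# Hypothetical 40x40 mask containing one 12x12 px object.
mask = [
    [1.0 if 10 <= x < 22 and 5 <= y < 17 else 0.0 for x in range(40)]
    for y in range(40)
]
box = bounding_box(mask)
print(box, is_small_object(box))
```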

Publications

Estimation of sea state parameters from ship motion responses using attention-based neural networks

Denis Selimović, Franko Hržić, Jasna Prpić-Oršić, Jonatan Lerga, Estimation of sea state parameters from ship motion responses using attention-based neural networks, Ocean Engineering, Volume 281, 2023.
https://doi.org/10.1016/j.oceaneng.2023.114915. (https://www.sciencedirect.com/science/article/pii/S0029801823012994)

Abstract: On-site estimation of sea state parameters is crucial for ship navigation. Extensive research has been conducted on model-based estimation utilizing ship motion responses. Model-free approaches based on machine learning (ML) have recently gained popularity, and estimation from time-series of ship motion responses using deep learning (DL) methods has given promising results. In this study, we apply the novel, attention-based neural network (AT-NN) for estimating wave height, zero-crossing period, and relative wave direction from raw time-series data of ship pitch, heave, and roll. Despite reduced input data, it has been demonstrated that the proposed approaches by modified state-of-the-art techniques (based on convolutional neural networks (CNN) for regression, multivariate long short-term memory CNN, and sliding puzzle neural network) improved estimation MSE, MAE, and NSE by up to 86%, 66%, and 56%, respectively, compared to the best performing original methods for all sea state parameters. Furthermore, the proposed technique based on AT-NN outperformed all tested methods (original and enhanced), improving estimation MSE by 94%, MAE by 74%, and NSE by 80% when considering all sea state parameters. Finally, we proposed a novel approach for interpreting the uncertainty estimation of neural network outputs based on the Monte-Carlo dropout method to enhance the model’s trustworthiness.

Keywords: Ship motions; Sea state estimation; Deep learning; Attention neural network; Uncertainty estimation
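For readers unfamiliar with the metrics quoted in the abstract, the sketch below computes MSE, MAE, and NSE for a short series, plus the relative reduction of an error metric. It assumes that NSE denotes the Nash-Sutcliffe efficiency; the paper's exact definitions are not given here, and the sample values are made up for illustration.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def nse(y_true, y_pred):
    """Nash-Sutcliffe efficiency (assumed meaning of NSE): 1 is a perfect fit."""
    mean_true = sum(y_true) / len(y_true)
    residual = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    variance = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - residual / variance


def relative_reduction(baseline_error, new_error):
    """Fractional reduction of an error metric (0.94 means 94% lower)."""
    return (baseline_error - new_error) / baseline_error


# Made-up wave heights in metres, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
print(mse(y_true, y_pred), mae(y_true, y_pred), nse(y_true, y_pred))
```

Note that MSE and MAE improve when they decrease, whereas NSE improves when it moves towards 1.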

Application of raycast method for person geolocalization and distance determination using UAV images in Real-World land search and rescue scenarios

Goran Paulin, Sasa Sambolek, Marina Ivasic-Kos, Application of raycast method for person geolocalization and distance determination using UAV images in Real-World land search and rescue scenarios, Expert Systems with Applications, Volume 237, Part A, 1 March 2024, 121495.

https://doi.org/10.1016/j.eswa.2023.121495. (https://www.sciencedirect.com/science/article/pii/S0957417423019978?via%3Dihub)

Abstract: People enjoy spending time in the wilderness for numerous reasons. However, they occasionally get lost or injured, and their survival depends on being efficiently found and rescued in the shortest possible time. A search and rescue operation (SAR) is launched after the accident is reported, and all possible resources are activated. The inclusion of drones in SAR operations has enabled the use of computer vision methods to detect persons in aerial imagery automatically. When searching by drone, preference is given to oblique photographs that cover a larger area within a single image, reducing the search time. Unlike vertical photographs, oblique photographs include a significant scale change, making it challenging to locate a person in the real world and determine their distance from the drone. In order to solve this problem, encouraged by our previous successful simulations, we explored the possibility of applying the raycast method for person geolocalization and distance determination for use in real-world scenarios. In this paper, we propose a system able to precisely geolocate persons automatically detected in offline processed images recorded during the SAR mission. After a series of experiments on terrains of different configurations and complexity, using a custom-made 3D terrain generator and raycaster, along with a deep neural network-based person detector trained on our custom dataset, we defined a method for geolocating detected person based on raycast, which allows using low-cost commercial drones with a monocular camera and no Real-Time Kinematic module while enabling laser rangefinder emulation during offline image analysis. Our person geolocating method overcomes the problems faced by previous methods and, using a single flight sequence with only 4 consecutive detections, significantly outperforms the previous best results, with reliability of 42,85% (geolocating error of 0.7 m on recording from a 30 m height). 
Also, a short time of only 247 s enables offline processing of data recorded during a 21-minute drone flight covering approximately an area of 10 ha, proving that the proposed method can be effectively used in actual SAR missions. We also proposed a new evaluation metric (ErrDist) for person geolocalization and provided recommendations for using the proposed system for person detection and geolocation in real-world scenarios.

Keywords: Raycasting; Drone imagery; Object detection; YOLOv4; Object geolocalization; Distance determination; Search and rescue missions
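The core idea of raycasting can be shown with a heavily simplified case: intersecting a camera ray with flat, horizontal ground. The paper works with generated 3D terrain, so the sketch below is not its method, only an illustration of the geometry. The local coordinate frame (x = east, y = north, z = up, in metres), the function name, and the 45-degree viewing angle are assumptions; the 30 m camera height echoes the recording height mentioned in the abstract.

```python
import math


def ray_ground_intersection(camera_pos, ray_dir, ground_z=0.0):
    """Intersect a ray with the horizontal plane z = ground_z.

    Returns the hit point, or None if the ray is parallel to the plane or
    the plane lies behind the camera.
    """
    px, py, pz = camera_pos
    dx, dy, dz = ray_dir
    if dz == 0:
        return None
    t = (ground_z - pz) / dz
    if t <= 0:
        return None
    return (px + t * dx, py + t * dy, ground_z)


# Camera 30 m above flat ground, looking north and 45 degrees downwards.
camera = (0.0, 0.0, 30.0)
angle = math.radians(45)
direction = (0.0, math.cos(angle), -math.sin(angle))

hit = ray_ground_intersection(camera, direction)
distance = math.dist(camera, hit)  # slant range from camera to hit point
print(hit, round(distance, 2))
```

On real terrain the plane test would be replaced by an intersection with a terrain mesh or height map, which is where a dedicated raycaster comes in.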

Detection of motor imagery based on short-term entropy of time-frequency representations

Luka Batistić, Jonatan Lerga, Isidora Stanković, Detection of motor imagery based on short-term entropy of time-frequency representations, BioMedical Engineering OnLine, Volume 22, Article number 41, 2023.

https://doi.org/10.1186/s12938-023-01102-1

Abstract:

Motor imagery is a cognitive process of imagining a performance of a motor task without employing the actual movement of muscles. It is often used in rehabilitation and utilized in assistive technologies to control a brain–computer interface (BCI). This paper provides a comparison of different time–frequency representations (TFR) and their Rényi and Shannon entropies for sensorimotor rhythm (SMR) based motor imagery control signals in electroencephalographic (EEG) data. The motor imagery task was guided by visual guidance, visual and vibrotactile (somatosensory) guidance or visual cue only.

When using TFR-based entropy features as an input for classification of different interaction intentions, higher accuracies were achieved (up to 99.87%) in comparison to regular time-series amplitude features (for which accuracy was up to 85.91%), which is an increase when compared to existing methods. In particular, the highest accuracy was achieved for the classification of the motor imagery versus the baseline (rest state) when using Shannon entropy with Reassigned Pseudo Wigner–Ville time–frequency representation.

Our findings suggest that the quantity of useful classifiable motor imagery information (entropy output) changes during the period of motor imagery in comparison to baseline period; as a result, there is an increase in the accuracy and F1 score of classification when using entropy features in comparison to the accuracy and the F1 of classification when using amplitude features, hence, it is manifested as an improvement of the ability to detect motor imagery.

Keywords: Brain–computer interface, Electroencephalography, Information entropy, Motor imagery, Movement detection, Time–frequency representations
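The entropy features named in the abstract can be illustrated with a minimal sketch. It treats a non-negative time-frequency matrix (rows = frequency bins, columns = time samples) as a probability distribution and computes Shannon and Rényi entropies, globally and over a sliding time window. This is a generic illustration rather than the paper's implementation: the non-negativity requirement, the Rényi order of 3, the window length, and all function names are assumptions.

```python
import math


def normalize(tfr):
    """Flatten a non-negative 2D matrix into a probability distribution."""
    values = [v for row in tfr for v in row]
    if any(v < 0 for v in values):
        raise ValueError("this sketch assumes a non-negative representation")
    total = sum(values)
    if total <= 0:
        raise ValueError("matrix must contain some energy")
    return [v / total for v in values]


def shannon_entropy(p):
    """Shannon entropy in bits."""
    return -sum(v * math.log2(v) for v in p if v > 0)


def renyi_entropy(p, alpha=3.0):
    """Rényi entropy of order alpha in bits (alpha = 3 is an assumed default)."""
    if alpha == 1:
        raise ValueError("alpha = 1 is the Shannon limit; use shannon_entropy")
    return math.log2(sum(v ** alpha for v in p)) / (1 - alpha)


def short_term_entropy(tfr, window, entropy=shannon_entropy):
    """Entropy of each sliding window of `window` consecutive time columns."""
    n_time = len(tfr[0])
    result = []
    for start in range(n_time - window + 1):
        segment = [row[start:start + window] for row in tfr]
        result.append(entropy(normalize(segment)))
    return result


# Toy matrix with energy spread evenly over 2 frequency bins x 4 time samples.
uniform = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
print(shannon_entropy(normalize(uniform)), renyi_entropy(normalize(uniform)))
print(short_term_entropy(uniform, window=2))
```

Evenly spread energy gives the maximum entropy for the given number of cells, while energy concentrated in a single cell gives zero.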

Evaluating YOLOV5, YOLOV6, YOLOV7, and YOLOV8 in Underwater Environment: Is There Real Improvement?

Boris Gašparović, Goran Mauša, Josip Rukavina, Jonatan Lerga, Evaluating YOLOV5, YOLOV6, YOLOV7, and YOLOV8 in Underwater Environment: Is There Real Improvement?

DOI: 10.23919/SpliTech58164.2023.10193505

Published in: 2023 8th International Conference on Smart and Sustainable Technologies (SpliTech) 

Abstract:

This paper compares several new implementations of the YOLO (You Only Look Once) object detection algorithms in harsh underwater environments. Using a dataset collected by a remotely operated vehicle (ROV), we evaluated the performance of YOLOv5, YOLOv6, YOLOv7, and YOLOv8 in detecting objects in challenging underwater conditions. We aimed to determine whether newer YOLO versions are superior to older ones and how much, in terms of object detection performance, for our underwater pipeline dataset. According to our findings, YOLOv5 achieved the highest mean Average Precision (MAP) score, followed by YOLOv7 and YOLOv6. When examining the precision-recall curves, YOLOv5 and YOLOv7 displayed the highest precision and recall values, respectively. Our comparison of the obtained results to those of our previous work using YOLOv4 demonstrates that each version of YOLO detectors provide significant improvement.

Author Keywords:  object detection, yolov5, yolov6, yolo7, yolov8, comparison
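Precision, recall, and mean Average Precision all rest on matching detections to ground-truth boxes by intersection over union (IoU). The sketch below shows that building block with a simple greedy matcher. It is a generic illustration, not the evaluation code used in the paper: the 0.5 IoU threshold, the box format (x_min, y_min, x_max, y_max), and the sample boxes are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(detections, ground_truth, iou_threshold=0.5):
    """Greedy one-to-one matching; detections are assumed to be sorted by
    descending confidence."""
    unmatched = list(ground_truth)
    true_positives = 0
    for det in detections:
        best = max(unmatched, key=lambda gt: iou(det, gt), default=None)
        if best is not None and iou(det, best) >= iou_threshold:
            unmatched.remove(best)
            true_positives += 1
    precision = true_positives / len(detections) if detections else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Made-up boxes: one good detection, one false alarm, one missed object.
detections = [(0, 0, 10, 10), (50, 50, 60, 60)]
ground_truth = [(1, 1, 11, 11), (100, 100, 110, 110)]
print(precision_recall(detections, ground_truth))
```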

A computer vision approach to estimate the localized sea state

Aleksandar Vorkapic, Miran Pobar, Marina Ivasic-Kos, A computer vision approach to estimate the localized sea state, Ocean Engineering, Volume 309, Part 1, 1 October 2024, 118318.

https://doi.org/10.1016/j.oceaneng.2024.118318

Abstract

This research presents a novel application of computer vision (CV) and deep learning methods for real-time sea state recognition, aiming to contribute to improving the operational safety and energy efficiency of seagoing vessels, key factors in meeting the legislative carbon reduction targets. Our work focuses on utilizing sea images in operational envelopes captured by a single stationary camera mounted on the ship bridge. The collected images are used to train a deep learning model to automatically recognize the state of the sea based on the Beaufort scale. To recognize the sea state, we used 4 state-of-the-art deep neural networks with different characteristics that proved useful in various computer vision tasks: Resnet-101, NASNet, MobileNet_v2, and Transformer ViT -b32. Furthermore, we have defined a unique large-scale dataset, collected over a broad range of sea conditions from an ocean-going vessel prepared for machine learning. We used the transfer learning approach to fine-tune the models on our dataset. The obtained results demonstrate the potential for this approach to complement traditional methods, particularly where in-situ measurements are unfeasible or interpolated weather buoy data is insufficiently accurate. This study sets the groundwork for further development of sea state classification models to address recognized gaps in maritime research and enable safer and more efficient maritime operations.

Keywords: Energy efficient shipping, Computer vision, Sea state recognition, Deep neural networks, Real-time monitoring

Interpretable Machine Learning: A Case Study on Predicting Fuel Consumption in VLGC Ship Propulsion

Aleksandar Vorkapić, Sanda Martinčić-Ipšić, Rok Piltaver, Interpretable Machine Learning: A Case Study on Predicting Fuel Consumption in VLGC Ship Propulsion, Journal of Marine Science and Engineering, 2024, 12(10),1849.

https://doi.org/10.3390/jmse12101849

Abstract

The integration of machine learning (ML) in marine engineering has been increasingly subjected to stringent regulatory scrutiny. While environmental regulations aim to reduce harmful emissions and energy consumption, there is also a growing demand for the interpretability of ML models to ensure their reliability and adherence to safety standards. This research highlights the need to develop models that are both transparent and comprehensible to domain experts and regulatory bodies. This paper underscores the importance of transparency in machine learning through a use case involving a VLGC ship two-stroke propulsion engine. By adhering to the CRISP-DM standard, we fostered close collaboration between marine engineers and machine learning experts to circumvent the common pitfalls of automated ML. The methodology included comprehensive data exploration, cleaning, and verification, followed by feature selection and training of linear regression and decision tree models that are not only transparent but also highly interpretable. The linear model achieved an RMSE of 23.16 and an MRAE of 14.7%, while the accuracy of decision trees ranged between 96.4% and 97.69%. This study demonstrates that machine learning models for predicting propulsion engine fuel consumption can be interpretable, adhering to regulatory requirements, while still achieving adequate predictive performance.

Keywords: interpretability, machine learning, decision trees, linear regression, feature selection, two-stroke marine engines, fuel consumption

A Bayesian and Markov chain approach to short-term and long-term personal watercraft trajectory forecasting

Lucija Žužić, Ivan Dražić, Loredana Simčić, Franko Hržić, Jonatan Lerga, A Bayesian and Markov chain approach to short-term and long-term personal watercraft trajectory forecasting, Journal of the Franklin Institute, January 2025.

https://doi.org/10.1016/j.jfranklin.2025.107509

(https://www.sciencedirect.com/science/article/pii/S0016003225000031) 

Abstract:

In this work, vessel position is estimated using a Bayesian approach based on heading, speed, time intervals, and offsets of latitude and longitude. An additional approach using a Markov chain is presented. The trajectory data comes from a cloud-based marine watercraft tracking system that enables remote control of the vessels. Wave height and meteorological reports were used to evaluate the impact of weather on personal watercraft trajectories. One proposed approach to trajectory estimation uses the longitude and latitude offsets, while another uses the speed, heading, and actual time intervals. A long-term forecasting window of up to ten seconds is achieved by dividing trajectories into segments that do not overlap. The limitation this method faces in long-term forecasting inspires more sophisticated machine-learning approaches. The most successful estimation method used one or two previous actual values and a Bayesian approach, proving that using previously predicted values in a chain accumulates errors. Considering environmental variables did not improve the model, highlighting that small watercrafts operate well even in unstable sea states. This occurs because they generate and ride waves, having a larger impact than oceanic currents.

Keywords: Personal watercraft, Trajectory forecasting, Markov chain
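The abstract's remark that chaining previously predicted values accumulates errors can be illustrated with a plain kinematic update from heading, speed, and time interval. The sketch below is a deterministic dead-reckoning step on a locally flat Earth, not the paper's Bayesian or Markov chain model. The function names, the mean Earth radius constant, and the sample values are assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, used for a flat-Earth step


def dead_reckon(lat_deg, lon_deg, heading_deg, speed_mps, dt_s):
    """Advance a position by speed * dt along a heading (clockwise from north)."""
    distance = speed_mps * dt_s
    heading = math.radians(heading_deg)
    d_north = distance * math.cos(heading)
    d_east = distance * math.sin(heading)
    d_lat = math.degrees(d_north / EARTH_RADIUS_M)
    d_lon = math.degrees(
        d_east / (EARTH_RADIUS_M * math.cos(math.radians(lat_deg)))
    )
    return lat_deg + d_lat, lon_deg + d_lon


def chain_forecast(lat_deg, lon_deg, steps):
    """Chain predictions: every step starts from the previous prediction,
    so any error in an earlier step is carried into all later ones."""
    track = []
    for heading_deg, speed_mps, dt_s in steps:
        lat_deg, lon_deg = dead_reckon(
            lat_deg, lon_deg, heading_deg, speed_mps, dt_s
        )
        track.append((lat_deg, lon_deg))
    return track


# Made-up example: 10 m/s due north for three 1-second intervals.
track = chain_forecast(45.0, 14.0, [(0.0, 10.0, 1.0)] * 3)
print(track[-1])
```

Restarting each step from the last measured position instead of the last predicted one avoids this carry-over, in line with the abstract's finding that estimates based on actual previous values performed best.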
