<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "https://jats.nlm.nih.gov/nlm-dtd/publishing/3.0/journalpublishing3.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="3.0" xml:lang="en">
<front>
<journal-meta>
<journal-id journal-id-type="publisher">ISPRS-Annals</journal-id>
<journal-title-group>
<journal-title>ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences</journal-title>
<abbrev-journal-title abbrev-type="publisher">ISPRS-Annals</abbrev-journal-title>
<abbrev-journal-title abbrev-type="nlm-ta">ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci.</abbrev-journal-title>
</journal-title-group>
<issn pub-type="epub">2194-9050</issn>
<publisher><publisher-name>Copernicus Publications</publisher-name>
<publisher-loc>Göttingen, Germany</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.5194/isprs-annals-X-3-2024-387-2024</article-id>
<title-group>
<article-title>Deep-Sea Fauna Segmentation: A Comparative Analysis of Convolutional and Vision Transformer Architectures at Lucky Strike Vent Field</article-title>
</title-group>
<contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Soto Vega</surname>
<given-names>Pedro J.</given-names>
</name>
<ext-link ext-link-type="uri" xlink:href="https://orcid.org/0000-0001-5396-8531">https://orcid.org/0000-0001-5396-8531</ext-link>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Andrade-Miranda</surname>
<given-names>Gustavo X.</given-names>
</name>
<xref ref-type="aff" rid="aff4">
<sup>4</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>da Costa</surname>
<given-names>Gilson A. O. P.</given-names>
</name>
<xref ref-type="aff" rid="aff5">
<sup>5</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Papadakis</surname>
<given-names>Panagiotis</given-names>
</name>
<xref ref-type="aff" rid="aff6">
<sup>6</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Matabos</surname>
<given-names>Marjolaine</given-names>
</name>
<xref ref-type="aff" rid="aff7">
<sup>7</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Napoleon</surname>
<given-names>Thibault</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Karine</surname>
<given-names>Ayoub</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Fagundes Gasparoto</surname>
<given-names>Henrique</given-names>
</name>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
</contrib>
</contrib-group><aff id="aff1">
<label>1</label>
<addr-line>L@bISEN, Vision-AD, ISEN Yncréa Ouest, 20 rue Cuirassé Bretagne, 29200 Brest, France</addr-line>
</aff>
<aff id="aff2">
<label>2</label>
<addr-line>L@bISEN, Vision-AD, ISEN Yncréa Ouest, 33 Quater Chemin du Champ de Manoeuvre, 44470 Carquefou, France</addr-line>
</aff>
<aff id="aff3">
<label>3</label>
<addr-line>L@bISEN, AutoROB, ISEN Yncréa Ouest, 20 rue Cuirassé Bretagne, 29200 Brest, France</addr-line>
</aff>
<aff id="aff4">
<label>4</label>
<addr-line>University Brest, LaTIM, INSERM, UMR 1101, Brest, France</addr-line>
</aff>
<aff id="aff5">
<label>5</label>
<addr-line>Institute of Mathematics and Statistics, State University of Rio de Janeiro (UERJ), Rio de Janeiro, Brazil</addr-line>
</aff>
<aff id="aff6">
<label>6</label>
<addr-line>IMT Atlantique, Lab-STICC, UMR 6285, Team RAMBO, F-29238 Brest, France</addr-line>
</aff>
<aff id="aff7">
<label>7</label>
<addr-line>University Brest, CNRS, Ifremer, UMR6197 Biologie et Ecologie des Ecosystèmes marins Profonds, 29280 Plouzané, France</addr-line>
</aff>
<pub-date pub-type="epub">
<day>04</day>
<month>11</month>
<year>2024</year>
</pub-date>
<volume>X-3-2024</volume>
<fpage>387</fpage>
<lpage>395</lpage>
<permissions>
<copyright-statement>Copyright: &#x000a9; 2024 Pedro J. Soto Vega et al.</copyright-statement>
<copyright-year>2024</copyright-year>
<license license-type="open-access">
<license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p>
</license>
</permissions>
<self-uri xlink:href="https://isprs-annals.copernicus.org/articles/X-3-2024/387/2024/isprs-annals-X-3-2024-387-2024.html">This article is available from https://isprs-annals.copernicus.org/articles/X-3-2024/387/2024/isprs-annals-X-3-2024-387-2024.html</self-uri>
<self-uri xlink:href="https://isprs-annals.copernicus.org/articles/X-3-2024/387/2024/isprs-annals-X-3-2024-387-2024.pdf">The full text article is available as a PDF file from https://isprs-annals.copernicus.org/articles/X-3-2024/387/2024/isprs-annals-X-3-2024-387-2024.pdf</self-uri>
<abstract>
<p>Due to recent technological developments, the acquisition and availability of deep-sea imagery have increased exponentially in recent years, leading to a growing backlog in image annotation and processing that is attributable to limited specialized human resources. In this work, we investigate the performance of well-established convolutional neural network and Vision Transformer (ViT) based architectures, namely DeepLabv3+ and UNETR, for the segmentation of fauna in deep-sea images. The dataset consists of images of three edifices, named Montsegur, White Castle, and Eiffel Tower, captured at the Lucky Strike vent field on the Mid-Atlantic Ridge. Our experimental investigation reveals that the Vision Transformer consistently outperforms the fully convolutional deep learning architecture by approximately 14% in terms of F1-score, demonstrating the effectiveness of ViTs in capturing intricate patterns and long-range dependencies present in deep-sea imagery. Our findings highlight the potential of ViTs as a promising approach for accurate semantic segmentation in challenging environmental contexts, paving the way for improved understanding and analysis of deep-sea ecosystems.</p>
</abstract>
<counts><page-count count="9"/></counts>
</article-meta>
</front>
<body/>
<back>
</back>
</article>