<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="3.0" xml:lang="en">
<front>
<journal-meta>
<journal-id journal-id-type="publisher">ISPRS-Annals</journal-id>
<journal-title-group>
<journal-title>ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences</journal-title>
<abbrev-journal-title abbrev-type="publisher">ISPRS-Annals</abbrev-journal-title>
<abbrev-journal-title abbrev-type="nlm-ta">ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci.</abbrev-journal-title>
</journal-title-group>
<issn pub-type="epub">2194-9050</issn>
<publisher><publisher-name>Copernicus Publications</publisher-name>
<publisher-loc>Göttingen, Germany</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.5194/isprs-annals-X-1-W1-2023-439-2023</article-id>
<title-group>
<article-title>MONO-HYDRA: REAL-TIME 3D SCENE GRAPH CONSTRUCTION FROM MONOCULAR CAMERA INPUT WITH IMU</article-title>
</title-group>
<contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Udugama</surname>
<given-names>U. V. B. L.</given-names>
<ext-link ext-link-type="uri" xlink:href="https://orcid.org/0000-0002-1932-692X">https://orcid.org/0000-0002-1932-692X</ext-link>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Vosselman</surname>
<given-names>G.</given-names>
<ext-link ext-link-type="uri" xlink:href="https://orcid.org/0000-0001-8813-8028">https://orcid.org/0000-0001-8813-8028</ext-link>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Nex</surname>
<given-names>F.</given-names>
<ext-link ext-link-type="uri" xlink:href="https://orcid.org/0000-0002-5712-6902">https://orcid.org/0000-0002-5712-6902</ext-link>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
</contrib-group><aff id="aff1">
<label>1</label>
<addr-line>Faculty of Geo-Information Science and Earth Observation (ITC), University of Twente, Enschede, The Netherlands</addr-line>
</aff>
<pub-date pub-type="epub">
<day>05</day>
<month>12</month>
<year>2023</year>
</pub-date>
<volume>X-1/W1-2023</volume>
<fpage>439</fpage>
<lpage>445</lpage>
<permissions>
<copyright-statement>Copyright: &#x000a9; 2023 U. V. B. L. Udugama et al.</copyright-statement>
<copyright-year>2023</copyright-year>
<license license-type="open-access">
<license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p>
</license>
</permissions>
<self-uri xlink:href="https://isprs-annals.copernicus.org/articles/X-1-W1-2023/439/2023/isprs-annals-X-1-W1-2023-439-2023.html">This article is available from https://isprs-annals.copernicus.org/articles/X-1-W1-2023/439/2023/isprs-annals-X-1-W1-2023-439-2023.html</self-uri>
<self-uri xlink:href="https://isprs-annals.copernicus.org/articles/X-1-W1-2023/439/2023/isprs-annals-X-1-W1-2023-439-2023.pdf">The full text article is available as a PDF file from https://isprs-annals.copernicus.org/articles/X-1-W1-2023/439/2023/isprs-annals-X-1-W1-2023-439-2023.pdf</self-uri>
<abstract>
<p>The ability of robots to navigate autonomously through 3D environments depends on their comprehension of spatial concepts, ranging from low-level geometry to high-level semantics such as objects, places, and buildings. To enable such comprehension, 3D scene graphs have emerged as a robust tool for representing the environment as a layered graph of concepts and their relationships. However, building these representations in real time with monocular vision systems remains a difficult task that has not been explored in depth.</p>
<p>This paper presents Mono-Hydra, a real-time spatial perception system that combines a monocular camera with an IMU, focusing on indoor scenarios; the approach is nonetheless adaptable to outdoor applications, offering flexibility in its potential uses. The system employs a suite of deep learning algorithms to derive depth and semantics, and uses a robocentric visual-inertial odometry (VIO) algorithm based on square-root information, thereby ensuring consistent visual odometry from the IMU and monocular camera. The system achieves sub-20 cm error while processing in real time at 15 fps, enabling real-time 3D scene graph construction on a laptop GPU (NVIDIA 3080). This enhances the efficiency and effectiveness of decision-making with simple camera setups, augmenting robotic system agility. We make Mono-Hydra publicly available at: https://github.com/UAV-Centre-ITC/Mono_Hydra.</p>
</abstract>
<counts><page-count count="7"/></counts>
</article-meta>
</front>
<body/>
<back>
</back>
</article>