LGFormer: Lightweight Local-Global Transformer for Indoor Point Cloud Segmentation
Keywords: Point Cloud, Semantic Segmentation, Transformer, Graph Convolutional Network
Abstract. Semantic segmentation of indoor point clouds is a fundamental task in 3D scene understanding, supporting applications such as virtual reality, indoor navigation, and building management. Point-based transformer models achieve high accuracy but require substantial computational resources, while superpoint-based methods are more efficient yet often less precise. To address this trade-off, we propose LGFormer, a lightweight framework that integrates a Graph Convolutional Network (GCN) with transformer layers to jointly capture local and global contextual features. The method constructs a superpoint-based topology graph, on which a GCN extracts local features and transformer layers model global dependencies. Experiments on the S3DIS and ScanNet++ datasets show that LGFormer achieves 90.7% and 88.5% segmentation accuracy, respectively, while reducing inference time by more than 99% compared with point-based transformers. By leveraging superpoints and local-global feature fusion, LGFormer delivers competitive accuracy at significantly lower computational cost, making it suitable for large-scale indoor scene analysis.
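
The local-global idea summarized in the abstract can be illustrated with a minimal sketch. This is not the LGFormer implementation: the abstract does not specify layer details, so the symmetric GCN normalization, the single attention head with identity query/key/value projections, the fusion by concatenation, and all function names below are assumptions chosen only to keep the example short and dependency-free.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalized_adjacency(edges, n):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected superpoint graph (assumed normalization)."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def gcn_layer(a_hat, x, w):
    """Local step: mix each superpoint with its graph neighbours, project, apply ReLU."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, x), w)]

def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Global step: single-head scaled dot-product attention.

    Identity Q/K/V projections are an assumption made for brevity.
    """
    d = len(x[0])
    x_t = [list(col) for col in zip(*x)]
    scores = matmul(x, x_t)  # pairwise similarity between all superpoints
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, x)

def local_global_features(x, edges, w):
    """Concatenate local (GCN) and global (attention) features per superpoint."""
    local = gcn_layer(normalized_adjacency(edges, len(x)), x, w)
    global_feats = self_attention(local)
    return [l + g for l, g in zip(local, global_feats)]

# Hypothetical toy input: 4 superpoints with 2-D features, connected in a chain.
x = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 1), (1, 2), (2, 3)]
w = [[1.0, 0.0], [0.0, 1.0]]  # identity projection, for illustration only
features = local_global_features(x, edges, w)
print(len(features), len(features[0]))
```

Under these assumptions the cost structure becomes visible: the attention step is quadratic in the number of superpoints, not in the number of raw points, so the global step stays cheap whenever a scene is summarized by far fewer superpoints than points.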
