INDIVIDUAL TREE SEGMENTATION FROM BLS DATA BASED ON GRAPH AUTOENCODER

: In the last two decades, Light detection and ranging (LiDAR) has been widely employed in forestry applications. Individual tree segmentation is essential to forest management because it is a prerequisite to tree reconstruction and biomass estimation. This paper introduces a general framework to extract individual trees from the LiDAR point cloud based on a graph link prediction problem. First, an undirected graph is generated from the point cloud based on K-nearest neighbors (KNN). Then, this graph is used to train a convolutional autoencoder that extracts the node embeddings to reconstruct the graph. Finally, the individual trees are defined by the separate sets of connected nodes of the reconstructed graph. A key advantage of the proposed method is that no further knowledge about tree or forest structure is required. Seven sample plots from a plantation forest with poplar and dawn redwood species have been employed in the experiments. Though the precision of the experimental results is up to 95 % for poplar species and 92 % for dawn redwood trees, the method still requires more investigations on natural forest types with mixed tree species.


INTRODUCTION
Accurate forest inventory is decisive to forest management.Advanced remote sensing techniques particularly LiDAR have proven potential in forest mapping.The variety of LiDAR systems from airborne to personal laser scanners enables comprehensive forest interpretation due to the huge amount of gained information (Fekry et al., 2021).A key process for precise biomass estimation is individual tree segmentation.In general, tree segmentation from LiDAR data is divided into raster-based methods (Xu et al., 2021), point-based methods (Comesaña-Cebral et al., 2021;Yao et al., 2012), and multisource data fusion methods (Duncanson et al., 2014).
The basic principle of raster-based methods is to locate the tree location based on the local maximum from the canopy height model (CHM).Then, the tree crowns are delineated based on the region's growth.The main limitation of these methods is the information loss due to the projection of a three-dimensional (3D) point cloud and interpolation while creating the CHM.This means that under-story trees are not segmented particularly in a multi-layer forest.Moreover, the segmentation is initiated at predefined markers.Several tree segmentation methods were developed based on CHM generation.For instance, Zhen et al. (2014) developed a markercontrolled approach for tree crown delineation based on region growing with the treetops as seed points.Yang et al. (2020) segmented the individual trees by improving the watershed algorithm results using the 3D information from the point cloud.Xu et al. (2021) proposed a crown morphology-based tree detection approach from unmanned aerial vehicle laser scanning (ULS) data in an urban forest of mixed broadleaf trees.
In contrast, the point-based methods operate directly on the 3D point cloud then, the information loss is mitigated.The point-based methods involve different implementations in- * Corresponding author cluding converting point clouds to voxels (Wang et al., 2008), clustering such as mean shift (Hui et al., 2021) and densitybased spatial clustering of applications with noise (DBSCAN) (Fu et al., 2022), conditional euclidean distance (Shendryk et al., 2016) and graph-based segmentation (Yao et al., 2012).There are numerous methods depending on point-based tree segmentation approaches.Li et al. (2012) introduced a distancebased approach in a conifer forest.Hosoi et al. (2013) generated the 3D tree model in a mixed Japanese plantation based on point cloud voxelization.Moreover, a bottom-to-top method (Lu et al., 2014) was developed for deciduous tree segmentation based on intensity.Dai et al. (2018) used mean shift segmentation to extract individual trees from multi-spectral ALS data.Hui et al. (2021) presented a self-adaptive bandwidth of mean shift algorithm such that the single trees were gradually extracted based on hierarchy mean shift.Fu et al. (2022) proposed single-segmented trees from terrestrial laser scanning (TLS) data using a bottom-up approach based on DB-SCAN by first detecting the trunks.Yao et al. (2012) performed tree segmentation by employing the normalized cut of a graph composed of voxels for less computational cost.However promising results of point-based methods, the segmentation results are significantly affected in the case of forests with multiple tree species.In addition, they demand high requirements due to the LiDAR point density.
Moreover, advanced machine learning models such as deep learning have been utilized to detect individual trees from LiDAR data.For example, Jin et al. (2018) utilized Faster R-CNN to detect maize trees from TLS data, while Chen et al. (2021) used PointNet to detect individual tree crowns.Luo et al. (2022) employed a multi-channel representation to detect individual trees from ULS, and Ying et al. (2021) introduced a pointCNN-based approach for tree detection using a CHM from LiDAR to generate rough seed points for extracting the detection samples.However, deep learning-based methods often require converting 3D point clouds to 2D raster images, which may result in the loss of important 3D information.Additionally, annotating the data can be challenging.
In summary, the limitations of raster-based methods for tree segmentation include (1) 3D information loss, which can result in the misidentification or omission of understory trees; and (2) segmentation initiation at predefined markers, which can lead to errors in tree location and segmentation.Pointbased methods also have drawbacks, including high computational requirements, and sensitivity to noise and outliers.Both raster-based and point-based can face difficulties in situations where tree crowns are complex or overlap with each other, especially in mixed forest environments.Furthermore, existing tree segmentation methods that use deep learning algorithms necessitate significant amounts of labeled training data, which can be both costly and time-consuming to obtain.These approaches may also be affected by variations in the training data and the selection of model architecture and hyperparameters, which can impact their accuracy.
This study presents a new approach for segmenting individual trees from LiDAR data using convolutional graph autoencoder.The proposed method uses raw point cloud data without any additional information about forest structure or predefined markers, which can overcome the limitations of raster-based methods such as 3D information loss, understory misinterpretation, and errors caused by inaccurate tree locations.Unlike deep learning-based point-based methods, the proposed method does not require labeled data, which can overcome the need for data annotation.The proposed method can be broken down into several main steps.Firstly, a LiDAR point cloud is used to create a graph by connecting each node to its K nearest neighbors.Secondly, the generated graph is defined by its adjacency matrix and node features, and is then reconstructed by the autoencoder.During this stage, the autoencoder predicts the existence of an edge between any pair of unconnected nodes.Lastly, a single tree is identified by finding a set of connected nodes from the reconstructed graph.The main contributions of this research can be summarized as: • The development of a convolutional graph neural network that can extract single trees from LiDAR data.
• The ability to perform individual tree extraction without requiring any additional information about tree parameters or forest structure.
The rest of this paper is structured as follows: Section 2 describes the methodology used in the research.In Section 3, the study area, dataset, and experimental design are discussed.Section 4 presents the research findings and discussion.Finally, Section 5 provides a summary of the conclusions and future outlook of the research.

METHODOLOGY
An end-to-end graph neural network is proposed to perform tree segmentation from backpack laser scanning (BLS) data in four stages as shown in Figure 1.First, the data preparation and graph formation.Then, the graph is fed to a convolutional graph autoencoder which reconstructs the graph to make predictions.Finally, the reconstructed graph analysis to extract the single trees.

Data Preparation
In order to develop an end-to-end neural network, point cloud pre-processing has been integrated with the framework.The pre-processing includes outlier removal, ground-filtering, downsampling, and graph formation.In the subsequent subsections, the detailed steps of these processes are outlined.

Outlier Removal:
It is an important preprocessing step in LiDAR data processing.Outliers can be caused by a variety of factors, such as sensor noise, environmental conditions, and measurement errors, and can lead to errors in downstream processing tasks such as object detection and 3D reconstruction.As a result, it is necessary to remove these outliers from the point cloud before proceeding with any further processing.To achieve this, the distances between each point and its neighboring points within a certain radius r are calculated.The average distance µ d and standard deviation σ d of these distances are then computed.The maximum distance D max = µ d +n ×σ d is determined as the sum of µ d and a multiple of σ d , where the multiple n is a predefined constant.Points that have µ d > D max are considered outliers and are marked for removal from the point cloud.

Ground Filtering:
A point cloud PC ∈ R 3 was classified into the ground and non-ground points based on the cloth simulation filter (CSF) developed by Zhang et al. (2016).First, PC was inverted and covered with a rigid cloth.Then, the interactions between the nodes of the cloth and their counterparts from LiDAR points were analyzed.Accordingly, the positions of the cloth nodes were identified to form the ground surface.Finally, the formed surface was compared with the original LiDAR data to extract the ground points.The non-ground points were most important because they served as input for the various phases of this study.

Point Cloud Downsampling:
A 3D grid of a point cloud PC ∈ R 3 is generated by subdividing PC into small cubes with side lengths d x , d y , and d z in X , Y , and Z directions, respectively.A single cube (voxel) contains m points whose centroid is represented by Equation(1).Voxel downsampling was performed to only reduce the point density for computational reasons, and it had no effect on the point distribution or object shapes.It is worth noting that the voxelized point cloud PC = xi ∈ R f , i = 1, 2, 3, ..., N will be deployed in further steps of the proposed method. x

Graph Formation
A graph G is defined by a set of edges (links) E which connect a set of vertices (nodes) V .
Graph G is represented by its adjacency matrix A and the nodes feature matrix X .According to Equation (1), a voxelized point cloud PC is used to generate graph G with f = 3 denotes the XYZ-dimensionality or more when additional features (e.g., RGB or normal features) are present.To define the graph edges, the nearest neighbor search is employed to identify the K neighbors to every node based on 2D distances.Then, every node of G is connected to its K neighbors, therefore, the adjacency matrix A ∈ {1, 0} N × N is formed, where 1 denotes a positive (present) edge while 0 denotes a negative (absent) edge.Moreover, the nodes feature matrix X ∈ R N × f thus a single row in X involves the positional coordinates of a single node in G.

Model Architecture
The proposed graph neural network consists mainly of a convolution autoencoder.This autoencoder encompasses two parts: encoder and decoder.A convolution encoder is a set of convolution layers that can learn low-dimensional representations of the nodes in the graph.The autoencoder consists of an encoder that maps the adjacency matrix to a low-dimensional latent space, and a decoder that maps the latent space back to the adjacency matrix.A detailed explanation of the model architecture is presented in the next subsections.

Graph Autoencoder:
The principal question of a link prediction problem: if there are any two entities, should they be connected?The structural link prediction problem involves estimating the probability of the presence of a missing or absent connection between two nodes in a partially observed graph (Wang et al., 2014).The likelihood of an unobserved edge in a graph can be computed using various approaches, such as statistical models, machine learning algorithms, or graph-based methods.Graph autoencoders are a powerful tool for predicting missing edges in graphs because they can capture complex patterns and dependencies between nodes, and can handle graphs of arbitrary size and structure (Tran, 2018).
The structure of a graph G is defined by its adjacency matrix A and the features of its nodes X .The goal of the autoencoder model h(A, X ) is to learn a set of low-dimensional latent variables Z ∈ R N ×D that represent the nodes, such that the model can produce an output Â that closely approximates the original adjacency matrix A. By minimizing the error between A and Â, the autoencoder preserves the overall structure of the graph.
The autoencoder architecture involves transforming the adjacency vector a i of the i t h node in graph A using non-linear transformations.The autoencoder architecture consists of two parts: an encoder g (a i ) that maps a i from the original space N to a low-dimensional latent space R D , and a decoder f (z i ) that maps the latent variables z i back to the original space N .The resulting autoencoder model produces an approximate reconstruction output a i that closely approximates the original adjacency vector a i for the i t h node (Kipf and Welling, 2016).The computation of the hidden representations for the encoder and decoder parts is performed in the following manner: where H (l ) and W (l ) represent the output of a convolution layer l and the weight matrix, respectively.Given the lowdimensional representations of nodes i and j , denoted by z i and z j respectively, the probability of an edge between nodes i and j is given by: where σ is the sigmoid activation function, which maps the output to a probability value between 0 and 1.

Loss Function:
In link prediction for graphs, the goal is to predict the presence or absence of edges between pairs of nodes in a graph.Because link prediction of graphs is a binary classification problem, cross-entropy with logits is commonly utilized.In addition, logits provide a stable numerical representation of the model's output.Also, cross-entropy with logits can handle imbalanced datasets where the number of edges is much smaller than the number of possible edges.Moreover, it is a differentiable function, which means that it can be used to update the model parameters through backpropagation during training.This allows the GCN to learn the features that are most relevant for link prediction.The cross-entropy with logits loss function is defined as: where N e is the total number of edges, y i is the true label for the i t h edge (either 0 or 1), p i is the predicted probability of the i t h edge being present, and log represents the natural logarithm.The p i values are obtained from the output of the model before the sigmoid function is applied, also known as the logits.The sigmoid function is then applied to the logits to obtain the predicted probabilities p i = σ(z i ) = 1 1+e −z i .The loss function can be broken down into two terms: the first term penalizes the model for predicting a probability that is too small when the true label y i is 1, and the second term penalizes the model for predicting a probability that is too large when the true label y i is 0. The overall loss L is the average of the per-edge losses.The goal of training the model is to minimize this loss by optimizing the model parameters.By minimizing the loss, the model learns to predict the existence of edges in the graph more accurately.
The loss function you provided is a binary cross-entropy loss commonly used for training binary classification models such as link prediction models.In the context of a link prediction model, the encoder and decoder are responsible for learning representations of nodes and relationships between them, respectively.
During training, the encoder takes in a pair of nodes and produces their corresponding embeddings.The decoder then takes these embeddings and produces a probability score indicating the likelihood of a relationship existing between the two nodes.The loss function is used to measure the difference between the predicted probability score and the true label (i.e., whether a relationship actually exists between the two nodes).
The loss function is computed for each training example (i.e., each pair of nodes) and averaged over all examples to obtain a scalar value representing the overall loss for that iteration of training.This loss value is then used to update the parameters of the encoder and decoder using backpropagation.
By minimizing the loss function during training, the encoder and decoder are encouraged to learn representations of nodes and relationships that accurately predict whether a relationship exists between two nodes.In other words, the loss function serves as a measure of how well the model is able to perform the task of link prediction, and guides the learning process towards better performance.

Single Tree Segmentation
After the graph reconstruction using the graph autoencoder, a single tree is identified as a set of connected nodes in the reconstructed graph.Therefore, the individual trees are extracted from the reconstructed adjacency matrix Â of the graph G based on Algorithm 1.
The algorithm takes an adjacency matrix Â as input and returns a list of lists (i.e., trees), where each inner list (i.e., single tree) contains the indices of all nodes that are connected to each other.The algorithm starts by initializing an empty vi si t ed .add(node) cur r ent _t r ee.append(nod e) for nei g hbor ← 0 to l en( Â) − 1 do if Â(nod e, nei g hbor ) = 1 and nei g hbor ∉ vi si t ed then dfs(nei g hbor , cur r ent _t r ee) for nod e ← 0 to l en( Â) − 1 do if nod e ∉ vi si t ed then cur r ent _t r ee ← [] dfs(node, cur r ent _t r ee) E xt r act ed _t r ees.append(cur r ent _t r ee) return E xt r act ed _t r ees set for visited nodes and an empty list for individual trees.It then defines a recursive DFS function that visits all unvisited neighbors of a given node and adds them to the same tree.The visited set is used to keep track of which nodes have already been visited.The algorithm then iterates over all nodes in the graph, and for each unvisited node, it initializes a new tree and calls the DFS function to add all connected nodes to the same tree.The resulting tree is then added to the list of individual trees.Finally, the algorithm returns the list of individual trees.

Study Area and Dataset
The national forest park on the Yellow Sea coast in southeast China (120.82 • E, 32.87 • N) is the study area of this research.It is a managed forest with a total area of 2240 ha while the planted tree species involve dawn redwood (Metasequoia glyptostroboides), poplar (Populus deltoids), and ginkgo (Ginkgo biloba) (Fekry et al., 2022).Seven square test plots (30 m side length) were sensed using backpack laser scanning.The test sites comprise two tree species with different tree parameters including diameters at breast height (DBH), heights H , and stand densities SD as listed in Table 1.The GreenValley BLS system consists of a Velodyne Puck VLP-16 scanner, POS, and a touchpad to manage data capture was used for LiDAR data collection.The point cloud was produced using SLAM technology, which combines scan data with POS information.(5 − 10) × 10 3 Rotation rate (Hz) 5-20

Evaluation Scheme
To evaluate the performance of the tree segmentation, three metrics were employed.The first metric is called recall R, which is computed by dividing the number of correctly segmented trees by the total number of trees.The second metric is precision P , which is calculated by dividing the number of correctly segmented trees by the total number of segmented trees.The third metric is the F 1scor e, which provides an overall measure of accuracy based on both recall and precision.
The formula for computing these metrics is provided in Equation 5, where the true positive T P represents the number of correctly segmented trees, false negative F N represents the number of omitted trees, and false positive F P represents the number of incorrectly segmented trees.

R = T P T P + F N % P = T P T P + F P %
F 1scor e = 2 × P × R P + R % (5) It is worth noting that the number of trees in each plot is identified based on field observations which also provide tree locations.Therefore, the total number of segmented trees is compared to the reference number from fieldwork.If the reference tree location corresponds to a segmented tree, then the detected tree belongs to the T P .The F N occurs when the tree location does not represent a segmented tree (i.e., the omitted trees) while the F P represents the over-segmented and under-segmented trees.

Model Training
The ADAM optimizer Kingma (2015) was used for model training and the learning rate was 0.01.For nonlinearity, the rectified linear unit (ReLU) activation function was employed.The number of epochs was 250 with a step of 25, and the number of neighbors was set to 250 (i.e., K = 250 in KNN).All experiments were performed using an NVIDIA GeForce GTX 1080 GPU and the model was fully implemented based on the PyTorch library Fey and Lenssen (2019).Figure 2 shows the minimization of the reconstruction loss during training.

Single Tree Segmentation
Individual tree segmentation was performed on seven sample plots as shown in Table 1.The evaluation metrics were computed for single plots and the results were summarized in Table 3. Segmentation results were promising for all plots.De- tection accuracy ranged from 87.5 % to 92.5 % for the dawn redwood plots, while its counterpart from poplar plots was 84.6 % to 95.2 %. Figure 3 shows the segmentation results for the point cloud of a poplar sample plot based on the proposed approach.The segmentation results were analyzed by comparing the results of the individual plots as shown in Table 3. Considering the tree species, the precision range of the poplar species was 84.6 % to 95.2 % while this range was 87.5% to 92.5 % for dawn redwood species.On the one hand, the largest precision of poplar species occurred at Plot 1 (stand density: 256 tree/ha) while the precision recorded its minimum at Plot 6 with the lowest stand density.This could be attributed to the irregular tree shapes of Plot 6 where some trees are co-dominant.Although Plot 7 comprised the highest stand density among poplar plots, it reported a precision of 90% (5.4 % greater than the precision of Plot 6).The reason also could be the co-dominant trees in the case of Plot 6.On the other hand, sample Plot 5 had the highest stand for the dawn redwood plots and recorded the largest precision among the dawn redwood plots.This can be attributed to the simple understory structure with low shrubs of Plot 5.The precision gained at sample Plot 3 (87.5 %) was lower than that of Plot 5; however, the stand density of Plot 5 was greater than that of Plot 3.This also could be attributed to the complex understory structure of Plot 3 where there were a large number of irrigated grasses and shrubs.Similar findings were gained when comparing the precision of Plot 1 (poplar species) and Plot 3 (dawn redwood species).The precision of Plot 1 was more remarkable than that of Plot 3; however, Plot 3 was planted with poplar trees.Thus, the proposed approach has potentially performed for poplar and dawn redwood species that involve forest parameters.
It is worth noting that however the stems are well sampled from BLS data, the proposed methodology has segmented single trees from BLS data without stem detection.Moreover, our segmentation approach can be generalized to airborne  (e.g., ALS or UAV) LiDAR data where the stems are not wellsampled without treetop detection on a CHM.Thus, the loss of 3D information will be mitigated and the understory will be segmented.Unfortunately, the experiments have been conducted on planted sample plots with regular tree shapes.Therefore, we are willing to extend the experiments to other forest types with high complexity and mixed species.Further, performing tree segmentation from ground-based (e.g., TLS) and airborne LiDAR systems.

CONCLUSIONS
This research proposes a novel approach to perform single tree segmentation from LiDAR data.The developed approach is inspired by link prediction problems from recommendation systems.First, a point cloud is seen as an undirected graph.Then, this graph is fed to a graph convolution autoencoder to learn the node features for graph reconstruction.Finally, a single tree is extracted as the set of connected graph nodes based on link prediction.One main advantage of the proposed approach is that it does not require any information about trees or forest structures.In addition, it does not depend upon stem detection or treetop detection which mitigates the limitations of CHM-based approaches.The experiments involved seven forest plots from BLS data.The results showed a potential performance of the proposed approach in tree segmentation however, the experiments were conducted on a planted forest with monotype tree species at the plot level.Accordingly, one important aspect of future work is testing the proposed methodology against LiDAR data of natural forests (e.g., temperate, boreal) with complex plots.Also, extending the experiments to other types of LiDAR data including ground-based (e.g., TLS, MLS) and airborne (e.g., ALS, and UAV-LiDAR).

Algorithm 1 :
Individual tree extraction from the reconstructed graph.Input: Reconstructed adjacency matrix Â Output: List of individual trees function group c onnec t ed n od es() vi si t ed ← ; E xt r act ed _t r ees ← []; function dfs(node, cur r ent _t r ee):

Figure 3 .
Figure 3. Individual tree segmentation of two poplar plots based on the proposed approach.Unique colors represent single trees.

Table 1 .
Forest parameters of the test sites.
Table 2 lists the sensor parameters during the data acquisition.

Table 2 .
Scanner parameters used for data collection.

Table 3 .
Individual tree segmentation from BLS point cloud.