RAPID AND AUTOMATED BODY MEASUREMENT OF CATTLE BASED ON STATISTICAL SHAPE MODEL

The current methods of non-contact livestock body measurement operate directly on low-quality point cloud data, which makes them fragile and impractical. On the one hand, the success rate of keypoint detection for livestock body measurement is low: because of severe occlusion and noise in the point cloud data, body measurements cannot be performed on some data. On the other hand, key frames must be manually selected from the point cloud sequence during processing. Inspired by work on 3D reconstruction based on animal statistical shape models, we construct and learn a statistical shape model of real cattle. Building on this model, we propose an approach for 3D reconstruction and body measurement of real cattle from low-quality point cloud data. Nine indicators are calculated, and the overall estimation MAPE (Mean Absolute Percentage Error) is 10.27%. The whole body measurement pipeline proposed in this paper can be extended to other quadrupeds.


INTRODUCTION
Accurate monitoring of the livestock body is vital for farmers and breeders to understand the growth status, production, reproduction and breeding of livestock. Manual measurement cannot meet the growing demand for intelligent farming (Shuai et al., 2020; Bartol et al., 2021). The automated growth monitoring of livestock is of great significance for the sustainable development of animal husbandry.
The detection, tracking and analysis of animals have diverse applications in biology, neuroscience, ecology, agriculture and recreation. Despite these applications, the field of computer vision focuses more on modeling the human body, such as estimating human pose and analyzing human behavior. However, it is not directly feasible to extend or apply this work to animals. The main reason is that animals are far less cooperative than humans. Inappropriate interventions can have a dramatic impact on animal well-being, so far fewer 3D scan datasets exist for animals than for humans (e.g., Anguelov et al., 2005; Weiss et al., 2011; Bogo et al., 2014). Therefore, existing research on the shape and pose of animals lags behind that on the human body by a large margin.

Methods based on multi-view stereo vision
In recent years, scholars have carried out extensive research on the application of stereo vision to animals. Wu et al. (2004) used six high-resolution cameras to obtain images of the top, side, and rear views of each pig, and developed a stereo imaging system that reconstructed the 3D shape of live pigs. Pezzuolo et al. (2018) proposed a photogrammetry method based on Structure from Motion (SfM) and applied it to the 3D modeling and measurement of the pig body. These reconstruction methods based on multi-view stereo technology place requirements on conditions such as illumination and camera synchronization, and the poses of the animals are restricted.

Methods based on depth camera
Kongsro (2014) used the Kinect camera to collect depth images of pigs and estimated the weight of each pig from the images. Wang et al. (2018) proposed a portable automatic measurement system for pig body size, in which two Xtion depth cameras captured point cloud data from two viewpoints, and body measurement was realized by segmentation and pose normalization. Ruchay et al. (2020) designed a vision system consisting of three Kinect v2 cameras to acquire cattle data for automatic body measurement. These approaches are efficient to operate, but they place high demands on animal cooperation and multi-camera synchronization, and their reconstruction accuracy is limited.

Methods based on 3D template
Cashman and Fitzgibbon (2012) predefined a dolphin template and learned a low-dimensional model of its deformation from manually extracted keypoints and manual segmentation. The model was optimized to minimize the reprojection errors of keypoints and silhouettes. The method works for dolphins, but it is limited in fitting non-template objects. Vicente and Agapito (2013) obtained the template of the corresponding object from a reference image and deformed the template to fit the input image. The resolution of the reconstruction obtained by this method is low. Kanazawa et al. (2016) learned separate animal models of cats and horses and presented a volumetric deformation framework to deform 3D templates through user interaction. Template-based reconstruction techniques do not explicitly model joints, and the deformed template is still a rough shape, which results in poor reconstruction. Most of the three categories of methods above concentrate solely on the shape features of the results and fail to ensure accurate and reasonable topology.

Methods based on statistical shape models
Currently, regressing the parameters of animal parametric models from RGB images or videos with deep learning and related techniques, in order to reconstruct animal body surfaces in 3D, is a research hotspot. Inspired by the parametric model of the human body, Zuffi et al. (2017) proposed the Skinned Multi-Animal Linear model (SMAL) to capture variations in shape and pose among various quadruped toy figurines. Based on SMAL, researchers have successively proposed species-specific parametric models. Li et al. (2021) defined the horse model hSMAL and applied it to video-based lameness detection. Biggs et al. (2019) proposed a system to recover 3D models of various quadrupeds from videos. Zuffi et al. (2019) proposed an end-to-end model, SMALST, that integrated the SMAL model into a regression network to reconstruct 3D animal shapes with texture information from a single image of a field scene. Biggs et al. (2020) generated a new parametric model, SMBLD, that includes limb scaling. The 3D reconstruction of the animal body surface is generally developing in a data-driven direction. The SMAL model is used as the deformation template in this paper.
In this work, we take full advantage of statistical shape models to obtain topologically consistent 3D surface datasets. Our contribution is a method for parametric reconstruction of cattle from low-quality point cloud data. Our process is generalizable and can be extended to other quadrupeds.

Overview
Fig. 1 shows the pipeline of the proposed method. The original data are first preprocessed. Based on the prior model, the template mesh is fitted to the scan data using constraints such as keypoints. A pose normalization method is proposed, and all fitted meshes are unified into a standard pose. On the resulting topologically consistent 3D surface dataset, parametric models of animal shape and pose are constructed and learned. Finally, animal parametric reconstruction and body parameter measurement are carried out on low-quality observation data.

Experimental dataset and preprocessing
The dataset, provided by Ruchay et al. (2020), was obtained on a rectilinear passway through an automated data acquisition system. Each sample consists mainly of a standing animal and the background environment. Three laptops on the same network each control one of three depth cameras. Because the laptop clocks are synchronized, the frames with the minimum time difference among the three cameras are selected to generate the dataset, which gives the best match among the point clouds. The dataset also includes the transformation matrices among cameras for registering data from the three viewpoints into a unified coordinate system. The cattle move freely during acquisition, so their poses differ across the data. Because of obstacles such as sensor quality, animal movement and the on-site environment, missing data, outliers and noise are unavoidable.
To obtain reliable point cloud data, background subtraction, outlier removal, normal vector estimation and multi-view registration are performed. Fig. 3 shows the preprocessing result for a low-quality point cloud of a single animal.
Figure 3. The result of data preprocessing for a single animal.
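As an illustration of the multi-view registration step, the following sketch maps point clouds from several cameras into one coordinate system with 4 × 4 homogeneous transformation matrices. The function name `merge_views`, the array layout and the toy matrices are assumptions made for illustration; the actual file format and matrix convention of the dataset are not specified in this paper.

```python
import numpy as np

def merge_views(clouds, transforms):
    """Map each (N_i, 3) cloud into the reference frame and concatenate.

    clouds     : list of (N_i, 3) arrays, one per camera (assumed layout)
    transforms : list of (4, 4) homogeneous matrices, camera -> reference
    """
    merged = []
    for pts, T in zip(clouds, transforms):
        # Append a column of ones so that T also applies the translation.
        homo = np.hstack([pts, np.ones((pts.shape[0], 1))])
        merged.append((homo @ T.T)[:, :3])
    return np.vstack(merged)

# Toy example: the second camera frame is shifted by 1 unit along x.
left = np.array([[0.0, 0.0, 1.0]])
top = np.array([[0.0, 0.0, 1.0]])
T_left = np.eye(4)
T_top = np.eye(4)
T_top[0, 3] = 1.0
cloud = merge_views([left, top], [T_left, T_top])
```

Background subtraction, outlier removal and normal estimation would operate on the merged cloud or on each view before merging; they are omitted here.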

Topologically consistent 3D surface dataset
The template mesh is fitted to the point cloud data and processed by pose normalization, which maps meshes in different poses to the canonical pose. The topologically consistent surface dataset is then output. To avoid being trapped in a local optimum during fitting, the sum of squared errors between $S^k$ and $T^k$ is minimized as eq(1) (Horn et al., 1988):
$$\min_{a, R, b} \sum_{i=1}^{27} \left\| t_i^k - \left( a R s_i^k + b \right) \right\|^2 \quad (1)$$
where $a$ is the scale factor, $R$ is the rotation matrix and $b$ is the translation vector. The transformed data is $S'$.
All vertices $\{p_i^f \mid i = 1, ..., N_v;\ f = 1, ..., F\}$ of the topologically consistent meshes of $F$ different cattle in the same pose are taken as input, where $N_v$ is the number of vertices of each individual. The vertices $\{p_i^f\}$ are stacked row-wise to form $P \in \mathbb{R}^{F \times N_v}$. The shape space $P$ is normalized in orientation and mean-centered, and the average shape $\bar{T}$ and the covariance matrix $D$ of the normalized shape space are computed. $D$ is subjected to eigenvalue decomposition, and the eigenvectors are arranged in descending order of their eigenvalues. The eigenvectors corresponding to the first 23 eigenvalues are taken to form the shape basis $V = \{V_1, ..., V_{|\theta|}\}$. Linear blending is performed to approximate the shapes of different individuals, as eq(2):
$$f(\theta) = \bar{T} + \sum_{i=1}^{|\theta|} \theta_i V_i \quad (2)$$
where $f(\theta)$ is the shape mapping function. Fig. 6 shows the average shape $\bar{T}$ of the statistical shape model of the real cattle. Different shapes can be obtained by adjusting the value of the shape parameter $\theta$.
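The construction of the shape basis and the linear blending can be sketched as below. This is a minimal illustration and not the authors' implementation: the function names, the random toy data and the assumption that each mesh is flattened into one row are all hypothetical.

```python
import numpy as np

def build_shape_model(P, n_components):
    """PCA shape basis from row-stacked meshes.

    P : (F, D) array, one flattened mesh per row (flattening is assumed)
    """
    mean_shape = P.mean(axis=0)
    centered = P - mean_shape
    cov = centered.T @ centered / (P.shape[0] - 1)
    # eigh returns eigenvalues of a symmetric matrix in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Rows of the returned basis are the leading eigenvectors.
    return mean_shape, eigvecs[:, order].T, eigvals[order]

def blend_shape(mean_shape, basis, theta):
    """Linear blending: mean shape plus a weighted sum of basis vectors."""
    return mean_shape + theta @ basis

# Toy data: 10 "meshes" with 12 coordinates each (random, illustration only).
rng = np.random.default_rng(0)
P = rng.normal(size=(10, 12))
mean_shape, V, lam = build_shape_model(P, 3)
shape = blend_shape(mean_shape, V, np.zeros(3))
```

For meshes with thousands of vertices, an SVD of the centered data matrix avoids forming the full covariance matrix and is usually the more practical choice.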
In eq(4), $R_m^j$ and $t_m^j$ are the rotation and translation matrices of the m-th bone in the j-th pose, respectively. The statistical shape model of the real cattle is $T = F(\beta, \theta, \gamma)$.

Parametric reconstruction
The 3D reconstruction is formulated as the minimization of a loss function. The animal shape and pose priors (Biggs et al., 2020) are used for regularization. Let $\mu_\beta$ and $C_\beta$ denote the mean and covariance matrix of the pose prior. The constraint $E_{pose}$ is expressed by the (squared) Mahalanobis distance as eq(5):
$$E_{pose}(\beta) = (\beta - \mu_\beta)^\top C_\beta^{-1} (\beta - \mu_\beta) \quad (5)$$
The shape prior constraint $E_{shape}$ is defined analogously as eq(6):
$$E_{shape}(\theta) = (\theta - \mu_\theta)^\top C_\theta^{-1} (\theta - \mu_\theta) \quad (6)$$
where $\mu_\theta$ and $C_\theta$ are the mean and covariance of the shape prior, respectively. The local joint rotation constraint $E_{rotate}$ limits the rotation of joints about the X-axis and Z-axis, in keeping with the motion poses of cattle. The keypoint constraint $E_{keypoint}$ minimizes the sum of distances between keypoints to optimize the shape and pose of the template mesh $T$, as eq(7), where $N_k$ is the number of keypoints, $s$ is the scale factor and $v_i^k(\beta, \theta, \gamma)$ is the keypoint of $T$ under pose $\beta$, shape $\theta$ and translation $\gamma$; $s_i'^k$ and $v_i^k(\beta, \theta, \gamma)$ are corresponding points. The data constraint $E_{data}$ is expressed as the sum of the distances of all corresponding points of $T$ and $S'$ and measures how close $T$ is to $S'$, as eq(8), where $N_c$ is the number of corresponding points. The loss function $E_P(\beta, \theta, \gamma, s)$ can be expressed as eq(9):
$$E_P(\beta, \theta, \gamma, s) = w_{pose} E_{pose} + w_{shape} E_{shape} + w_{rotate} E_{rotate} + w_{keypoint} E_{keypoint} + w_{data} E_{data} \quad (9)$$
where $w_{pose}$, $w_{shape}$, $w_{rotate}$, $w_{keypoint}$ and $w_{data}$ are five weights. To minimize $E_P(\beta, \theta, \gamma, s)$, the gradient descent algorithm Adam (Kingma and Ba, 2014) is used to optimize $\beta$, $\theta$, $\gamma$ and $s$. The best-fitting mesh for low-quality observation data is shown in Fig. 7.
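To make the structure of the loss concrete, the sketch below evaluates a prior term and the weighted sum of the five energy terms. It assumes the squared Mahalanobis form for the priors; the function names, the toy values and the weights are placeholders and not the settings used in this paper. The keypoint, data and rotation terms are passed in as precomputed numbers because their exact form is not reproduced here.

```python
import numpy as np

def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance (x - mean)^T cov^{-1} (x - mean)."""
    d = x - mean
    # Solve cov * y = d instead of forming the explicit inverse.
    return float(d @ np.linalg.solve(cov, d))

def total_energy(terms, weights):
    """Weighted sum of named energy terms, in the spirit of eq(9)."""
    return sum(weights[name] * value for name, value in terms.items())

# Toy values; the weights are placeholders, not the paper's settings.
beta = np.array([0.1, -0.2, 0.0])
e_pose = mahalanobis_sq(beta, np.zeros(3), np.eye(3))
terms = {"pose": e_pose, "shape": 0.5, "rotate": 0.0,
         "keypoint": 2.0, "data": 1.0}
weights = {"pose": 1.0, "shape": 1.0, "rotate": 1.0,
           "keypoint": 10.0, "data": 5.0}
loss = total_energy(terms, weights)
```

In a full pipeline the terms would be differentiable functions of the pose, shape, translation and scale parameters, so that an optimizer such as Adam can update them by gradient descent.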

Body parameter estimation
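Nine body parameters are evaluated: chest width (CW), ilium width (IW), hip joint width (HJW), oblique body length (OBL), hip length (HL), withers height (WH), hip height (HH), heart girth (HG) and chest depth (CD). The mean absolute error (MAE) and mean absolute percentage error (MAPE) are selected as evaluation indicators, as eq(10):
$$\mathrm{MAE} = \frac{1}{N_E} \sum_{i=1}^{N_E} \left| \hat{y}_i - y_i \right|, \qquad \mathrm{MAPE} = \frac{100\%}{N_E} \sum_{i=1}^{N_E} \left| \frac{\hat{y}_i - y_i}{y_i} \right| \quad (10)$$
where $\hat{y}_i$ is the estimated value, $y_i$ is the real value, and $N_E$ is the number of experimental data.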
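The two indicators can be computed as in the short sketch below. The function names and the numbers are made up for illustration and are not measurements from the dataset.

```python
def mae(estimates, references):
    """Mean absolute error between estimated and reference values."""
    n = len(references)
    return sum(abs(e - r) for e, r in zip(estimates, references)) / n

def mape(estimates, references):
    """Mean absolute percentage error in percent; references must be nonzero."""
    n = len(references)
    total = sum(abs(e - r) / abs(r) for e, r in zip(estimates, references))
    return 100.0 * total / n

# Made-up withers heights in cm, for illustration only.
estimated = [126.0, 118.0, 133.0]
measured = [120.0, 125.0, 133.0]
wh_mae = mae(estimated, measured)
wh_mape = mape(estimated, measured)
```

Because MAPE divides by the reference value, it allows parameters of very different magnitude, such as chest width and heart girth, to be compared on one scale.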

RESULTS AND DISCUSSION
Table 1 and Table 2 show the results without and with pose normalization, respectively. The results after pose normalization show some improvement, with an overall MAPE of 10.27%, compared with 10.93% without pose normalization. Judging from the measurement results after pose normalization, the MAPE of CW, HH, HG and CD is below 10% in each case, indicating that the method measures these body parameters with high accuracy. The MAPE of IW, HJW, OBL, HL and WH is greater than 10%. There are two reasons why pose normalization yields only a small improvement. First, the training samples of the SMAL model are animal toys, and the shape of the cattle toy data differs from that of real cattle. Second, some body shapes are missing from the training data of the parametric model. Both lead to poor local fitting results when the template mesh is fitted to the cattle observation data, whether normalized or not. Therefore, in the future, a richer dataset is needed to train the parametric model, which will enhance its generalization ability and ultimately improve the measurement accuracy of body parameters.
Fig. 8 shows our results in comparison with those of Du et al. (2022). The overall MAPE of Du et al. (2022) is 12.82%, and their CW, HG, HJW, HL and IW are less accurate than ours. The main reason is that the cattle are scanned while in motion, causing the width keypoints to deviate from the same cross-section of the body. Especially when data are severely missing, it is difficult to obtain accurate results. In contrast, our method is more robust. However, the MAPE of CD, HH, OBL and WH is still high; the root cause is the lack of corresponding body shapes in the training data.
Existing methods for estimating body parameters directly on low-quality point clouds are less robust and lack practicality. Because of the high failure rate of keypoint detection, severely occluded areas and noisy points, body measurement cannot be performed on some data. Our proposed body parameter estimation method successfully tackles the challenge of measurement when data quality is low. Moreover, pose normalization addresses the difficulty of inconsistent body measurements when animals move freely. Experiments show that our approach has high accuracy and robustness. Each reconstruction takes about 2 minutes, which is much faster than our previous work (Luo et al., 2023).

CONCLUSION
Due to the insufficient cooperation of animals and the self-occlusion of quadrupeds, it is difficult to obtain high-precision animal 3D

Figure 1. The pipeline of our approach.

Figure 2. RGB-D images and point cloud data of cattle from three different views.
As shown in Fig. 2, the original dataset is captured synchronously by three Microsoft Kinect v2 cameras and consists of RGB-D images and point clouds from left, right, and top views of 103 cattle, as well as manually measured body references.
Parametric model
As shown in Fig. 4, the SMAL model T = M(β, θ, γ) (Zuffi et al., 2017) is selected for its generalization ability, where T is a 3D mesh with 3889 vertices and 7774 faces, and β, θ and γ are the pose, shape and translation parameters, respectively. The pose parameter β is a 33 × 3 dimensional vector representing the rotations of the 33 joints in the parametric model. The shape parameter θ, which determines the body shape characteristics of the model, is a 41-dimensional vector obtained by PCA on the 3D animal toy dataset. The translation parameter γ controls the displacement in three orthogonal directions relative to the initial position.

Figure 4. An illustration of the SMAL model.
Keypoint detection and coarse alignment
In model fitting, high-quality corresponding points are essential for registration. DeepLabCut (Mathis et al., 2018) is used for 2D keypoint detection. The 2D keypoints are mapped to 3D using the intrinsic and extrinsic parameters of the camera. The keypoints $S^k = \{s_i^k\}_{i=1}^{27}$ on the scan data and $T^k = \{t_i^k\}_{i=1}^{27}$ on the template mesh are then obtained. Because keypoints on the template T only need to be predefined once, the corresponding points $T^k$ are marked manually.
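The coarse alignment between the two keypoint sets amounts to finding a scale, rotation and translation that minimize the sum of squared errors between corresponding points. The sketch below solves this in closed form with an SVD (the Umeyama formulation), which is swapped in here for the cited Horn method; both target the same least-squares problem. The function name and the synthetic keypoints are assumptions for illustration.

```python
import numpy as np

def similarity_align(S, T):
    """Find a, R, b minimizing sum ||T_i - (a R S_i + b)||^2.

    S, T : (N, 3) arrays of corresponding keypoints.
    """
    mu_s, mu_t = S.mean(axis=0), T.mean(axis=0)
    Sc, Tc = S - mu_s, T - mu_t
    # Cross-covariance between centered target and source points.
    H = Tc.T @ Sc / S.shape[0]
    U, D, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation.
    sign = np.ones(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        sign[-1] = -1.0
    R = U @ np.diag(sign) @ Vt
    var_s = (Sc ** 2).sum() / S.shape[0]
    a = (D * sign).sum() / var_s
    b = mu_t - a * (R @ mu_s)
    return a, R, b

# Synthetic check: 27 random keypoints under a known similarity transform.
rng = np.random.default_rng(1)
S = rng.normal(size=(27, 3))
c, s_ = np.cos(np.pi / 6), np.sin(np.pi / 6)
R_true = np.array([[c, -s_, 0.0], [s_, c, 0.0], [0.0, 0.0, 1.0]])
b_true = np.array([1.0, 2.0, 3.0])
T = 2.0 * (S @ R_true.T) + b_true
a, R, b = similarity_align(S, T)
S_prime = a * (S @ R.T) + b
```

With noisy or partially missing keypoints the recovered transform is only approximate, which is why it serves as an initialization for the subsequent fitting rather than as the final registration.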
Pose normalization and dataset augmentation
To reduce the impact of the animal's free movement on the fitted meshes, β = 0 is set to realize pose normalization. Blender is used to build the skeleton, and data of 190 different individuals in the same pose and 99 different poses of the same individual are obtained, as shown in Fig. 5.
Figure 5. (a) Different individuals in the same pose. (b) Different poses of the same individual.

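Figure 6. The average shape of the statistical shape model of the real cattle.
Each vertex is deformed by a linear blend of the bone transformations, as eq(3):
$$p_i' = \sum_{m=1}^{|\beta|} w_{im} \left( R_m p_i + t_m \right) \quad (3)$$
where $w_{im}$ is the weight of the m-th bone on the i-th vertex, $p_i$ is the coordinate of the i-th vertex in the standard pose, and $R_m$ and $t_m$ are the rotation and translation matrices of the m-th bone, respectively. SSDR (Smooth Skinning Decomposition with Rigid Bones) (Le and Deng, 2012) is used to solve for $w_{im}$. The inputs are J target poses (i.e., datasets of different poses of the same individual), a mesh T in the resting state (with N vertices) and $|\beta|$ bone joints. The vertices of the target poses are $\{v_i^j \mid i = 1, ..., N;\ j = 1, ..., J\}$. The algorithm is as eq(4):
$$\min_{w, R, t} \sum_{j=1}^{J} \sum_{i=1}^{N} \left\| v_i^j - \sum_{m=1}^{|\beta|} w_{im} \left( R_m^j p_i + t_m^j \right) \right\|^2 \quad (4)$$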
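Assuming the weights $w_{im}$ and bone transforms $R_m$, $t_m$ combine in the standard linear blend skinning form, the deformation of rest-pose vertices can be sketched as follows. The function name and the two-bone toy example are hypothetical; estimating the weights with SSDR is a separate optimization that is not shown.

```python
import numpy as np

def linear_blend_skinning(p, w, R, t):
    """Deform rest-pose vertices with per-bone rigid transforms.

    p : (N, 3) rest-pose vertices
    w : (N, M) skinning weights, each row summing to 1
    R : (M, 3, 3) bone rotations
    t : (M, 3) bone translations
    """
    # per_bone[m, i] = R[m] @ p[i] + t[m]
    per_bone = np.einsum("mab,ib->mia", R, p) + t[:, None, :]
    # Blend the per-bone positions with the vertex weights.
    return np.einsum("im,mia->ia", w, per_bone)

# Toy example: bone 0 is fixed, bone 1 is translated by 2 along z.
p = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
w = np.array([[1.0, 0.0], [0.5, 0.5]])
R = np.stack([np.eye(3), np.eye(3)])
t = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 2.0]])
posed = linear_blend_skinning(p, w, R, t)
```

The first vertex follows bone 0 only and stays in place, while the second is influenced equally by both bones and moves half of the translation of bone 1.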

Figure 7. An illustration of the reconstruction effect.

Figure 8. Comparisons with the state-of-the-art method on the same dataset.

scan datasets. Currently, the lack of robustness and practicality in measuring animal bodies from low-quality point cloud data is a prevalent issue. To address these problems, this paper combines a statistical shape model with a pose normalization method to obtain a topologically consistent 3D surface dataset. A parametric model of real cattle is constructed and learned, and the geometric variation of the cattle body is decomposed into two parts: shape and pose. Based on the statistical shape model of cattle, we propose a method for 3D reconstruction and body measurement of real cattle from low-quality point clouds. The overall pipeline of our proposed body parameter estimation method for cattle can be extended to other quadrupeds.

Table 1. MAE and MAPE results for nine body parameters without pose normalization.

Table 2. MAE and MAPE results for nine body parameters with pose normalization.