Modification and refinement of three‐dimensional reconstruction to estimate body volume from a simulated single‐camera image

Abstract Objective Body volumes (BV) are used for calculating body composition to perform obesity assessments. Conventional BV estimation techniques, such as underwater weighing, can be difficult to apply. Advanced machine learning techniques enable multiple obesity-related body measurements to be obtained from a single-camera image; however, the accuracy of BV calculated using these techniques is unknown. This study aims to adapt and evaluate a machine learning technique, synthetic training for real accurate pose and shape (STRAPS), to estimate BV. Methods The machine learning technique, STRAPS, was applied to generate three-dimensional (3D) models from simulated two-dimensional (2D) images; these 3D models were then scaled with body stature, and BVs were estimated using regression models corrected for body mass. A commercial 3D scan dataset with a wide range of participants (n = 4318) was used to compare reference and estimated BV data. Results The developed methods estimated BV with small relative standard errors of estimation (<7%), although performance varied when applied to different groups. The BVs estimated for people with body mass index (BMI) < 30 kg/m2 (relative standard errors of 1.9% for males and 1.8% for females) were more accurate than for people with BMI ≥ 30 kg/m2 (6.9% for males and 2.4% for females). Conclusions The developed method can be used for BV estimation in females and males with BMI < 30 kg/m2 and could be used for obesity assessments in home or clinic settings.


| INTRODUCTION
Contactless reconstruction of individualized three-dimensional (3D) models for understanding body shape and size is an emerging technique in the medical field because of its discreet and rapid nature. 1 Conventional anthropometric measures (e.g., stature, segment length, girth measurements) and body volume (BV) can be estimated virtually from the reconstructed 3D models and used, in turn, to estimate body fat quantity (e.g., body fat percentage) and distribution. [2][3][4][5] Conventional techniques for obtaining these measures, such as manual measurement and underwater weighing, can be difficult to apply because they require technical expertise and a complicated test environment. Anthropometric measures such as waist girth and waist-to-hip ratio can be used to determine abdominal adiposity, which is related to health risks such as cardiovascular disease, diabetes, cancer, and low back pain. [6][7][8] BV can be used to calculate body density for the estimation of body fat percentage, 9 which addresses some limitations of body mass index (BMI) and enables more accurate obesity and health risk assessment. [10][11][12][13] Further, advanced shape parameters such as principal components can also be obtained to provide additional information that improves the estimation of body composition 14,15 and somatotype. 16 Finally, body shape visualization with 3D human model reconstruction can improve people's motivation to control weight through diet and physical activity. [17][18][19]
Recently, computer vision and machine learning techniques have been developed to estimate 3D body shape and pose from a single or a small number of two-dimensional (2D) images captured with general-purpose digital cameras. 5,20 Such techniques facilitate the portable and cost-effective reconstruction of 3D human models for anthropometric measurement and body shape visualization.
3D body shape, estimated using 2D images, can provide accurate length, breadth, and girth measurements. Smith et al. 21 showed that the errors of stature, waist, and hip girth measurements obtained from 2D images are less than 2 cm; for context, this is better than that achieved by general practitioners. 22,23 Sengupta et al. 24 presented a machine learning technique that generates 3D human models from a single image, with low reconstruction errors (scale-corrected per-vertex Euclidean error < 2 cm). As with traditional 3D scanning systems, multiple anthropometric measurements can be acquired from the 3D human models generated from a single or a small number of 2D images.
By applying these techniques, individualized body models can be generated without expensive, large, 3D body scanning systems. Only a consumer-level digital camera (e.g., smartphone, etc.) is required, which provides several advantages (e.g., the ubiquitous, portable and accessible nature of smartphones and digital cameras) when compared to conventional 3D scanning systems (which typically cost > $10k) and simplified systems such as Fit3D and Styku, which require specialized facilities. 5 However, the accuracy of BVs estimated from the 3D human models generated from single or a small number of 2D images is still unknown. Consequently, it is uncertain whether machine learning techniques can be used to estimate body fat percentage for obesity assessment.
A few commercial mobile applications enable the estimation of BV from 2D images. Sullivan et al. 25 recently showed the high accuracy of a commercially available application for BV acquisition from a single image (standard error of estimation < 1.0 L). However, the validation tests were conducted with limited participant groups. 25 The maximum BMI of participants was 30.9 kg/m2 and thus no participants were in obesity class II or III (i.e., BMI ≥ 35 kg/m2). 26
Furthermore, no individualized 3D human models can be generated using this technique, so body visualizations and other obesity-related measures (e.g., waist and hip girths), or other rich body shape parameters, 15,27 cannot be achieved by this approach.
Tian et al. 28 developed an approach that can estimate 3D body shape from images and predict body composition. The results showed that it enables accurate estimation of body fat percentage compared with dual-energy x-ray absorptiometry scans (root mean square error < 3.5%). However, the estimation and prediction could be sensitive to pose variation and silhouette segmentation. 28 Further, the use of handles (to fix arm positions) and the manual annotation applied in the previous study 28 decreased the convenience of assessing obesity from a 2D image.
Among techniques for 3D body reconstruction from a single image, Sengupta et al. 24 presented an advanced machine learning technique, synthetic training for real accurate pose and shape (STRAPS), which can generate 3D human models from a single image across different poses.
The qualitative results of STRAPS demonstrate a better match between the 2D projection and the silhouettes than the results shown in Tian et al. 28 Furthermore, the low reconstruction error of the STRAPS technique might lead to improved BV estimation compared with other machine learning techniques that are validated against conventional anthropometry. To the best of the authors' knowledge, techniques that generate 3D models and volumes from single-camera images while allowing pose variation (e.g., not requiring specific handles), including STRAPS, have not been validated using reference BV data. Therefore, the aim of this study was to apply and adapt the STRAPS technique to generate 3D human models for BV estimation.
Furthermore, an evaluation of BV estimates was performed using a large dataset comprising a wide range of participant body shapes.

| METHODS

| Datasets and participants
The commercial 3D dataset obtained from the World Engineering Anthropometry Resource (WEAR) was used in this study. This dataset was collected in the Civilian American and European Surface Anthropometry Resource Project 29 and consists of 3D body scanning and manually measured anthropometric data from 4431 participants.
Data from participants without complete 3D scan data in a standing position (e.g., missing data, incomplete scans, or data captured with non-close-fitting clothes), or without manually measured anthropometric data (stature, body mass), were excluded.

| Reference body volume acquisition
The raw 3D scanning data in the WEAR dataset contain holes and noise, as shown in Figure 1A. Automatic post-processing techniques adapted from a previous development 30 were applied to obtain watertight 3D models from the large dataset for BV acquisition. The test-retest differences of whole BV obtained with the automatic post-processing techniques are less than 1 L. 30 The automatic technique enables accurate BV acquisition compared with manual post-processing; 30 the acquired BVs (error < 1 L) were thus used as reference values for validation. First, the screened Poisson reconstruction technique 31 was applied to fix incomplete scan regions such as the tops of the head, hands, and feet, as shown in Figure 1B. Then, the reconstructed meshes were clustered to remove random points disconnected from the main meshes. Finally, the floor plane was set up and the hole-filling algorithm 32 in the open-source Python module "pymeshfix" (https://pypi.org/project/pymeshfix/) was applied to reconstruct the mesh in the foot region and ensure the body mesh was watertight for reference BV acquisition, as shown in Figure 1C.
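Once a mesh is watertight, its enclosed volume can be computed directly. As an illustration (this is not the study's code; `mesh_volume` is a hypothetical helper), a standard divergence-theorem computation sums the signed volumes of tetrahedra formed by each outward-oriented triangle face and the origin:

```python
import numpy as np

def mesh_volume(vertices, faces):
    """Volume enclosed by a watertight triangle mesh (divergence theorem).

    Sums signed tetrahedron volumes over the faces; holes or inconsistent
    face winding corrupt the result, which is why the mesh must be made
    watertight before BV acquisition.
    """
    v0 = vertices[faces[:, 0]]
    v1 = vertices[faces[:, 1]]
    v2 = vertices[faces[:, 2]]
    signed = np.einsum("ij,ij->i", v0, np.cross(v1, v2)).sum() / 6.0
    return abs(signed)

# Toy check: unit right tetrahedron with outward-wound faces -> volume 1/6
verts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
tris = np.array([[0, 2, 1], [0, 1, 3], [0, 3, 2], [1, 2, 3]])
print(mesh_volume(verts, tris))  # 1/6 ≈ 0.1667
```

The same computation applied to a repaired body mesh yields the BV in the mesh's length units cubed (e.g., m³, convertible to litres).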

| Body image rendering
Color images are required by the machine learning techniques that estimate the joint positions and silhouettes used as input data for the STRAPS technique. As no color image data were collected in the WEAR dataset, 2D images were generated from the 3D scanning data.
The vertex colors of the watertight meshes were determined by the color of the closest point on the raw 3D scanning data, as shown in Figure 1D. The colored watertight meshes were then used to generate body images, as shown in Figure 2A.
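The closest-point color transfer described above amounts to a nearest-neighbour lookup from each watertight-mesh vertex into the raw scan's point cloud. A minimal sketch follows; `transfer_vertex_colors` and the use of SciPy's k-d tree are illustrative assumptions, not the study's implementation:

```python
import numpy as np
from scipy.spatial import cKDTree

def transfer_vertex_colors(raw_points, raw_colors, mesh_vertices):
    """Assign each mesh vertex the color of its closest raw-scan point."""
    tree = cKDTree(raw_points)           # spatial index over the raw cloud
    _, nearest = tree.query(mesh_vertices, k=1)
    return raw_colors[nearest]

# Toy example: two raw points colored red and blue; each mesh vertex
# picks up the color of whichever raw point it sits closest to.
raw_pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
raw_rgb = np.array([[255, 0, 0], [0, 0, 255]])
mesh_v = np.array([[0.1, 0.0, 0.0], [0.9, 0.0, 0.0]])
colors = transfer_vertex_colors(raw_pts, raw_rgb, mesh_v)
print(colors)  # red for the first vertex, blue for the second
```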

| Individualized 3D model generation
After generating the body image, the joint locations and silhouettes were predicted by the machine learning techniques Keypoint-RCNN 34 and PointRend, 35 respectively, as shown in Figure 2C.
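STRAPS-style models consume the predicted silhouette and joints rather than raw pixels. A minimal sketch of packaging such predictions, assuming a proxy representation of a binary silhouette stacked with per-joint Gaussian heatmaps (the exact input format is not specified in this text; `joint_heatmaps` is a hypothetical helper):

```python
import numpy as np

def joint_heatmaps(joints_xy, size, sigma=4.0):
    """One Gaussian heatmap per predicted 2D joint (assumed proxy-input
    format; each map peaks at the joint's (x, y) pixel location)."""
    h, w = size
    ys, xs = np.mgrid[0:h, 0:w]
    maps = np.empty((len(joints_xy), h, w))
    for i, (x, y) in enumerate(joints_xy):
        maps[i] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    return maps

silhouette = np.zeros((64, 64))   # a PointRend-style binary mask would go here
silhouette[10:54, 20:44] = 1.0
joints = [(32, 12), (28, 40)]     # Keypoint-RCNN-style (x, y) predictions
proxy = np.concatenate([silhouette[None], joint_heatmaps(joints, (64, 64))])
print(proxy.shape)  # (3, 64, 64): 1 silhouette channel + 2 joint channels
```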

| Evaluation tests
The estimated BVs (BVstraps, BVscale, and BVregression) of the test datasets were compared with the reference BVs. The formula presented by Siri et al. 9 was used to estimate body fat percentage from body density. To match the scanning protocol of the WEAR dataset, the BVs were corrected by subtracting the tidal, expiratory reserve, and residual volumes. 36 The standard error of the estimation (SEE) and Bland-Altman limits of agreement were used to determine the accuracy of the proposed methods.
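The evaluation metrics can be sketched as follows. The Siri equation converts body density (kg/L) to body fat percentage, and Bland-Altman limits of agreement are the bias ± 1.96 SD of the paired differences. Function names are illustrative, and the SEE definition shown (residuals with n − 2 degrees of freedom) is one common convention, not necessarily the one used in the study:

```python
import numpy as np

def siri_body_fat(body_mass_kg, body_volume_l):
    """Siri equation: body density (kg/L) -> body fat percentage.
    Assumes BV has already been corrected for lung volumes."""
    density = body_mass_kg / body_volume_l
    return (4.95 / density - 4.50) * 100.0

def see(estimated, reference):
    """Standard error of the estimate (n - 2 degrees of freedom)."""
    resid = np.asarray(estimated) - np.asarray(reference)
    return np.sqrt((resid ** 2).sum() / (len(resid) - 2))

def bland_altman(estimated, reference):
    """Bias and 95% limits of agreement for estimated vs. reference."""
    diff = np.asarray(estimated) - np.asarray(reference)
    bias = diff.mean()
    sd = diff.std(ddof=1)
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

# Toy check: a density of exactly 1.1 kg/L gives 0% body fat under Siri
print(siri_body_fat(77.0, 70.0))  # ≈ 0.0
```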

| Statement on ethics approval
The ethics approval of this study was given by the University Ethics Committee.

| RESULTS
Participants exhibited a wide BMI range from 15.2 to 57.1 kg/m 2 .
Around 16% of participants had obesity (BMI ≥ 30 kg/m2) and 55% of participants were of normal weight (BMI < 25 kg/m2). The evaluation test results are shown in Table 2. After applying the regression model, the SEEs and standard deviations of errors were less than 3 L (relative SEEs < 3%) for BV across all groups apart from the men with obesity (relative SEE = 6.9%).
The bias (absolute mean errors) and the standard deviations of errors of BV (−1.1 ± 7.6; limits of agreement = (−15.9, 13.7)) and body fat percentage (−3.5 ± 24.8; limits of agreement = (−52.1, 45.0)) for male participants with obesity were higher than those for the other groups. BV estimates for males and females with obesity had higher SEEs (>2 L) and standard deviations of errors (>2 L) than those for the groups without obesity (<2 L).
Using stature improved the accuracy of BV estimation and reduced the limits of agreement (standard deviations of errors decreased to less than 10 L). This improvement from involving stature confirms that estimation from single 2D images contains some scale ambiguity. 5 However, BV estimated with the stature correction for males and females with obesity still showed higher SEEs and wider limits of agreement than BVs for the groups without obesity. The body shape variation of the groups with obesity might be more complicated than that of the non-obesity groups, making it harder to predict from a 2D image with current models. Further model improvement might be required to acquire correct BV values for specific groups such as participants with obesity.
Using body mass in a regression model further reduced the SEEs of BV estimation, which shows that a few manual anthropometric measurements enhance the reconstruction of individual human models from 2D images. It is highly recommended that further developments that estimate 3D body shape from a 2D image be combined with a few manual anthropometric measurements that can be taken precisely with minimal training, such as stature and body mass, to improve model accuracy.
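The mass-corrected regression step can be illustrated with synthetic data. The linear form BV_ref ≈ a·BV_scaled + b·mass + c below is an assumed sketch of "a regression model corrected for body mass", not the study's fitted model, and all numbers are fabricated for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: scale-corrected model volumes (L), body mass (kg),
# and reference BVs generated from known coefficients plus noise.
bv_scaled = rng.uniform(50, 110, 200)
mass = bv_scaled * 1.05 + rng.normal(0, 2, 200)
bv_ref = 0.9 * bv_scaled + 0.08 * mass + 1.5 + rng.normal(0, 0.5, 200)

# Ordinary least squares fit of BV_ref on (BV_scaled, mass, intercept)
X = np.column_stack([bv_scaled, mass, np.ones_like(mass)])
coef, *_ = np.linalg.lstsq(X, bv_ref, rcond=None)
bv_regression = X @ coef

residual_sd = np.std(bv_ref - bv_regression, ddof=X.shape[1])
print(coef)         # recovered coefficients, close to (0.9, 0.08, 1.5)
print(residual_sd)  # residual spread, close to the injected 0.5 L noise
```

In practice the coefficients would be fitted on a training split and the residual spread evaluated on held-out participants, mirroring the study's train/test evaluation.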

| DISCUSSION
The BV obtained from the STRAPS technique with a regression model enabled accurate estimation for people without obesity (SEE ≤ 1.5 L; relative SEE < 2%). The SEE was slightly higher than that reported for a commercial mobile app (SEE < 1.0 L). 25 Comparing the results of the groups with and without obesity, accuracy decreases as BMI increases (Table 2). The proportion of participants with overweight or obesity and the BMI range in the evaluation tests of this study (45%; 15.2-57.1 kg/m2) were higher than those in the previous study (31%; 18.6-30.9 kg/m2), 25 in which most participants were in the normal-weight group. 25 Evaluation tests were conducted with synthesized data generated from datasets established more than 20 years ago.
Similar techniques are widely used in computer vision for developing machine learning models. 21,24 Although the BV obtained from 3D scanning data enables reference data acquisition, some assumptions and errors still exist in 2D image rendering, mesh post-processing, and residual lung volume estimation. Some random errors caused by camera optics or hardware were ignored. To the authors' knowledge, there are no public datasets that include both 2D images and reference BV data obtained directly using medical methods such as air displacement plethysmography and dual-energy x-ray absorptiometry. This is the first study to apply and adapt methods that generate 3D models from single images while allowing pose variation, and to compare the results with validated BV data across a large dataset. The evaluation tests in this study provide a protocol and a benchmark for future computer vision and machine learning research to improve BV estimation and body visualization from single 2D images. Further studies can use the protocols presented here to evaluate newly developed methods with synthesized data before collecting data directly.
This provides an alternative solution for research institutes without access to devices for medical measurements.

| CONCLUSION
This study adapted computer vision techniques to generate individualized 3D models for the estimation of BV; a wide range of body morphologies was used to assess the validity of the BV estimates. The proposed approach generated body visualizations, and accurate BV estimates could be made using a consumer-level digital camera for females and males without obesity (BMI < 30 kg/m2). The proposed approach does not require expensive 3D scanning equipment and, as such, could be used in home or clinic settings for obesity management and monitoring. Further research should be conducted to improve estimation for individuals with obesity, eliminate gender differences, and validate using directly collected data.

This study was supported by SYLVAL LTD and a Catalyst Grant from Sheffield Hallam University.