3D human models are key elements of modern visual effects, VR/AR content creation, image and video editing and re-enactment, holoportation, and virtual dressing. Ideally, these models should be high-quality, space-time coherent, and dense in terms of shape, movement, deformation, and appearance, irrespective of the subject's physique or clothing style; however, creating such models at high fidelity remains very difficult. To perform full-body 3D reconstruction, currently available methods require fusing multi-view images or multiple temporal frames of the target. Using only a single image, existing human parsing studies have progressed from 2D pose detection to 3D pose detection, and finally to body shape capture using a statistical human body template such as SMPL. However, such a statistical template captures only the shape and pose of a minimally clothed body and cannot represent a 3D human model under a normal clothing layer. Although the most recent work, BodyNet, has pioneered research toward this goal, it generates only nearly undressed body reconstructions, occasionally with broken body parts.