Human skeletal pose estimation finds applications ranging from remote patient monitoring and pedestrian detection to defense, security, and surveillance. However, the high-resolution vision-based sensors traditionally used for this task degrade under poor illumination or object occlusion. Radars can overcome these challenges, albeit at the cost of lower resolution. mmWave radars, on account of their higher bandwidth, can represent a target as a sparse point cloud with a higher resolution than their traditional radar counterparts. A supervised learning approach is adopted for skeletal estimation from the point cloud, as its random frame-to-frame nature makes explicit point-to-point association non-trivial. However, the scarcity of available radar datasets makes it extremely difficult to develop machine-learning-aided methods for radar-based computer vision applications. In this paper, we present a study that uses simulated mmWave-radar-like point-cloud data to estimate the skeletal key-points of a human target using a natural language processing approach. The sparsity and randomness of the radar point cloud are simulated from Microsoft Kinect-acquired data using a random sampling approach. Two consecutive frames of the simulated radar point cloud are first voxelized and aggregated, and a seq2seq architecture then "summarizes" them into the desired skeletal key-points. The model is evaluated on simulated data obtained by randomly sampling from a combination of (i) the 3D ground-truth skeletal coordinates corrupted with Gaussian noise of varying variance, and (ii) random point-cloud noise added to the corrupted data. The comprehensive methodology, results, and discussion are presented in this paper. The promising results of this proof-of-concept simulation study serve as a basis for a future experimental study using mmWave radars, whose data will also be made open-access for public research and development of radar-based perception and computer vision.
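Since the abstract describes the simulation pipeline only in prose, the following is a minimal Python/NumPy sketch of the idea: corrupt ground-truth joints with Gaussian noise, add random clutter points, randomly sample a sparse cloud, and voxelize two consecutive frames into an aggregated grid. All function names and parameter values (simulate_radar_cloud, n_points=64, sigma=0.05, a 10x32x32 grid, 25 Kinect joints) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def simulate_radar_cloud(joints, n_points=64, sigma=0.05, n_noise=8, rng=None):
    """Simulate a sparse mmWave-like point cloud from Kinect skeletal joints.

    joints : (J, 3) array of ground-truth 3D joint coordinates.
    Per the paper's description: (i) corrupt the joints with zero-mean
    Gaussian noise (std dev sigma, i.e. variance sigma**2), (ii) add random
    point-cloud noise, then randomly sample n_points from the combined set.
    """
    if rng is None:
        rng = np.random.default_rng()
    corrupted = rng.normal(loc=joints, scale=sigma)      # (i) Gaussian corruption
    lo, hi = joints.min(axis=0), joints.max(axis=0)
    noise = rng.uniform(lo, hi, size=(n_noise, 3))       # (ii) random clutter points
    candidates = np.vstack([corrupted, noise])
    idx = rng.choice(len(candidates), size=n_points, replace=True)
    return candidates[idx]                               # sparse, randomly ordered cloud

def voxelize(cloud, bounds, grid=(10, 32, 32)):
    """Map a point cloud into a binary occupancy grid over the given bounds."""
    lo, hi = bounds
    rel = (cloud - lo) / (hi - lo)                       # normalize to [0, 1]
    ijk = np.clip((rel * np.array(grid)).astype(int), 0, np.array(grid) - 1)
    vox = np.zeros(grid, dtype=np.float32)
    vox[ijk[:, 0], ijk[:, 1], ijk[:, 2]] = 1.0
    return vox

# Aggregate two consecutive simulated frames (element-wise max) before
# feeding the voxel grid to the seq2seq "summarization" model.
joints_t0 = np.random.rand(25, 3)                        # placeholder Kinect skeleton
joints_t1 = joints_t0 + 0.01 * np.random.randn(25, 3)    # small inter-frame motion
bounds = (joints_t0.min(axis=0) - 0.2, joints_t0.max(axis=0) + 0.2)
agg = np.maximum(voxelize(simulate_radar_cloud(joints_t0), bounds),
                 voxelize(simulate_radar_cloud(joints_t1), bounds))
```

Aggregating two consecutive frames, as the abstract describes, densifies the otherwise very sparse per-frame cloud before the seq2seq model maps it to the skeletal key-points.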