SHPED: The Stereo Human Pose Estimation Dataset

Manuel I. López-Quintero, Manuel J. Marín-Jiménez, Rafael Muñoz-Salinas, Francisco J. Madrid-Cuevas, Rafael Medina-Carnicer


We provide a dataset of stereo image pairs suited for stereo estimation of upper-body human pose. SHPED consists of 630 stereo image pairs (i.e. 1260 images) organized into 42 video clips of 15 frames each. The clips were extracted from 26 stereo videos obtained from YouTube with the tag yt3d:enable = true.


In addition, SHPED contains 1470 upper-body stickman annotations corresponding to 49 persons, selected according to these conditions: upright position, all upper-body parts almost fully visible, and non-profile viewpoint of the body. Furthermore, for every clip we include a plane projective transformation for rectification, together with detections (bounding boxes) of each person along the sequence. The stereo image pairs cover a wide range of variation in appearance, clothing, human pose, illumination, image quality, baseline separation of the cameras, and background.
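To illustrate what applying a plane projective transformation involves, the sketch below maps 2D pixel coordinates through a 3x3 homography in plain Python. The function name apply_homography, the example matrix, and the example point are hypothetical. How SHPED stores its transformations on disk is not described here, so loading them is left out.

```python
def apply_homography(H, points):
    """Map (x, y) points through a 3x3 homography given as nested lists."""
    mapped = []
    for x, y in points:
        # Multiply H by the homogeneous point (x, y, 1).
        u = H[0][0] * x + H[0][1] * y + H[0][2]
        v = H[1][0] * x + H[1][1] * y + H[1][2]
        w = H[2][0] * x + H[2][1] * y + H[2][2]
        if w == 0:
            raise ValueError("point maps to infinity under H")
        # Divide by w to return to pixel coordinates.
        mapped.append((u / w, v / w))
    return mapped


# Hypothetical transformation (an assumption, not taken from SHPED):
# a pure vertical shift of 3 pixels.
H_example = [[1.0, 0.0, 0.0],
             [0.0, 1.0, 3.0],
             [0.0, 0.0, 1.0]]

shifted = apply_homography(H_example, [(100.0, 50.0)])
```

The same mapping would apply to annotation endpoints or bounding-box corners, provided the matrix is expressed in the same pixel coordinate convention as those points.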

Stickmen annotations

We manually annotated sticks and keypoints for the following upper-body parts: torso (stick in yellow), upper-left arm (green), upper-right arm (red), lower-left arm (blue), lower-right arm (cyan), and head (grey); and for the following joints: shoulders, elbows, and wrists. These annotations are ready for evaluation with standard measures such as PCP (Percentage of Correctly estimated body Parts), PCK (Percentage of Correct Keypoints), and APK (Average Precision of Keypoints).
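As a rough illustration of how such measures are computed, the sketch below implements one common formulation of PCP and PCK in plain Python. The function names, the 0.5 PCP threshold, the PCK tolerance alpha, and the reference length ref_size are assumptions chosen for the example. Published evaluations differ in these details (for instance, strict versus loose PCP, or torso-based versus bounding-box-based normalization), and this is not the evaluation code distributed with SHPED.

```python
import math


def dist(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def pcp(pred_sticks, gt_sticks, thresh=0.5):
    """Strict PCP: a stick counts as correct when both predicted endpoints
    lie within thresh * ground-truth stick length of the true endpoints."""
    correct = 0
    for (pa, pb), (ga, gb) in zip(pred_sticks, gt_sticks):
        limit = thresh * dist(ga, gb)
        if dist(pa, ga) <= limit and dist(pb, gb) <= limit:
            correct += 1
    return correct / len(gt_sticks)


def pck(pred_points, gt_points, ref_size, alpha=0.2):
    """PCK: fraction of keypoints within alpha * ref_size of the truth."""
    limit = alpha * ref_size
    hits = sum(dist(p, g) <= limit for p, g in zip(pred_points, gt_points))
    return hits / len(gt_points)


# Toy data (invented for illustration, not taken from SHPED).
gt_sticks = [((0, 0), (0, 10)), ((0, 0), (10, 0))]
pred_sticks = [((1, 0), (0, 12)), ((0, 6), (10, 0))]
pcp_score = pcp(pred_sticks, gt_sticks)

gt_points = [(0, 0), (10, 0), (20, 0), (30, 0)]
pred_points = [(3, 3), (10, 1), (20, 9), (30, 0)]
pck_score = pck(pred_points, gt_points, ref_size=25)
```

In the toy data, the second stick fails because one endpoint is 6 pixels off against a limit of 5, and the third keypoint fails because it is 9 pixels off against a limit of 5.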


Filename                    Description                                                            Size
                            Stereo pair images, annotations and plane projective transformations   197.8 MB
references_10_02_2015.txt   YouTube video IDs                                                      312 bytes

Version - Change log

  • 1.0.1: Added 49 annotation files to the rectified folder, ready for the new standard evaluation measures based on keypoint localization error.
  • 1.0.0: Initial release.


If you use SHPED: The Stereo Human Pose Estimation Dataset, please cite our paper:

Manuel I. López-Quintero, Manuel J. Marín-Jiménez, Rafael Muñoz-Salinas, Francisco J. Madrid-Cuevas, Rafael Medina-Carnicer,
Stereo Pictorial Structure for 2D Articulated Human Pose Estimation,
Machine Vision and Applications, vol. 27, no. 2, pp. 157–174, 2015.



This work was partially supported by the Research Projects TIN2012-32952 and BROCA, both financed by the Spanish Ministry of Science and Technology and the European Regional Development Fund (FEDER).


These images have been extracted from videos hosted on YouTube and belong to their copyright holders. If you use them, you must respect the corresponding terms of use.