The AVA Multi-View Dataset for Gait Recognition (AVAMVG)

Many databases currently exist for gait recognition, but most of them collect gait sequences from a single viewpoint. New challenges in gait recognition, such as achieving independence from the camera viewpoint, require multi-view datasets in order to obtain more robust recognition.

Most current multi-view datasets were recorded under controlled conditions and, in some cases, made use of a treadmill. As a consequence, they are not representative of human gait in the real world. Moreover, few multi-view datasets are specifically designed for gait recognition. Some were designed for action recognition and therefore do not contain gait sequences long enough to span several gait cycles, since gait is only one of the actions they cover.

For these reasons, we have created a new multi-view database containing gait sequences of 20 actors, each performing ten different trajectories. The database has been specifically designed to test gait recognition algorithms based on 3D data: the cameras have been calibrated, so methods based on 3D reconstruction can be evaluated on this dataset. The binary silhouettes of each video sequence are also provided.

Database description

Using the camera setup described below, twenty humans (4 females and 16 males) participated in ten recording sessions each. Consequently, the database contains 200 multi-view videos, or 1200 (6 x 200) single-view videos. In the following section we briefly describe the walking activity carried out by each actor of the database.

The walking activity

Ten gait sequences were designed before the recording sessions. Each actor performs three straight walking sequences (t1,…,t3) and six curved gait sequences (t4,…,t9), as if rounding a corner. The curved paths consist of an initial straight section, then a slight turn, and a final straight segment. In the last sequence, actors describe a figure-eight path (t10).

Workspace setup for dataset recording, where {c1, …, c6} represent the set of cameras of the multiview dataset and {t1, …, t9} represent the different trajectories followed by each actor of the dataset.


Studio environment and camera setup

The studio where the dataset was recorded is equipped with six convergent IEEE-1394 FireFly MV FFMV-03M2C cameras, spaced along a square of 5.8 m side at a height of 2.3 m above the studio floor. The cameras have a wide 45º baseline, providing 360º coverage of a capture volume of 5 m x 5 m x 2.2 m.

Natural ambient illumination is provided by four windows, through which sunlight enters the scene. The underlying aim is to set up a realistic scenario, with changing lighting conditions and cast shadows. Video gait sequences were recorded at different times of day, and the cameras were positioned above the capture volume and directed downward, without excluding the lighting from the field of view.


Production studio.


Instead of using a screen backdrop of a specific color, as in other datasets, the background of the scene is the white wall of the studio. However, to facilitate foreground segmentation, the actors wear clothes of a different color from the background.

Human gait is captured in 4:3 format at 640 x 480 resolution and 25 Hz. Synchronized videos from all six cameras were recorded uncompressed, directly to a dedicated PC capture box.

All cameras were calibrated to extract their intrinsic (focal length, center of projection, radial distortion) and extrinsic (position, orientation) parameters. To obtain the intrinsics of each camera, we used a classical black-and-white chessboard technique (using the OpenCV library), while for the extrinsics we used the ArUco library, whose detection of boards (several markers arranged in a grid) has two main advantages. First, since there is more than one marker, it is less likely that all of them are lost at the same time. Second, the more markers detected, the more points are available for computing the camera extrinsics. ArUco also uses OpenCV for calibration.

3D artifact with Aruco board of markers, used for getting the pose and orientation of each camera.


Camera calibration data are provided in ASCII text files, one file per camera, with space-delimited values. Each file defines the following camera parameters:


h w
fx fy
cx cy
k0 k1 k2 k3 k4 k5
r0 r1 r2
t0 t1 t2


where h and w are the height and width of the image, respectively; F = {fx, fy} are the focal lengths; C = {cx, cy} is the center of projection; K = {k0, k1, k2, k3, k4, k5} are the distortion coefficients; and R = {r0, r1, r2} and T = {t0, t1, t2} are the camera rotation (a Rodrigues rotation vector) and translation vector, respectively.
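The layout above can be read back with a few lines of code. The following sketch assumes the six whitespace-delimited lines appear exactly in the order listed; the helper names are our own, not part of any released dataset tools, and the Rodrigues conversion is written out explicitly (OpenCV's `cv2.Rodrigues` would do the same job).

```python
import numpy as np

def load_calibration(path):
    """Parse a per-camera ASCII calibration file into a dict.

    Assumes six non-empty lines holding, in order:
    h w / fx fy / cx cy / k0..k5 / r0 r1 r2 / t0 t1 t2.
    """
    with open(path) as f:
        rows = [list(map(float, line.split())) for line in f if line.strip()]
    (h, w), (fx, fy), (cx, cy), k, r, t = rows
    K = np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1]])
    return {"size": (int(h), int(w)), "K": K,
            "dist": np.array(k), "rvec": np.array(r), "tvec": np.array(t)}

def rodrigues_to_matrix(rvec):
    """Convert a Rodrigues rotation vector to a 3x3 rotation matrix."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = np.asarray(rvec) / theta          # unit rotation axis
    Kx = np.array([[0, -k[2], k[1]],      # cross-product (skew) matrix of k
                   [k[2], 0, -k[0]],
                   [-k[1], k[0], 0]])
    # Rodrigues formula: R = I + sin(theta) K + (1 - cos(theta)) K^2
    return np.eye(3) + np.sin(theta) * Kx + (1 - np.cos(theta)) * (Kx @ Kx)
```

A point X in world coordinates then maps to the camera frame as R @ X + T, and through K (plus the distortion model) to pixel coordinates.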

Multi-view video preprocessing

The captured sequences are released as a set of images (one per frame and camera) in PNG format, archived in a directory tree as shown in the figure below. The top-level folder corresponds to the actor. Inside it, there is a set of folders following the naming convention “tr##_cam##”, where each “#” represents a digit, “tr##” is the trajectory number, and “cam##” is the camera number. E.g., “victor/tr06_cam05/” contains the sixth trajectory, captured by camera number five, and depicts the gait of the subject named “victor”. Inside that directory are all the images of the sequence, named “#####.png”.

Dataset content structure.
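The naming convention can also be decoded programmatically. The following is a small hypothetical helper (not part of the released dataset) that splits a sequence path into actor, trajectory, and camera:

```python
import re

# Matches sequence paths such as "victor/tr06_cam05/" following the
# "tr##_cam##" convention described above.
SEQ_PATTERN = re.compile(r"(?P<actor>[^/]+)/tr(?P<tr>\d{2})_cam(?P<cam>\d{2})/?$")

def parse_sequence_dir(path):
    """Return (actor, trajectory, camera) for an AVAMVG sequence directory."""
    m = SEQ_PATTERN.match(path)
    if m is None:
        raise ValueError(f"not an AVAMVG sequence directory: {path!r}")
    return m["actor"], int(m["tr"]), int(m["cam"])
```

For example, `parse_sequence_dir("victor/tr06_cam05/")` yields `("victor", 6, 5)`.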

The raw video sequences were preprocessed to further increase the applicability of the database. To obtain the silhouettes of the actors, we used Horprasert’s algorithm, which detects moving objects against a static background in color images, even in the presence of shadows.

Horprasert’s algorithm is simple and able to deal with local and global perturbations such as illumination changes, cast shadows, and highlights. After background subtraction, we filter the resulting masks with morphological operations, namely opening and closing.
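For readers unfamiliar with the method, the following is a simplified numpy sketch of the brightness/chromaticity-distortion idea behind Horprasert's algorithm. It is illustrative only: the threshold value is hypothetical, and the published method additionally normalises the distortions per pixel and distinguishes shadow and highlight classes, which are omitted here.

```python
import numpy as np

def horprasert_mask(frame, bg_mean, bg_std, cd_thresh=6.0):
    """Simplified Horprasert-style foreground mask.

    frame, bg_mean, bg_std: float arrays of shape (H, W, 3) holding the
    current frame and a per-pixel background colour model (mean, std).
    The threshold is illustrative, not the published value.
    """
    s = np.maximum(bg_std, 1e-6)
    # Brightness distortion: the scale alpha minimising ||frame - alpha*bg||
    # per pixel, i.e. how much brighter/darker the pixel is than the model.
    alpha = ((frame * bg_mean / s**2).sum(-1)
             / np.maximum((bg_mean**2 / s**2).sum(-1), 1e-6))
    # Chromaticity distortion: residual colour error after brightness scaling.
    cd = np.sqrt((((frame - alpha[..., None] * bg_mean) / s) ** 2).sum(-1))
    # Shadows merely scale the background colour (small cd, alpha < 1) and are
    # therefore not flagged; only genuine colour changes become foreground.
    return cd > cd_thresh
```

This captures why the method tolerates cast shadows: a shadowed background pixel keeps its chromaticity and only its brightness changes, so it is not classified as foreground.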

The opening operation removes small objects and noise, while the closing operation fills small holes. We do not perform any other postprocessing.
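This cleanup step can be sketched in plain numpy with a 3x3 structuring element (equivalent routines exist in OpenCV as `cv2.morphologyEx` and in SciPy as `scipy.ndimage.binary_opening`/`binary_closing`). As a simplification, out-of-image pixels are treated as foreground during erosion.

```python
import numpy as np

def _dilate(mask):
    """3x3 binary dilation via shifted copies of the mask."""
    out = mask.copy()
    out[1:, :] |= mask[:-1, :]
    out[:-1, :] |= mask[1:, :]
    out[:, 1:] |= mask[:, :-1]
    out[:, :-1] |= mask[:, 1:]
    out[1:, 1:] |= mask[:-1, :-1]   # diagonal neighbours
    out[1:, :-1] |= mask[:-1, 1:]
    out[:-1, 1:] |= mask[1:, :-1]
    out[:-1, :-1] |= mask[1:, 1:]
    return out

def _erode(mask):
    """3x3 binary erosion by duality: a pixel survives only if its whole
    in-bounds 3x3 neighbourhood is set."""
    return ~_dilate(~mask)

def clean_silhouette(mask):
    """Opening (removes speckle noise) followed by closing (fills small holes)."""
    opened = _dilate(_erode(mask))
    return _erode(_dilate(opened))
```

Applied to a boolean silhouette mask, isolated noise pixels disappear during opening and small interior holes are filled during closing.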

A brief preview

Example of our multiview dataset. People walking in different directions, from multiple points of view.

License Agreement

This database is free for research use only.

This agreement must be confirmed by a senior representative of your organisation.

To access and use this database, you agree to the following conditions:

  1. Multiple view video sets and associated data files will be used for research purposes only.
  2. The source of the database should be acknowledged in all publications in which it is used, by referencing this web-site (bibtex).
  3. The database should not be used for commercial purposes.
  4. The database should not be redistributed.

Your contact details will be used for notification of further data releases.

Further enquiries regarding access and use of this database should be directed to Dr. Francisco José Madrid-Cuevas. (fjmadrid[AT]uco[DOT]es).

Getting the database

In order to access the full database, please read the license agreement above and, if you agree, send an email to Dr. Francisco José Madrid-Cuevas with the following information: your name and affiliation, and the name and email of your supervisor (if you are a student).

How to cite AVAMVG

The source of the database should be acknowledged in all publications in which it is used, by referencing this web-site. To do so, we provide the following BibTeX entry:

   @inproceedings{avamvg,
   author={L\'opez-Fern\'andez, David and Madrid-Cuevas, Francisco J. and Carmona-Poyato, \'Angel and Mar\'in-Jim\'enez, Manuel J. and Mu\~noz-Salinas, Rafael},
   title={{The AVA Multi-View Dataset for Gait Recognition}},
   booktitle={Activity Monitoring by Multiple Distributed Sensing},
   series={Lecture Notes in Computer Science},
   editor={Mazzeo, Pier Luigi and Spagnolo, Paolo and Moeslund, Thomas B.},
   publisher={Springer International Publishing},
   }