




Senior Lecturer, Dept. of ISE,

BMS College of Engineering, Bangalore, India.

RNS Institute of Technology, Bangalore, India.

Senior Consultant, TCS Ltd., Bangalore, India.


Human faces are non-rigid objects with a high degree of variability in size, shape, color, and texture. Face databases are extensively used to evaluate the algorithms employed in facial expression/gesture recognition systems. Any automated system for face and facial gesture recognition has immense potential in criminal identification, surveillance, retrieval of missing children, office security, credit card verification, video document retrieval, telecommunication, high-definition television, medicine, human-computer interfaces, multimedia facial queries, and low-bandwidth transmission of facial data. The growth in face database development has been tremendous in recent years; this paper presents a comprehensive survey of the databases currently available for use in facial expression recognition systems.

Keywords: Facial expression recognition systems, databases, human-computer interaction

I. Facial Expression Databases

Images used for facial expression recognition are static images or image sequences. An image sequence potentially contains more information than a still image, because the former also captures temporal information. The usage of these databases is restricted to research purposes only. The most commonly used databases include the Cohn-Kanade facial expression database, the Japanese Female Facial Expression (JAFFE) database, the MMI database, and the CMU-PIE database. More recently, 3D databases have come into wider use. Table 1 summarizes the facial expression databases that are currently available for evaluation use.


Table 1: Facial expression databases currently available for evaluation.

Database                                      | #Samples                    | Expressions/Poses/Illuminations | Color | Resolution | Data type                  | #Subjects | Year
AR database                                   | 3288                        | 4/1/4                           | Color | 768x576    | Images                     | 116       | 1998
Cohn-Kanade AU-coded, v1                      | 486                         | 6/1/1                           | Gray  | 640x490    | Image sequences            | 97        | 2000
CAS-PEAL database                             | 30,900                      | 6/21/9-15                       | Color | 360x480    | Images                     | 1040      | 2003
Korean Face Database (KFDB)                   | 52,000                      | 5/7/16                          | Color | 640x480    | Images                     | 1000      | 2003
MMI FE database                               | 800+ sequences, 200+ images | 6/2/1                           | Color | 720x576    | Images & image sequences   | 52        | 2005
University of Texas Video database            | n/a                         | 11/9/1                          | Color | 720x480    | Images, video streams      | 284       | 2005
BU-3D FE database                             | 2500                        | 7/2/4                           | Color | 1040x1329  | Images, 3D models          | 100       | 2006
FG-NET database                               | 399                         | 6/1/1                           | Color | 320x240    | Image sequences            | 18        | 2006
FE database of MPI for Biological Cybernetics | 5600                        | 4/7/1                           | Color | 256x256    | Images                     | 200       | 2006
BU-4D FE database                             | 606 3D sequences            | 6/1/1                           | Color | 1040x1329  | Image sequences, 3D models | 101       | 2008
Radboud Face Database                         | n/a                         | n/a                             | n/a   | n/a        | n/a                        | n/a       | n/a


I.I. FERET database

The Face Recognition Technology (FERET) program was sponsored by the Department of Defense (DoD) Counterdrug Technology Development Program Office [Phillips et al. (2000)]. The goal of the FERET program was to develop automatic face recognition capabilities that could be employed to assist security, intelligence, and law enforcement personnel in the performance of their duties. The program consisted of three major elements: (a) sponsoring research, (b) collecting the FERET database, and (c) performing the FERET evaluations. The database was collected in 15 sessions between August 1993 and July 1996. It contains 1564 sets of images, for a total of 14,126 images, covering 1199 individuals and 365 duplicate sets of images. A duplicate set is a second set of images of a person already in the database, usually taken on a different day.

Salient features:

• Age-related facial change was considered while collecting the images, with the interval between two sessions extending up to two years for some subjects.
• Duplicate sets of images are included.
• The largest number of subjects among the databases surveyed here.

I.II. JAFFE database


Fig 2: Examples of images from the Japanese Female Facial Expression database.

Salient features:

• Uses the smallest number of subjects among the facial expression databases surveyed here.
• Manual rating was used to identify the facial expressions.

I.III. AR database

The AR database was collected at the Computer Vision Centre in Barcelona, Spain, in 1998. It contains images of 116 individuals (63 men and 53 women) [Martinez et al. (1998)]. The imaging and recording conditions (camera parameters, illumination setting, and camera distance) were carefully controlled and constantly recalibrated to ensure that the settings were identical across subjects. The resulting RGB colour images are 768 x 576 pixels in size. The subjects were recorded twice at a two-week interval. During each session, 13 conditions with varying facial expressions, illumination, and occlusion were captured. Figure 3 shows an example of each condition. So far, more than 200 research groups have accessed the database.

Courtesy: [Gross (2005)]

Fig 3: AR database. The conditions are (1) neutral, (2) smile, (3) anger, (4) scream, (5) left light on, (6) right light on, (7) both lights on, (8) sun glasses, (9) sun glasses/ left light, (10) sun glasses/ right light, (11) scarf, (12) scarf/ left light, (13) scarf/ right light.

Salient features:

• First facial expression database to consider occlusions in face images.
• Inclusion of ‘scream’, a non-prototypic gesture, in the database.
• To enable testing and modelling using this database, 22 facial feature points are manually labelled on each face.

I.IV. Cohn-Kanade facial expression database


Subjects were recorded using two Panasonic WV3230 cameras, each connected to a Panasonic S-VHS AG-7500 video recorder with a Horita synchronized time-code generator. One camera was located directly in front of the subject, and the other was positioned 30 degrees to the subject's right [Kanade et al. (2000)]. Only image data from the frontal camera are available at this time. Subjects were instructed by an experimenter to perform a series of 23 facial displays that included single action units (e.g., AU 12, lip corners pulled obliquely) and combinations of action units (e.g., AU 1+2, inner and outer brows raised). Subjects began and ended each display with a neutral face, and the image sequences provided run from neutral to the target expression (see figure 4). Before each display, an experimenter described and modelled the desired display. Six of the displays were based on descriptions of prototypic basic emotions (i.e., joy, surprise, anger, fear, disgust, and sadness).

Image sequences from neutral to target display were digitized into 640 x 480 or 640 x 490 pixel arrays with 8-bit precision for greyscale values. The images are available in PNG and JPEG formats, and are labelled using their corresponding VITC (vertical interval timecode). The final frame of each image sequence was coded using FACS (the Facial Action Coding System), which describes the subject's expression in terms of action units (AUs). FACS coding was performed by a certified FACS coder.
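The AU notation above (single units such as AU 12, combinations such as AU 1+2) is simple enough to parse mechanically. The sketch below shows one way to do so; the "+"-separated notation follows the text above, while the exact label-string format of the database's own metadata files is an assumption here, not taken from the Cohn-Kanade documentation.

```python
def parse_au_code(code: str) -> list[int]:
    """Split a FACS-style AU combination string (e.g. "AU 1+2" or "12")
    into its individual action-unit numbers.

    The "+" separator follows the AU notation described in the text;
    the precise label-file format of the database is assumed.
    """
    units = []
    for part in code.split("+"):
        part = part.strip()
        # Drop an optional "AU" prefix before converting to an integer.
        if part.upper().startswith("AU"):
            part = part[2:].strip()
        units.append(int(part))
    return units

print(parse_au_code("AU 1+2"))  # [1, 2]
print(parse_au_code("AU 12"))   # [12]
```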

© Jeffrey Cohn

Fig 4: An image sequence of a subject expressing ‘Surprise’ from the Cohn-Kanade Facial Expression Database.

Salient features:

• Image sequences considered instead of mug shots.
• Evaluation performed based on action unit recognition.

I.V. CAS-PEAL database

PEAL stands for Pose, Expression, Accessory and Lighting. CAS-PEAL is a large-scale Chinese face database with variations in pose, expression, accessory, and lighting [Gao et al. (2004)]. The database currently contains 99,594 images of 1040 individuals (595 males and 445 females). Five different expressions, six accessories (three glasses and three caps), and 15 lighting conditions were considered while capturing the images. Nine equally spaced cameras were used to capture different horizontal poses simultaneously (see figure 6). The subjects were also asked to look up and down to capture another set of 18 images. The conditions considered during database creation are listed in table 2. The database currently available for research is a subset containing 30,900 images of 1040 subjects. These images belong to two main subsets: a frontal subset and a pose subset.

In the frontal subset, all images are captured from camera C4 with the subject looking directly into that camera. Among them, 377 subjects have images with 6 expressions (see figure 7), 438 subjects have images wearing 6 different accessories, 233 subjects have images under at least 9 lighting changes, 296 subjects have images against 2 or 4 different backgrounds, and 296 subjects have images at different distances from the cameras. Also, 66 subjects have images recorded in two sessions at a six-month interval (see figure 8).


Fig 6: Plan view of the CAS-PEAL camera system.

Table 2: Sources of variations considered in CAS-PEAL database


Viewpoints: 9 (equally spaced cameras)

Variation   | Pose directions | Expression | Lighting | Accessory | Background | Aging | Distance
#Variations | 3               | 6          | 15       | 6         | 4          | 2     | 2
#Combined   | 27              | 54         | 135      | 54        | 36         | 18    | 18
#Total      | 342

(#Combined = #Variations x 9 viewpoints; #Total is the sum of the #Combined row.)
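The #Combined row of Table 2 is each per-category variation count multiplied by the nine camera viewpoints, and #Total is their sum; this can be checked directly:

```python
# Variation counts per category, taken from Table 2.
variations = {
    "pose directions": 3,
    "expression": 6,
    "lighting": 15,
    "accessory": 6,
    "background": 4,
    "aging": 2,
    "distance": 2,
}
VIEWPOINTS = 9  # nine equally spaced cameras

combined = {name: count * VIEWPOINTS for name, count in variations.items()}
total = sum(combined.values())

print(combined["lighting"])  # 135, matching the #Combined row
print(total)                 # 342, matching the #Total row
```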


Fig 8: Example images captured with time difference. The images in the bottom row are captured after six months.

Salient features:

• Time/age considered during image collection.
• Inclusion of multiple accessories in the database.
• Consideration of surprise and open-mouth categories in the database.

I.VI. Korean face database

The Korean Face Database (KFDB) contains facial imagery of a large number of Korean subjects collected under carefully controlled conditions [Hwang et al. (2003)]. Images with varying pose, illumination, and facial expression were recorded. The subjects were imaged in the middle of an octagonal frame carrying seven cameras and eight lights (of two types: fluorescent and incandescent) against a blue-screen background. The cameras were placed up to 45° off frontal in both directions at 15° increments. Figure 9 shows example images for all seven poses. Pose images were collected in three styles: natural (no glasses, no hair band to hold back hair from the forehead), hair band, and glasses. The lights were located in a full circle around the subject at 45° intervals. Separate frontal-pose images were recorded with each light turned on individually, for both the fluorescent and incandescent lights. Figure 10 shows example images for all eight illumination conditions. In addition, five images under the frontal fluorescent lights were obtained with the subjects wearing glasses. The subjects were also asked to display five facial expressions (neutral, happy, surprise, anger, and blink), which were recorded under two differently coloured lights (see figure 11), resulting in 10 images per subject. In total, 52 images were obtained per subject. The database also contains extensive ground-truth information: the locations of 26 feature points (where visible) are available for each face image.
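The figure of 52 images per subject follows from the session layout described above; a quick tally:

```python
# Per-subject image counts in KFDB, following the description above.
pose_images = 7 * 3          # seven camera poses x three styles (natural, hair band, glasses)
illumination_images = 8 * 2  # eight light positions x two light types
glasses_images = 5           # frontal fluorescent images with glasses
expression_images = 5 * 2    # five expressions x two coloured lights

total_per_subject = (pose_images + illumination_images
                     + glasses_images + expression_images)
print(total_per_subject)  # 52 images per subject, as stated above
```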


Fig 10: Illumination variation in KFDB. Lights from eight different positions (L1 – L8) located in a full circle around the subject were used. For each position images with both fluorescent and incandescent lights were taken.

Fig 11: Example colour images of expression changes under two kinds of illumination.

Salient features:

• Usage of two types of illumination (fluorescent and incandescent lights).
• Blinking, a non-prototypic gesture, is included in the database.

I.VII. MMI FE database

The MMI facial expression database was developed by the Man-Machine Interaction group of Delft University of Technology, Netherlands. It was the first web-based facial expression database [Pantic et al. (2005)]. The basic criteria defined for this database include easy accessibility, extensibility, manageability, and user-friendliness, with online help files and various search criteria. The database contains both still images and video streams depicting the six basic expressions: happiness, anger, sadness, disgust, fear, and surprise. The activation of individual facial action muscles is also recorded. The database was built using JavaScript, Macromedia Flash, MySQL, PHP, and the Apache HTTP server: JavaScript for the creation of dynamic pages, Macromedia Flash for rich internet applications (animation features), MySQL as the database server, PHP for its compatibility with MySQL and its open-source platform, and the Apache HTTP server for its open-source availability, security, extensibility, and efficiency. The database provides users with an easily searchable repository: over 200 images and 800 video sequences can be accessed. The database currently has 308 active users.
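MMI's actual MySQL schema is not described here. Purely as an illustration of the kind of searchable still-image/sequence metadata such a web database exposes, the following sketch uses Python's built-in SQLite in place of MySQL; the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE recordings (
        id         INTEGER PRIMARY KEY,
        subject_id INTEGER NOT NULL,
        kind       TEXT NOT NULL,   -- 'image' or 'sequence'
        expression TEXT NOT NULL    -- one of the six basic expressions
    )
""")
conn.executemany(
    "INSERT INTO recordings (subject_id, kind, expression) VALUES (?, ?, ?)",
    [(1, "sequence", "happiness"), (1, "image", "anger"), (2, "sequence", "fear")],
)

# One possible search criterion: all video sequences showing a given expression.
rows = conn.execute(
    "SELECT subject_id FROM recordings WHERE kind = ? AND expression = ?",
    ("sequence", "happiness"),
).fetchall()
print(rows)  # [(1,)]
```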

Salient features:

• First web-based facial expression database.
• Includes both still images and image sequences.

I.VIII. University of Texas video database


The static mug shots provide nine discrete views, ranging from left profile to right profile in equal-degree steps. All participants were asked to wear a grey smock to cover their clothing from the camera. The video sequences captured fall into three varieties. The first is a moving version of the static mug shots: the subjects were asked to move their heads and pause briefly at the required angles, with each clip lasting about 10 seconds from first view to last. The second variety consists of dynamic facial speech videos capturing the rigid and non-rigid movements of the subject while speaking. The subjects were asked to animate their speech, which included one or more head motions, facial expressions, and eye-gaze changes along with the speech movements; they answered a series of mundane questions, and their responses were recorded as the dynamic video sequences. The audio responses were not recorded. The duration of each video sequence was 10 seconds. The third variety of video sequences comprises facial expressions. The expressions captured were both prototypic and non-prototypic, viz. happiness, sadness, fear, disgust, anger, puzzlement, laughter, surprise, boredom, and disbelief. The expressions have not been rated, and no ground truth is provided. There are instances where the subject expressed more than one expression.
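Figure 12 makes the "equal-degree steps" concrete: nine views spanning left profile (-90°) to right profile (+90°) means eight equal intervals of 22.5° each:

```python
# Nine mug-shot views spanning left profile (-90 deg) to right profile (+90 deg).
NUM_VIEWS = 9
step = 180 / (NUM_VIEWS - 1)  # eight intervals between nine views
angles = [-90 + i * step for i in range(NUM_VIEWS)]

print(step)    # 22.5
print(angles)  # [-90.0, -67.5, -45.0, -22.5, 0.0, 22.5, 45.0, 67.5, 90.0]
```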

The video sequences of people comprise two variations (figure 13): gait video and conversational video. In the gait video, the subjects walk parallel or perpendicular to the line of sight of the camera, approaching the camera but veering off to the left at the end. The conversational video shows a conversation between two people, one facing the camera and the other facing away from it. The subject facing the camera portrayed natural gestures, such as giving directions to various destinations in the building. The lighting was variable due to light falling in from outside the glass windows. The close-range videos provide test stimuli for face recognition and tracking algorithms that operate while the head is undergoing rigid and/or non-rigid transformations. The dynamic mug shots, speech, and expression videos are likewise useful for computer-graphics modelling of heads and facial animation.

The storage required for the entire database is about 160 GB. The database is available to researchers only. Images are provided in TIFF format and videos in DV stream format.

Fig 12: Row 1 shows a facial mug shot series with nine still images, varying in pose from left (-90°) to right (90°) profile in 22.5° steps. The second row contains five still images extracted from a facial speech video. The third and fourth rows contain images extracted from a disgust expression video.


Fig 13: The first row of the figure contains five still images extracted from a parallel gait video. The second row contains five still images extracted from a perpendicular gait video. The third row of the figure contains five still images extracted from a conversation video.

Salient features:

• A combination of mug shots, image streams, expression videos, conversational videos, and gait videos in one database.
• Can be used for a greater range of algorithm evaluations.
• Storage requirement is large.

I.IX. BU-3D FE databases

Binghamton University was instrumental in creating 3D facial expression databases for the evaluation of algorithms. The databases come in two versions, one with static data and the other with dynamic data. The static database includes still colour images, while the dynamic database contains video sequences of subjects with expressions. Both databases include the neutral expression along with the six prototypic expressions.

I.IX.I. BU-3DFE: 3D static facial expression database

Although 3D facial models have been extensively used for 3D face recognition and 3D face animation, the usefulness of such data for 3D facial expression recognition was largely unexplored [Yin et al. (2006)]. This 3D facial expression database (called the BU-3DFE database) includes 100 subjects and 2500 facial expression models. The BU-3DFE database is available to the research community (areas of interest are as diverse as affective computing, computer vision, human-computer interaction, security, biomedicine, law enforcement, and psychology).

The database presently contains 100 subjects (56% female, 44% male), ranging from 18 to 70 years of age, with a variety of ethnic/racial ancestries, including White, Black, East Asian, Middle-East Asian, Indian, and Hispanic/Latino. Participants in the face scans included undergraduates, graduates, and faculty from the institute's departments of Psychology, Arts, and Engineering (Computer Science, Electrical Engineering, System Science, and Mechanical Engineering). The majority of participants were undergraduates from the Psychology Department (collaborator: Dr. Peter Gerhardstein).


(a) Four levels of facial expression intensity, from low to high. Expression models show the cropped face region and the entire head.

Fig 14: (a) The expression levels considered, (b) the seven expressions from BU-3D FE database.

Salient features:

• Introduction of 3D into facial expression databases.
• Inclusion of 3D models along with the image data.
• Intensity levels for expressions considered.

I.IX.II. BU-4DFE (3D + time): 3D Dynamic Facial Expression Database

To analyze facial behaviour in a dynamic rather than a static 3D space, the BU-3DFE was extended to the BU-4DFE. This newly created high-resolution 3D dynamic facial expression database is available to the scientific research community [Yin et al. (2008)]. The 3D facial expressions are captured at video rate (25 frames per second). For each subject, there are six model sequences showing the six prototypic facial expressions (anger, disgust, happiness, fear, sadness, and surprise), respectively. Each expression sequence contains about 100 frames (sample sequences are shown in figure 16).


Fig 15: Individual model views from BU-4D FE database

Fig 16: Sample expression image and model sequences (male and female) from BU-4D FE database.
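The sequence counts and durations described above are internally consistent: 101 subjects with six prototypic expressions each gives the 606 sequences listed in Table 1, and roughly 100 frames at 25 frames per second is about four seconds of 3D capture per sequence:

```python
subjects = 101
expressions = 6          # anger, disgust, happiness, fear, sadness, surprise
sequences = subjects * expressions
print(sequences)         # 606 model sequences, matching Table 1

frames_per_sequence = 100
fps = 25                 # capture rate of 25 frames per second
print(frames_per_sequence / fps)  # 4.0 seconds of capture per sequence
```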

Salient features:

• Includes all features of the BU-3D FE database, with dynamic features added.
• The image sequences begin and end with a neutral expression.

I.X. FG-NET database


to play a role. This includes head movements in all directions. The covered emotions are happiness, disgust, anger, fear, sadness, surprise, and neutral (see figure 17 for a typical image sequence).

The images were acquired using a Sony XC-999P camera equipped with an 8 mm COSMICAR 1:1.4 television lens. A BTTV 878 frame-grabber card was used to grab the images at a size of 640 x 480 pixels, a colour depth of 24 bits, and a frame rate of 25 frames per second. For capacity reasons, the images were converted into 8-bit JPEG-compressed images with a size of 320 x 240. The database can be downloaded as a collection of JPEG-compressed MPEG movies. After extraction, the images are stored separately in subdirectories as follows: {anger, disgs, fears, happy, neutr, sadns, surpr}.
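The space saved by the conversion described above can be estimated directly: halving each image dimension and dropping from 24-bit colour to 8 bits reduces the raw frame size twelvefold, before JPEG compression gains anything further:

```python
# Raw (uncompressed) frame sizes before and after the FG-NET conversion.
raw_bits = 640 * 480 * 24      # original frames: 640x480, 24-bit colour
reduced_bits = 320 * 240 * 8   # converted frames: 320x240, 8-bit

print(raw_bits // reduced_bits)  # 12: a twelvefold reduction before JPEG compression
```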

Fig 17: Image sequence of a subject from neutral state to emotion (happiness) state.

Salient feature:

• The expressions on the faces are as natural as possible.

I.XI. FE database of MPI for Biological Cybernetics


Fig. 18 (a) Schematic overview of the Max Planck video lab, (b) Example stimuli from one actor showing the five different perspectives used in the current experiments.

Salient feature:

• Highly segmented/normalized images, achieved with the use of a black cap and scarf.

I.XII. Radboud Faces Database (RaFD)




Fig. 19 (a) Eight emotional expressions, from top left: sad, neutral, anger, contemptuous, disgust, surprise, fear, and happiness; (b) three gaze directions: left, straight, and right; (c) five camera angles: 180°, 135°, 90°, 45°, and 0°.

Fig. 20 Targeted action units (AU) for all emotional expressions.

Salient features:

• Contempt, a non-prototypic expression, is included.
• Different gaze directions are considered.


S no. | Name of the database                    | Contact for accessibility                    | University / country
1     | FERET database                          | http://face.nist.gov/colorferet/request.html | George Mason University, USA
2     | JAFFE database                          | Michael J Lyons                              | Psychology Department, Kyushu University, Japan
3     | AR database                             | Aleix M Martinez - aleix@ecn.purdue.edu      | Computer Vision Center, Purdue University, Barcelona, Spain
4     | Cohn-Kanade facial expression database  | Jeffrey Cohn - jeffcohn@cs.cmu.edu           | Robotics Institute, Carnegie Mellon University, Pittsburgh
5     | CAS-PEAL database                       | Shaoxin Li - sxli@jdl.ac.cn                  | Face Group, Chinese Academy of Sciences, China
6     | Korean Face database                    | http://www.kisa.or.kr/eng/main.jsp           | Center for Artificial Vision Research, Korea University, Korea
7     | MMI database                            | Maja Pantic - M.pantic@ewi.delft.nl          | Delft University of Technology, Delft, The Netherlands
8     | University of Texas Video database      | Alice O’Toole - otoole@utdallas.edu          | University of Texas, Dallas
9     | BU-3D FE database                       | Lijun Yin - lijun@cs.binghamton.edu          | Binghamton University, State University of New York
10    | FG-NET database                         | fgnet@mmk.ei.tum.de                          | Technical University of Munich, Munich
11    | MPI for Biological Cybernetics database | http://faces.kyb.tuebingen.mpg.de/index.php  | Max Planck Institute for Biological Cybernetics, Tübingen, Germany
12    | Radboud Face database                   | Ron Dotsch - www.rafd.nl                     | Radboud University, Nijmegen, The Netherlands



Acknowledgements

The authors are grateful to all the developers of the facial expression databases mentioned in this paper: Michael J Lyons, Aleix M Martinez, Jeffrey Cohn, Shaoxin Li, Maja Pantic, Alice O’Toole, Lijun Yin, Ron Dotsch, and everyone else who contributed valuable suggestions and feedback. The authors are also grateful to the Department of Electronics and Communications Engineering, BMS College of Engineering, Bangalore, for extending their support.


References

[1] Ekman P, Friesen W V & Hager J C (2002), Facial Action Coding System: Investigator's Guide, Salt Lake City, UT: Research Nexus.
[2] Ekman P (2007), The directed facial action task, In Handbook of Emotion Elicitation and Assessment, Oxford, UK: Oxford University Press.
[3] Gao W, Cao B, Shan S, Zhou D, Zhang X, and Zhao D (2004), CAS-PEAL large-scale Chinese face database and evaluation protocols, Technical Report JDL-TR-04-FR-001, Joint Research & Development Laboratory.

[4] Gross R (2005), Face Databases, Chapter in Handbook of Face Recognition, Springer-Verlag publications.

[5] Hwang B W, Byun H, Roh M C, and Lee S W (2003), Performance evaluation of face recognition algorithms on the Asian face database, KFDB. In Audio- and Video-Based Biometric Person Authentication (AVBPA).

[6] Kanade T, Cohn J, and Tian Y (2000), Comprehensive database for facial expression analysis, In Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition.

[7] Langner O, Dotsch R, Bijlstra G, Wigboldus D H J, Hawk S T, and Van Knippenberg A (2010), Presentation and validation of the Radboud Faces Database, Cognition and Emotion, Psychology Press.

[8] Lyons M, Akamatsu S, Kamachi M, and Gyoba J (1998), Coding facial expressions with Gabor wavelets, In 3rd International Conference on Automatic Face and Gesture Recognition.

[9] Martinez A M and Benavente R (1998), The AR face database, Computer Vision Center (CVC) Technical Report, Barcelona.

[10] O’Toole A J, Harms J, Snow S L, Hurst D R, Pappas M R, Ayyad J H, Abdi H (2005), A video database of moving faces and people, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11] Pantic M, Valstar M, Rademaker R and Maat L (2005), Web-based database for facial expression analysis, In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME05).

[12] Phillips P J, Moon H, Rizvi S, and Rauss P J (2000), The FERET evaluation methodology for face-recognition algorithms, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(10).

[13] Pilz K S, Thornton I M and Bülthoff H H (2006), A search advantage for faces learned in motion, Experimental Brain Research 171(4).
[14] Wallhoff F (2006), Facial Expressions and Emotion Database, http://www.mmk.ei.tum.de/~waf/fgnet/feedtum.html, Technische Universität München.


[15] Yin L, Wei X, Sun Y, Wang J, Rosato M J (2006), A 3D Facial Expression Database For Facial Behavior Research, 7th International Conference on Automatic Face and Gesture Recognition (FGR06).