4 Two-dimensional Face Recognition
4.1 Feature Localization
Before discussing the methods of comparing two facial images, we take a brief look at some of the preliminary processes of facial feature alignment. This process typically consists of two stages: face detection and eye localization. Depending on the application, if the position of the face within the image is known beforehand (for a cooperative subject in a door access system, for example) then the face detection stage can often be skipped, as the region of interest is already known. We therefore discuss eye localization here, with a brief discussion of face detection in the literature review.
The eye localization method is used to align the 2D face images of the various test sets used throughout this section. However, to ensure that all results presented are representative of face recognition accuracy, and not a product of the performance of the eye localization routine, all image alignments are manually checked and any errors corrected prior to testing and evaluation.
We detect the position of the eyes within an image using a simple template-based method. A training set of manually pre-aligned face images is taken, and each image is cropped to an area around both eyes. The average image is then calculated and used as a template.
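As an illustration, the following sketch computes such an average-eye template; it assumes the manually aligned crops are already available as equally sized grayscale NumPy arrays (the function name and data handling are our own, not taken from the thesis):

```python
import numpy as np

def build_eye_template(eye_crops):
    """Average a set of manually pre-aligned grayscale eye-region crops
    (all the same size) into a single template image - the "average
    eyes" of Figure 4-1."""
    stack = np.stack([crop.astype(np.float64) for crop in eye_crops])
    return stack.mean(axis=0)
```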
Figure 4-1 - The average eyes, used as a template for eye detection.
Both eyes are included in a single template, rather than searching for each eye individually in turn, because the characteristic symmetry of the eyes either side of the nose provides a useful feature that helps distinguish between the eyes and other false positives that may be picked up in the background. This method is, however, highly susceptible to scale (i.e. subject distance from the camera) and also introduces the assumption that the eyes in the image appear near horizontal. Some preliminary experimentation also reveals that it is advantageous to include the area of skin just beneath the eyes. The reason is that in some cases the eyebrows can closely match the template, particularly if there are shadows in the eye sockets, whereas the area of skin below the eyes helps to distinguish the eyes from the eyebrows (the area just below the eyebrows contains the eyes, whereas the area below the eyes contains only plain skin).
A window is passed over the test image and the absolute difference is taken between each windowed region and the average eye image shown above. The area of the image with the lowest difference is taken as the region of interest containing the eyes. Each eye position is then refined by applying the same procedure using smaller templates of the individual left and right eyes.
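A minimal sketch of this exhaustive search is given below, assuming a single-scale grayscale image and template as NumPy arrays; the brute-force loop structure and names are illustrative only:

```python
import numpy as np

def locate_eyes(image, template):
    """Slide a window over the image and return the top-left corner of
    the window with the lowest sum of absolute differences (SAD) to the
    average-eye template, together with that score."""
    ih, iw = image.shape
    th, tw = template.shape
    best_score, best_pos = np.inf, (0, 0)
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            window = image[r:r + th, c:c + tw]
            score = np.abs(window - template).sum()
            if score < best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score
```

The same routine, called again with the smaller single-eye templates inside the detected region, would provide the refinement step described above.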
This basic template-based method of eye localization, although providing fairly precise localizations, often fails to locate the eyes completely. However, we are able to improve performance by including a weighting scheme.
Eye localization is performed on the set of training images, which is then separated into two sets: those in which eye detection was successful, and those in which eye detection failed. Taking the set of successful localizations, we compute the average distance from the eye template (Figure 4-2, top). Note that the image is quite dark, indicating that the detected eyes correlate closely with the eye template, as we would expect. However, bright points do occur near the whites of the eyes, suggesting that this area is often inconsistent, varying greatly from the average eye template.
Figure 4-2 - Distance to the eye template for successful detections (top), indicating variance due to noise, and failed detections (bottom), showing credible variance due to mis-detected features.
In the lower image (Figure 4-2, bottom), we have taken the set of failed localizations (images of the forehead, nose, cheeks, background etc. falsely detected by the localization routine) and once again computed the average distance from the eye template. The bright pupils surrounded by darker areas indicate that a failed match is often due to the high correlation of the nose and cheekbone regions overwhelming the poorly correlated pupils. Wanting to emphasize the difference of the pupil regions for these failed matches, and to minimize the variance of the whites of the eyes for successful matches, we divide the lower image values by the upper image to produce a weights vector, as shown in Figure 4-3. When applied to the difference image before summing a total error, this weighting scheme provides a much improved detection rate.
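The sketch below illustrates one plausible construction of this weighting scheme, assuming the successful and failed detection sets are available as lists of crops; the small epsilon guard is our own addition to avoid division by zero and is not part of the original description:

```python
import numpy as np

def mean_template_distance(detections, template):
    """Average per-pixel absolute difference between a set of detected
    regions and the eye template (the images shown in Figure 4-2)."""
    return np.mean([np.abs(d.astype(np.float64) - template)
                    for d in detections], axis=0)

def build_weights(successful, failed, template, eps=1e-6):
    """Divide the mean distance image of the failed detections by that
    of the successful detections, emphasising pixels (such as the
    pupils) that separate true eyes from false positives; eps guards
    against division by zero (our addition)."""
    return mean_template_distance(failed, template) / (
        mean_template_distance(successful, template) + eps)

def weighted_sad(window, template, weights):
    """Weighted difference score used in place of the plain SAD."""
    return (weights * np.abs(window - template)).sum()
```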
Figure 4-3 - Eye template weights used to give higher priority to those pixels that best represent the eyes.
4.2 The Direct Correlation Approach
We begin our investigation into face recognition with perhaps the simplest approach, known as the direct correlation method (also referred to as template matching by Brunelli and Poggio), involving the direct comparison of pixel intensity values taken from facial images. We use the term 'direct correlation' to encompass all techniques in which face images are compared directly, without any form of image space analysis, weighting schemes or feature extraction, regardless of the distance metric used. We therefore do not imply that Pearson's correlation is applied as the similarity function (although such an approach would obviously come under our definition of direct correlation). We typically use the Euclidean distance as our metric in these investigations (inversely related to Pearson's correlation, it can be considered a scale- and translation-sensitive form of image correlation), as this is consistent with the contrast made between image space and subspace approaches in later sections.
Firstly, all facial images must be aligned such that the eye centers are located at two specified pixel coordinates, and each image cropped to remove any background information. These images are stored as grayscale bitmaps of 65 by 82 pixels and, prior to recognition, converted into a vector of 5330 elements (each element containing the corresponding pixel intensity value). Each such vector can be thought of as describing a point within a 5330-dimensional image space. This simple principle can easily be extended to much larger images: a 256 by 256 pixel image occupies a single point in 65,536-dimensional image space and, again, similar images occupy close points within that space. Likewise, similar faces are located close together within the image space, while dissimilar faces are spaced far apart. By calculating the Euclidean distance d between two facial image vectors (often referred to as the query image q and gallery image g), we get an indication of similarity. A threshold is then applied to make the final verification decision.
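The distance in question is the standard Euclidean metric, d(q, g) = sqrt(sum_i (q_i - g_i)^2). A minimal sketch of the direct correlation comparison under these assumptions follows (function names are illustrative, not the thesis's implementation):

```python
import numpy as np

def compare_faces(face_a, face_b):
    """Direct correlation comparison: flatten each aligned 65 x 82
    grayscale image into a 5330-element vector and return the Euclidean
    distance between the two vectors (lower = more similar)."""
    q = face_a.astype(np.float64).ravel()
    g = face_b.astype(np.float64).ravel()
    return float(np.sqrt(((q - g) ** 2).sum()))

def verify(query, gallery, threshold):
    """Accept the claimed identity if the distance falls below the
    (application-dependent) threshold."""
    return compare_faces(query, gallery) < threshold
```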
4.2.1 Verification Tests
The primary concern in any face recognition system is its ability to correctly verify a claimed identity or determine a person's most likely identity from a set of potential matches in a database. In order to assess a given system's ability to perform these tasks, a variety of evaluation methodologies have arisen. Some of these analysis methods simulate a specific mode of operation (i.e. secure site access or surveillance), while others provide a more mathematical description of data distribution in some classification space. In addition, the results generated from each analysis method may be presented in a variety of formats. Throughout the experimentations in this thesis, we primarily use the
verification test as our method of analysis and comparison, although we also use Fisher's Linear Discriminant to analyze individual subspace components in section 7, and the identification test for the final evaluations described in section 8. The verification test measures a system's ability to correctly accept or reject the proposed identity of an individual. At a functional level, this reduces to two images being presented for comparison, for which the system must return either an acceptance (the two images are of the same person) or a rejection (the two images are of different people). The test is designed to simulate the application area of secure site access. In this scenario, a subject will present some form of identification at a point of entry, perhaps a swipe card, proximity chip or PIN number. This is then used to retrieve a stored image from a database of known subjects (often referred to as the target or gallery image), which is compared with a live image captured at the point of entry (the query image). Access is then granted or denied depending on the acceptance/rejection decision.
The results of the test are calculated according to how many times the accept/reject decision is made correctly. In order to execute this test we must first define our test set of face images. Although the number of images in the test set does not affect the results produced (as the error rates are specified as percentages of image comparisons), it is important to ensure that the test set is sufficiently large that statistical anomalies become insignificant (for example, a couple of badly aligned images matching well). Also, the type of images (high variation in lighting, partial occlusions etc.) will significantly alter the results of the test. Therefore, in order to compare multiple face recognition systems, they must be applied to the same test set.
However, it should also be noted that if the results are to be representative of system performance in a real-world situation, then the test data should be captured under precisely the same circumstances as in the application environment. On the other hand, if the purpose of the experimentation is to evaluate and improve a method of face recognition that may be applied to a range of application environments, then the test data should present the range of difficulties that are to be overcome. This may mean including a greater percentage of 'difficult' images than would be expected in the perceived operating conditions, and hence higher error rates in the results produced. Below we provide the algorithm for executing the verification test. The algorithm is applied to a single test set of face images, using a single function call to the face recognition algorithm: CompareFaces(FaceA, FaceB). This call is used to compare two facial images, returning a distance score indicating how dissimilar the two face images are: the lower the score, the more similar the two face images. Ideally, images of the same face should produce low scores, while
images of different faces should produce high scores.
Every image is compared with every other image, no image is compared with itself, and no pair is compared more than once (we assume that the relationship is symmetrical). Once two images have been compared, producing a similarity score, the ground truth is used to determine whether the images are of the same person or of different people. In practical tests this information is often encapsulated as part of the image filename (by means of a unique person identifier). Scores are then stored in one of two lists: a list containing scores produced by comparing images of different people, and a list containing scores produced by comparing images of the same person. The final acceptance/rejection decision is made by application of a threshold. Any incorrect decision is recorded as either a false acceptance or a false rejection. The false rejection rate (FRR) is calculated as the percentage of same-person scores that were classified as rejections. The false acceptance rate (FAR) is calculated as the percentage of different-person scores that were classified as acceptances.
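A sketch of this procedure, assuming images and their person identifiers arrive as parallel lists and that the compare_faces argument behaves as the CompareFaces call described above (the helper names and percentage bookkeeping are our own):

```python
import itertools

def verification_test(images, labels, compare_faces, threshold):
    """Compare every unordered pair of images exactly once (no image
    against itself), split the scores by ground truth, and compute the
    FAR and FRR, as percentages, at the given threshold."""
    same_scores, diff_scores = [], []
    for (img_a, id_a), (img_b, id_b) in itertools.combinations(
            zip(images, labels), 2):
        score = compare_faces(img_a, img_b)
        (same_scores if id_a == id_b else diff_scores).append(score)

    # A same-person pair scoring at or above the threshold is rejected:
    # a false rejection. A different-person pair scoring below it is
    # accepted: a false acceptance.
    frr = 100.0 * sum(s >= threshold for s in same_scores) / len(same_scores)
    far = 100.0 * sum(s < threshold for s in diff_scores) / len(diff_scores)
    return same_scores, diff_scores, far, frr
```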
These two error rates express the inadequacies of the system when operating at a specific threshold value. Ideally, both these figures should be zero, but in reality reducing either the FAR or the FRR (by altering the threshold value) will inevitably result in increasing the other. Therefore, in order to describe the full operating range of a particular system, we vary the threshold value through the entire range of scores produced. The application of each threshold value produces an additional FAR/FRR pair, which, when plotted on a graph, produces the error rate curve shown below.
Figure 4-5 - Example error rate curve produced by the verification test.
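One plausible implementation of this threshold sweep, reusing the two score lists produced by the verification test sketch above (the choice of candidate thresholds, here the set of observed scores, is an assumption):

```python
def error_rate_curve(same_scores, diff_scores):
    """Sweep the threshold through the full range of observed scores,
    producing one (threshold, FAR, FRR) triple per candidate value."""
    curve = []
    for t in sorted(set(same_scores) | set(diff_scores)):
        frr = 100.0 * sum(s >= t for s in same_scores) / len(same_scores)
        far = 100.0 * sum(s < t for s in diff_scores) / len(diff_scores)
        curve.append((t, far, frr))
    return curve
```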
The equal error rate (EER) can be seen as the point at which the FAR is equal to the FRR. This EER value is often used as a single figure representing the general recognition performance of a biometric system, and allows for easy visual comparison of multiple methods.
However, it is important to note that the EER does not indicate the level of error that would be expected in a real-world application. It is unlikely that any real system would use a threshold value such that the percentage of false acceptances was equal to the percentage of false rejections. Secure site access systems would typically set the threshold such that false acceptances were significantly lower than false rejections: unwilling to tolerate intruders at the cost of inconvenient access denials.
Surveillance systems, on the other hand, would require low false rejection rates to successfully identify people in a less controlled environment. Therefore we should bear in mind that a system with a lower EER might not necessarily be the better performer towards the extremes of its operating capability.
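Under the same assumptions as the sweep above, the EER can be approximated from the swept curve by taking the threshold at which FAR and FRR are closest; a real implementation might interpolate between adjacent thresholds:

```python
def equal_error_rate(curve):
    """Approximate the EER as the mean of the FAR/FRR pair whose
    difference is smallest across the swept thresholds."""
    _, far, frr = min(curve, key=lambda p: abs(p[1] - p[2]))
    return (far + frr) / 2.0
```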
There is a strong connection between the above graph and the receiver operating characteristic (ROC) curves also used in such experiments. The two graphs are simply different visualizations of the same results: the ROC format uses the true acceptance rate (TAR), where TAR = 1.0 - FRR, in place of the FRR, effectively flipping the graph vertically. Another visualization of the verification test results is to display both the FRR and the FAR as functions of the threshold value. This presentation format provides a reference for determining the threshold value necessary to achieve a specific FRR and FAR. The EER can be seen as the point where the two curves intersect.
Figure 4-6 - Example error rate curve as a function of the score threshold.
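For completeness, a one-line sketch of the ROC transformation described above, operating on the (threshold, FAR, FRR) triples from the earlier sweep; since our rates are percentages, TAR = 100 - FRR here:

```python
def to_roc(curve):
    """Convert (threshold, FAR, FRR) triples into ROC points (FAR, TAR),
    with TAR = 100 - FRR since our rates are percentages."""
    return [(far, 100.0 - frr) for _, far, frr in curve]
```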
The fluctuation of these error curves due to noise and other errors is dependent on the number of face image comparisons made to generate the data. A small dataset that only allows for a small number of comparisons will result in a jagged curve, in which large steps correspond to the influence of a single image on a high proportion of the comparisons made. A typical dataset of 720 images (as used in section 4.2.2) provides 258,840 verification operations (every unordered pair of 720 images: 720 × 719 / 2), hence a drop of 1% EER represents an additional 2588 correct
decisions, whereas the quality of a single image could cause the EER to fluctuate by up to