Facial Feature Detection and Facial Filters using Python

Harmesh Rana
6 min read · Mar 15, 2021

An application for detecting facial features in video using Deep Learning, OpenCV, and Haar Cascades, by Harmesh Rana, Prateek Sharma, and Vivek Kumar Shukla.

Detecting facial key points is a very challenging problem. Facial features vary greatly from one individual to another, and even for a single individual, there is a large amount of variation due to the 3D pose, size, position, viewing angle, and illumination conditions. Computer vision research has come a long way in addressing these difficulties, but there remain many opportunities for improvement.

Dataset Used: https://www.kaggle.com/c/facial-keypoints-detection provided by Dr. Yoshua Bengio of the University of Montreal.

Facial Recognition

Facial recognition scanning systems also use computer vision technology to identify individuals for security purposes. The most common example of computer vision in facial recognition is securing smartphones. More advanced uses of facial recognition and biometrics include residential or business security systems that use unique physiological features of individuals to verify their identity. Deep learning algorithms can identify the unique patterns in a person's fingerprints and use them to control access to high-security areas such as nuclear power plants, research labs, and bank vaults.

Haar feature-based Facial Feature detection

Object Detection using Haar feature-based cascade classifiers is an effective object detection method proposed by Paul Viola and Michael Jones in their paper, “Rapid Object Detection using a Boosted Cascade of Simple Features” in 2001. It is a machine learning-based approach where a cascade function is trained from a lot of positive and negative images. It is then used to detect objects in other images.

Initially, the algorithm needs a lot of positive images (images of faces) and negative images (images without faces) to train the classifier. Then we need to extract features from it. For this, Haar features shown in the below image are used. They are just like our convolutional kernel. Each feature is a single value obtained by subtracting the sum of pixels under the white rectangle from the sum of pixels under the black rectangle.

For each feature calculation, we need to find the sum of the pixels under white and black rectangles. To solve this, they introduced the integral image. However large your image, it reduces the calculations for a given pixel to an operation involving just four pixels.
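The four-pixel trick can be sketched in a few lines of NumPy. Here `ii` is a zero-padded integral image, so the sum of any rectangle falls out of two subtractions and one addition, regardless of the rectangle's size:

```python
import numpy as np

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the w-by-h rectangle with top-left corner (x, y),
    using a zero-padded integral image ii of shape (H+1, W+1)."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

img = np.arange(16, dtype=np.int64).reshape(4, 4)
# Zero-padded cumulative sum: ii[r, c] = sum of img[:r, :c]
ii = np.zeros((5, 5), dtype=np.int64)
ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)

# The 2x2 block at (1, 1) contains 5, 6, 9, 10 -> sum 30
print(rect_sum(ii, 1, 1, 2, 2))  # 30
```

A Haar feature value is then just the difference of two or three such `rect_sum` calls, so every feature costs a constant number of lookups.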

For this, we apply each feature to all the training images. For each feature, it finds the best threshold which will classify the faces to positive and negative. Even 200 features provide detection with 95% accuracy. Their final setup had around 6000 features. So now you take an image. Take each 24x24 window. Apply 6000 features to it. Check if it is a face or not.

In an image, most of the image is a non-face region. So it is a better idea to have a simple method to check if a window is not a face region. If it is not, discard it in a single shot, and don’t process it again. For this, they introduced the concept of Cascade of Classifiers. Instead of applying all 6000 features on a window, the features are grouped into different stages of classifiers and applied one-by-one. The authors’ detector had 6000+ features with 38 stages with 1, 10, 25, 25, and 50 features in the first five stages.
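The early-rejection idea can be illustrated with a toy cascade (the stage functions and thresholds below are made up for illustration; the real detector uses boosted Haar features):

```python
def cascade_classify(window, stages):
    """stages: list of (score_fn, threshold) pairs, cheapest stage first.
    A window must pass every stage; it is rejected at the first failure."""
    for score_fn, threshold in stages:
        if score_fn(window) < threshold:
            return False  # early rejection: costlier later stages never run
    return True

# Illustrative stages: a cheap mean check first, a stricter check after.
stages = [
    (lambda w: sum(w) / len(w), 10),  # stage 1: one cheap "feature"
    (lambda w: max(w), 50),           # stage 2: only runs if stage 1 passed
]

print(cascade_classify([60, 70, 80], stages))  # True
print(cascade_classify([1, 2, 3], stages))     # False (rejected at stage 1)
```

Because most windows are non-face regions, they fail an early stage and skip the thousands of remaining features, which is what makes the detector fast.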

Each predicted keypoint is specified by an (x,y) real-valued pair in the space of pixel indices. There are 15 key points, which represent the different elements of the face. The input image is given in the last field of the data files, and consists of a list of pixels (ordered by row), as integers in (0,255). The images are 96x96 pixels.
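Loading such a sample is straightforward: the last CSV column is a space-separated pixel string that reshapes into a 96x96 grayscale image. A minimal sketch (the demo string below is synthetic; with the real dataset you would read the `Image` column of `training.csv`, e.g. with pandas):

```python
import numpy as np

def parse_image(pixel_string, size=96):
    """Convert the space-separated pixel string from the dataset's last
    column into a size x size grayscale image."""
    pixels = np.array(pixel_string.split(), dtype=np.uint8)
    return pixels.reshape(size, size)

# Demo with a synthetic all-gray "image" string
demo = " ".join(["128"] * (96 * 96))
img = parse_image(demo)
print(img.shape)  # (96, 96)
```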

Installing Dependencies

Our project requires the following dependencies to be installed.

OpenCV by Python

OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library.

pip install opencv-python

Let’s Build The Application

Step 1: Taking the Input Video

There are two ways to input a video:
1. Live Webcam Video.

rgb = cv2.VideoCapture(0)

2. A Prerecorded Video File.

rgb = cv2.VideoCapture("Input.mp4")

Step 2: Preprocessing of the Input Source

Now we need to preprocess the video and convert it to a form more suitable for facial detection, i.e. we extract frames from the video one by one, since the model takes an image as its input. We also convert each frame to grayscale, as the model works better on grayscale images.

_, fr = rgb.read()    
gray = cv2.cvtColor(fr, cv2.COLOR_BGR2GRAY)

Step 3: Face Detection

Before we detect facial features, we need to detect the part of the image/frame that contains a face because, as discussed earlier, the Haar cascade classifier applies hundreds of features to detect the position of facial features. To save time and processing power, we pass only the portion of the image that contains the face. detectMultiScale() gives us the x, y coordinates as well as the width and height (w, h) of the rectangular portion of the image that contains the face.

facec = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
faces = facec.detectMultiScale(gray, 1.3, 5)

Step 4: Feature Detection.

Now we pass the face to the model to detect the facial features and map all 15 detected features and their respective coordinates to suitable labels (e.g. ['left_eye_center_x', 'left_eye_center_y']).
pred_dict is the list of coordinates of the facial features predicted by the model.

pred, pred_dict = cnn.predict_points(roi[np.newaxis,:,:,np.newaxis]) 
pred, pred_dict = cnn.scale_prediction((x, fc.shape[1]+x), (y, fc.shape[0]+y))
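cnn.scale_prediction() presumably maps keypoints from the model's 96x96 ROI coordinates back into the full frame using the detected face box. A minimal NumPy sketch of that mapping (scale_keypoints is an illustrative stand-in, not the project's actual method):

```python
import numpy as np

def scale_keypoints(points, x, y, w, h, model_size=96):
    """Map (px, py) keypoints predicted in model_size x model_size ROI
    coordinates back into the full frame, given the face box (x, y, w, h)."""
    pts = np.asarray(points, dtype=float)
    pts[:, 0] = x + pts[:, 0] * w / model_size
    pts[:, 1] = y + pts[:, 1] * h / model_size
    return pts

# A keypoint at the centre of the ROI maps to the centre of the face box.
print(scale_keypoints([[48, 48]], x=100, y=50, w=192, h=192))  # [[196. 146.]]
```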

Step 5: Applying Filter.

Now we pass the frame and the feature coordinates to the apply_filter() method, which places the filter images at the appropriate positions.

fr = apply_filter(fr, pred_dict)
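The core of a method like apply_filter() is alpha-blending a small filter image onto the frame, centred on a predicted keypoint. A self-contained sketch of that idea (the overlay function and its arguments are illustrative, not the project's actual code, and a production version would also clip at the frame borders):

```python
import numpy as np

def overlay(frame, filt, alpha, cx, cy):
    """Blend `filt` (h x w x 3) onto `frame`, centred on keypoint (cx, cy),
    using an h x w `alpha` mask with values in [0, 1]."""
    h, w = filt.shape[:2]
    y0, x0 = cy - h // 2, cx - w // 2
    region = frame[y0:y0 + h, x0:x0 + w].astype(float)
    blended = alpha[..., None] * filt + (1 - alpha[..., None]) * region
    frame[y0:y0 + h, x0:x0 + w] = blended.astype(np.uint8)
    return frame

frame = np.zeros((100, 100, 3), dtype=np.uint8)     # a black frame
filt = np.full((10, 10, 3), 255, dtype=np.uint8)    # a white square "filter"
alpha = np.ones((10, 10))                           # fully opaque
out = overlay(frame, filt, alpha, cx=50, cy=50)
print(out[50, 50])  # [255 255 255]
```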

Step 6: Writing the Output File

To write to a video file we recommend using the cv2 library. For this application, we used the WebM format with the VP8 codec, which helps video files play smoothly on webpages.

fps = int(video_capture.get(cv2.CAP_PROP_FPS))
width = int(video_capture.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(video_capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
fourcc = cv2.VideoWriter_fourcc(*'vp80')
PATH = '/Users/prate/static/demo.webm'
out = cv2.VideoWriter(PATH, fourcc, fps, (width, height))

We have to write each frame to the output video immediately after applying the filter to it so that the output frames stay in order.

out.write(fr)

OUTPUT

OUTPUT FROM VIDEO FILE

Conclusion

This application focuses on predicting the facial features of faces shown in the input, either from a video file or live from a webcam, a process known as facial feature recognition. Using the coordinates of the facial features predicted by the model on the face region located by the Haar cascade, we can easily apply various filters to the face.

The entire project code is available in the following Github repository:

Facial-Feature-Recognition.

Thank You.

Open For Reviews.
