
A 1-hour sprint to Computer Vision with OpenCV in Python

This is a guest post written by Prof. Dr. Vladimir Matic, Strahinja Zivkovic and Strahinja Stefanovic from dataHacker.rs blog.

What is OpenCV?

OpenCV is the most popular library used for image processing in Python. It is used to detect and recognize faces, identify objects, classify human actions in videos, track camera movement, and much more. The goal of this post is to sprint you through the fundamentals of digital image processing using OpenCV in Python. So, let’s start our journey.

What is a pixel and image?

The definition of an image is very simple: it is a two-dimensional view of a 3D world. Furthermore, a digital image is a numeric representation of a 2D image as a finite set of digital values. We call these values pixels, and collectively they represent an image. Basically, a pixel is the smallest unit of a digital image (if we zoom in on a picture, we can see them as miniature rectangles packed close to each other) that can be displayed on a computer screen. For a more detailed explanation check out our post on how to access and edit pixel values in OpenCV.
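As a minimal illustration (a tiny synthetic example, not from the original post), pixels are just array entries:

# A 4x4 grayscale "image": each pixel is one integer between 0 and 255
import numpy as np
img = np.zeros((4, 4), dtype=np.uint8)
img[0, 0] = 255   # set the top-left pixel to white
print(img[0, 0])  # read it back -> 255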

A digital image is represented in your computer as a matrix of pixels. Each pixel is stored as an integer number. If we are dealing with a grayscale image, we use values from 0 (black pixels) up to 255 (white pixels). Any number in between is a shade of gray. On the other hand, color images are represented with three matrices. Each of those matrices represents one primary color, which is also called a channel. The most common color model is Red, Green, Blue (RGB). These three colors are mixed together to generate a broad range of colors. Note that OpenCV loads color images in reverse order, so that the blue channel is the first one, the green channel is the second, and the red channel is the third (BGR). We can see the order of the channels in the following diagram:

[Diagram: the B, G and R channel order in OpenCV]

Because of this, we might run into problems, since other Python packages (e.g. matplotlib) use the RGB color format. This is why it is very important to know how to convert an image from one format into the other. Here is one way to do it.

# Necessary imports
import cv2
import numpy as np
import matplotlib.pyplot as plt
# For Google Colab we use the cv2_imshow() function
from google.colab.patches import cv2_imshow

# We load the image using the cv2.imread() function
# The flag 1 (cv2.IMREAD_COLOR) loads the image in BGR channel order
img = cv2.imread("Benedict.jpg", 1)
cv2_imshow(img)

Output:


However, when we plot our image with matplotlib, the colors in the displayed image will be reversed, since matplotlib assumes the RGB color order.

plt.imshow(img)

Output:


Now, we have two images. The first one is our original image. The second one (img1) is created by splitting the original image into its 3 channels and then merging them back together in RGB order. Next, we are going to plot both images, first with OpenCV and then with matplotlib.

# We split our image into its three channels
(b, g, r) = cv2.split(img)
# Next, we merge the channels in reversed order to build a new image
img1 = cv2.merge([r, g, b])
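Plotting can then be as simple as the following sketch, which produces the two outputs below:

# OpenCV-style display: img1 is interpreted as BGR, so the colors come out reversed
cv2_imshow(img1)
# matplotlib display: img1 is interpreted as RGB, so the colors come out correctly
plt.imshow(img1)
plt.show()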

OpenCV:

Output:


Matplotlib:

Output:


So, for img1, matplotlib now works properly, but with OpenCV we get reversed colors. Luckily, it is easy to visually inspect whether the colors are displayed correctly.

Video processing

First, let’s see what a video is. It is actually a sequence of images (frames) which, played one after another, gives the appearance of motion. For more information check out the following link.

In video processing we can also change and modify colors. In the following example we will split every color frame into three color channels. They are stored as matrices with the same height and width as our original video. Then, by changing and modifying their values, we can create a fun visual effect.

# Creating the VideoCapture object
cap = cv2.VideoCapture("Video.mp4")
# Reading the first frame
ret, frame = cap.read()

# Define the codec and create the VideoWriter object
# (the frame size must match the frames we write)
fourcc = cv2.VideoWriter_fourcc('M', 'P', '4', 'V')
out = cv2.VideoWriter('output2.mp4', fourcc, 10, (640, 360))
# Splitting the frame into its channels
(b, g, r) = cv2.split(frame)
for i in range(100):
  # In every iteration, increase the blue channel pixel values by 1
  b = b + 1
  frame = cv2.merge([b, g, r])
  out.write(frame)
out.release()

Output:

How to write text on the image?

Writing text on images is absolutely essential for image processing applications and projects. So let’s learn how to do this.

First, we need to define several parameters, including the font type. Once we do that, we are going to use the function cv2.putText(). After adding arguments like the starting point (the bottom-left corner of the text string is placed at this point), the font scale, the color, the thickness and the line type of our text, we are ready to go. There are three available line types in OpenCV (cv2.LINE_4, cv2.LINE_8, cv2.LINE_AA).

Now, let’s have a little fun and try to recreate our Datahacker.rs logo.

# Creating a white 400x400 image
img = np.zeros((400, 400, 3), dtype="uint8")
img[:] = (255, 255, 255)

# Creating the "Datahacker" logo
cv2.rectangle(img, (76, 76), (324, 300), (0, 0, 0), 2)
font = cv2.FONT_ITALIC
font2 = cv2.FONT_HERSHEY_PLAIN
cv2.putText(img, "H A C K E R", (100, 350), font, 1, (0, 0, 0), 2, cv2.LINE_AA)
cv2.putText(img, "D", (120, 162), font2, 5, (0, 0, 0), 5, cv2.LINE_AA)
cv2.putText(img, "T", (120, 264), font2, 5, (0, 0, 0), 5, cv2.LINE_AA)
cv2.putText(img, "A", (240, 164), font2, 5, (0, 0, 0), 5, cv2.LINE_AA)
cv2.putText(img, "A", (240, 264), font2, 5, (0, 0, 0), 5, cv2.LINE_AA)
cv2_imshow(img)

If we want to create a GIF out of this logo, we will use the following code:

# Creating the image
img = np.zeros((400, 400, 3), dtype="uint8")
img[:] = (255, 255, 255)
plt.imshow(img)
cv2.rectangle(img, (76, 76), (324, 300), (0, 0, 0), 2)
font = cv2.FONT_ITALIC
font2 = cv2.FONT_HERSHEY_PLAIN

cv2.putText(img, "D", (120, 162), font2, 5, (0, 0, 0), 5, cv2.LINE_AA)
plt.axis("off")
# Plotting the image that we have created
plt.imshow(img)
# Saving the image that we have created
# Dots per inch (dpi) is the resolution we choose
plt.savefig("d_001.jpg", dpi=800)

cv2.putText(img, "A", (240, 164), font2, 5, (0, 0, 0), 5, cv2.LINE_AA)
plt.axis("off")
plt.imshow(img)
plt.savefig("d_002.jpg", dpi=800)

cv2.putText(img, "T", (120, 264), font2, 5, (0, 0, 0), 5, cv2.LINE_AA)
plt.axis("off")
plt.imshow(img)
plt.savefig("d_003.jpg", dpi=800)

cv2.putText(img, "A", (240, 264), font2, 5, (0, 0, 0), 5, cv2.LINE_AA)
plt.axis("off")
plt.imshow(img)
plt.savefig("d_004.jpg", dpi=800)

cv2.putText(img, "H A C ", (100, 350), font, 1, (0, 0, 0), 2, cv2.LINE_AA)
plt.axis("off")
plt.imshow(img)
plt.savefig("d_005.jpg", dpi=800)

cv2.putText(img, "H A C K E R", (100, 350), font, 1, (0, 0, 0), 2, cv2.LINE_AA)
plt.axis("off")
plt.imshow(img)

You can find the code for creating the GIF animation on the Datahacker.rs blog, in the post How to draw lines, rectangles, circles and write text on images with OpenCV in Python?


How to flip an image with OpenCV?

So, we have learned how to apply some basic operations in OpenCV, and now it is time to dive deeper into our first image processing technique. These techniques are fundamental tools for almost all areas of computer vision. In OpenCV we can use transformations like image resizing, image translation, image rotation and image flipping.

Let’s explore one of them: a transformation called flipping. This means that we are going to flip our image around the x- or y-axis. It is very simple. We just need to call the cv2.flip() function and provide, besides the image, only one argument. This argument determines around which axis we flip our image. A value of 1 indicates that we are going to flip the image around the y-axis (horizontal flipping). On the other hand, a value of 0 indicates that we are going to flip it around the x-axis (vertical flipping). If we want to flip the image around both axes, we use a negative value (e.g. -1).

The matrix that performs this operation in linear algebra is called a reflection matrix. For flipping operations in Python this matrix is not required, but it is good to know what it looks like.
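For reference, these are the standard reflection matrices: flipping around the y-axis maps a point $(x, y)$ to $(-x, y)$, and flipping around the x-axis maps it to $(x, -y)$:

$$
\begin{bmatrix} x' \\ y' \end{bmatrix} =
\begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
\quad \text{(y-axis flip)}, \qquad
\begin{bmatrix} x' \\ y' \end{bmatrix} =
\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
\quad \text{(x-axis flip)}
$$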


# Flipping the image around the y-axis
flipped = cv2.flip(img, 1)
cv2_imshow(flipped)
# Flipping the image around the x-axis
flipped = cv2.flip(img, 0)
cv2_imshow(flipped)
# Flipping the image around both axes
flipped = cv2.flip(img, -1)
cv2_imshow(flipped)

Output:


How to make sepia, emboss filter or other Instagram filters?

When we want to blur or sharpen our image, we need to apply a linear filter. There are several types of filters that we often use in image processing. In this post we will show how to create very interesting filter effects that are widely popular on social media networks such as Instagram.

Sepia effect

The process for creating a sepia effect includes the following steps. First, we split our channels. Then, the red, green and blue channels are multiplied by certain coefficients, which creates new, updated values for red, green and blue. Once this is finished, we create our final output image by merging the new red, green and blue channels. Finally, we plot our image and we can see the sepia effect that we have created. It resembles the old brown pigment that was widely used by photographers back in the early days of photography.

# Splitting the channels (img1 is stored in RGB order, so the first channel is red)
(r, g, b) = cv2.split(img1)
# Multiplying the channels with the sepia coefficients
r_new = r*0.393 + g*0.769 + b*0.189
g_new = r*0.349 + g*0.686 + b*0.168
b_new = r*0.272 + g*0.534 + b*0.131
# Clipping to [0, 255] and merging back in BGR order for display
img_new = cv2.merge([b_new, g_new, r_new]).clip(0, 255).astype(np.uint8)
cv2_imshow(img_new)

Output:


Emboss effect

Here we will illustrate another interesting filtering method called emboss. Notice that after filtering our image we get a difference image with very low pixel intensity values, which means the output image would be rather black. Therefore, we add a constant of 128 and obtain a resulting image in more grayish colors. This example demonstrates how we can create an emboss effect.

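A minimal sketch of this effect, assuming one common 3x3 emboss kernel (other variants of the kernel exist):

# One common 3x3 emboss kernel (an assumption; variants exist)
kernel = np.array([[-1, -1, 0],
                   [-1,  0, 1],
                   [ 0,  1, 1]], dtype=np.float32)
# Filtering the grayscale image gives a low-intensity difference image,
# so we add 128 to shift it into the visible grayscale range
gray = cv2.cvtColor(img1, cv2.COLOR_RGB2GRAY)
emboss = cv2.filter2D(gray.astype(np.float32), -1, kernel) + 128
img_new = np.clip(emboss, 0, 255).astype(np.uint8)
cv2_imshow(img_new)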

Output:


Morphological transformations

When we want to remove noise from images we can use morphological transformations. There are two basic morphological transformations, called dilation and erosion, and they are used within various image processing applications. In addition, once we learn these two basic operations, we can combine them to create additional operations like opening and closing (sketched after the dilation example below).

Dilation is exactly what it sounds like: an addition (expansion) of the bright pixels of the object in a given image. So, how can we dilate or expand the image? We just need to slide a kernel across our input image. The kernel is defined with respect to an anchor point, which is usually placed at the central pixel.

In the Figure below, we have our input image (left) and the kernel (middle). We take the kernel and run it across the whole image, calculating a local maximum for each position of the kernel. This local maximum is stored in the output image.

[Animation: the kernel sliding across the input image, producing the local maximum at each position]

So, let’s see how we can implement this in OpenCV. As the first step, we will load our input image and threshold it in order to create a binary image. In this post you can find a detailed explanation of thresholding. Also, if you like the illustrations in this post, check out the book called “The hundred-page Computer Vision OpenCV book”. This book will help you to master Computer Vision very quickly.


# Loading an input image and performing thresholding
img = cv2.imread('Billiards balls 1.jpg', cv2.IMREAD_GRAYSCALE)
_, mask = cv2.threshold(img, 230, 255, cv2.THRESH_BINARY_INV)

# Creating a 3x3 kernel
kernel = np.ones((3, 3), np.uint8)
# Performing dilation on the mask
dilation = cv2.dilate(mask, kernel)

# Plotting the images
titles = ["image", "mask", "dilation"]
images = [img, mask, dilation]

for i in range(3):
  plt.subplot(1, 3, i+1)
  plt.imshow(images[i], "gray")
  plt.title(titles[i])
  plt.xticks([])
  plt.yticks([])
plt.show()

Output:

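Erosion and the combined operations follow the same pattern; here is a minimal sketch, reusing the mask and kernel from above:

# Erosion: the local minimum instead of the local maximum (shrinks bright regions)
erosion = cv2.erode(mask, kernel)
# Opening: erosion followed by dilation (removes small bright noise)
opening = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
# Closing: dilation followed by erosion (fills small dark holes)
closing = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)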

How to detect faces in OpenCV?

If you have any type of camera that does face detection, it is probably using a Haar feature-based cascade classifier for object detection. Now we are going to learn what these Haar cascade classifiers are and how to use them to detect faces, eyes and smiles.

There are several problems with face detection that we need to solve. We are often dealing with a high-resolution image, we do not know the size of the face in the image, and we do not know how many faces there are in the image. Moreover, we need to consider different ethnic and age groups, people with beards, or people wearing glasses. So, when it comes to face detection it is very difficult to obtain accurate and quick results.

But thanks to Viola and Jones, this is not a big problem anymore. They came up with the Haar Cascade (Viola-Jones) algorithm – a machine learning object detection algorithm that can be used to identify objects in images or videos. It consists of many simple features called Haar features, which are used to determine whether the object (a face, eyes) is present in the image/video or not.


Now, let us see how we can implement Haar cascade classifiers using Python.

Before you start programming, be sure to download the following three files from the GitHub directory of Haar cascades and load them into your Python script.

# Loading the image
img = cv2.imread("emily_clark.jpg")
# Calling the face_cascade classifier
face_cascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")

Next, we need to convert our image into grayscale, because Haar cascades work only on grayscale images. So, we are going to detect faces, eyes and smiles in the grayscale image, but we will draw rectangles around the detected faces in the color image.

As the first step, we will detect the face. To extract the coordinates of the rectangle that we are going to draw around the detected face, we create an object called faces, in which we store the detected faces. The function detectMultiScale() returns, for each detection, a tuple of four elements: x and y are the coordinates of the top-left corner, and w and h are the width and height of the rectangle. This function requires several arguments. The first one is gray, the input image on which we detect faces. The second argument is the scale factor, which tells us how much the image size is reduced at each image scale. The third and last argument is the minimal number of neighbors; this parameter specifies how many neighbors each candidate rectangle should have.

# Converting the image to grayscale (Haar cascades work on grayscale images)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Creating the object faces
faces = face_cascade.detectMultiScale(gray, 1.1, 10)
# Drawing a rectangle around each detected face
for (x, y, w, h) in faces:
  cv2.rectangle(img, (x, y), (x+w, y+h), (0, 255, 0), 3)
cv2_imshow(img)

Output:


How to detect facial landmarks?

Face swapping pictures are an extremely popular trend on social media. Snapchat, Cupace and MSQRD are probably the most widely used apps with a face swapping option. In a few seconds you can easily swap your face with your friend’s face or with some funny features. However, although face swapping seems very simple, it is not an easy task. Now you may wonder how those apps can perform such advanced face swapping.

To perform face swapping, we cannot just crop one face and replace it with another. What we need to do is localize the key points that describe the unique location of a facial component in an image (eyes, nose, eyebrows, mouth, jawline etc.). To do this, we need a shape prediction method that identifies important facial structures. In our code we are going to implement a method developed in 2014 by two Swedish Computer Vision researchers, Kazemi and Sullivan, called One Millisecond Face Alignment with an Ensemble of Regression Trees. This detector finds facial landmarks very quickly and accurately. To better understand this method, have a look at the following image.

[Image: the 68 labeled facial landmark points]

In this picture you can see a training set of 68 labeled facial points with specific coordinates that surround certain parts of the face.

To detect facial landmarks, we are going to use the dlib library. First, we will detect the face in the input image, and then we will run the shape predictor on it to locate the facial landmarks. This is what our final result will look like.
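A minimal sketch with dlib (assuming the standard pre-trained 68-point model file, shape_predictor_68_face_landmarks.dat, and reusing the image from the face detection example):

import dlib

# dlib's HOG-based face detector and the pre-trained 68-point landmark model
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

img = cv2.imread("emily_clark.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# For every detected face, predict the 68 landmarks and draw them as small circles
for face in detector(gray):
  landmarks = predictor(gray, face)
  for i in range(68):
    point = (landmarks.part(i).x, landmarks.part(i).y)
    cv2.circle(img, point, 2, (0, 0, 255), -1)
cv2_imshow(img)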

Output:


We can clearly see that the small red circles are mapped to the specific facial features on the face (eyes, nose, mouth, eyebrows, jawline).

How to align faces with OpenCV?

Face alignment is one important step that we need to master before we start to work on more complicated image processing tasks in Python. Face alignment can be seen as a process of transforming different sets of points from the input images (input coordinate systems) into one coordinate system. We can call this coordinate system the output coordinate system and define it as our stationary reference frame. Our goal is to warp and transform all input coordinates and align them with the output coordinates. For this purpose, we will apply three basic affine transformations: rotation, translation and scaling. In this way, we can transform facial landmarks from the input coordinate systems into the output coordinate system.

After detecting faces and eyes with Haar cascade classifiers we need to draw a line between the center points of two eyes. But, before we do that, we need to calculate the coordinates of the central points of the rectangles. For better visualization, take a look at the following example.

[Image: the central points of the detected eye rectangles]

The next step will be to draw a horizontal line and calculate the angle between that line and the line that connects two central points of the eyes. Our goal is to rotate the image based on this angle. To calculate the angle, we first need to find the length of the legs of a right triangle. Then we can find the required angle using the following formula.

[Formula: θ = arctan(Δy / Δx), where Δy and Δx are the legs of the right triangle]

Now, we can finally rotate our image by an angle theta.
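A sketch of this step (left_eye_center and right_eye_center are hypothetical variables, assumed to hold the central points (x + w//2, y + h//2) of the detected eye rectangles):

# Legs of the right triangle between the two eye centers
dx = right_eye_center[0] - left_eye_center[0]
dy = right_eye_center[1] - left_eye_center[1]
# The angle theta between the horizontal line and the eye line
theta = np.degrees(np.arctan2(dy, dx))

# Rotating the image around its center by the angle theta
(h, w) = img.shape[:2]
M = cv2.getRotationMatrix2D((w // 2, h // 2), theta, 1.0)
rotated = cv2.warpAffine(img, M, (w, h))
cv2_imshow(rotated)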

Output:


Now, we need to scale our image, for which we will use the distance between the eyes in a reference image. But first we need to calculate this distance. We have already calculated the lengths of the two legs of the right triangle, so we can use the Pythagorean theorem to compute the distance between the eyes, as it represents the hypotenuse. We can do the same with every other picture that we process with this code. After that, we calculate the ratio of these distances and scale our images based on that ratio.
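Continuing the sketch above (reference_dist is a hypothetical value: the eye distance measured in the same way on the reference image):

# The eye distance is the hypotenuse of the right triangle
dist = np.sqrt(dx**2 + dy**2)
# Scaling factor relative to the reference image
scale = reference_dist / dist
# A single affine transform can rotate and scale at the same time
M = cv2.getRotationMatrix2D((w // 2, h // 2), theta, scale)
aligned = cv2.warpAffine(img, M, (w, h))
cv2_imshow(aligned)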

Output:

Original images


Aligned images


As you can see, we obtained excellent results.

Summary

After reading this post, you have gained basic knowledge that will help you with more complicated image processing tasks. If you want to learn Computer Vision with a full understanding of the code and the mathematical formulas behind it, check out this book, which will give you all the answers you need.