What is OpenCV?
OpenCV is the most popular Python library for image processing. It is used to detect and recognize faces, identify objects, classify human actions in videos, track camera movement, and much more. The goal of this post is to sprint you through the fundamentals of digital image processing with OpenCV in Python. So, let’s start our journey.
What is a pixel and an image?
The definition of an image is very simple: it is a two-dimensional view of a 3D world. A digital image, in turn, is a numeric representation of a 2D image as a finite set of digital values. We call these values pixels, and collectively they represent an image. Basically, a pixel is the smallest unit of a digital image that can be displayed on a computer screen (if we zoom in on a picture, we can see them as miniature rectangles next to each other). For a more detailed explanation, check out our post on how to access and edit pixel values in OpenCV.
A digital image is represented in your computer as a matrix of pixels. Each pixel is stored as an integer number. If we are dealing with a grayscale image, we use values from 0 (black pixels) up to 255 (white pixels). Any number in between is a shade of gray. On the other hand, color images are represented with three matrices. Each of those matrices represents one primary color, which is also called a channel. The most common color model is Red, Green, Blue (RGB). These three colors are mixed together to generate a broad range of colors. Note that OpenCV loads color images in reverse order, so that the blue channel is the first one, the green channel is the second, and the red channel is the third (BGR). We can see the order of the channels in the following diagram:
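To make this concrete, here is a minimal sketch (the tiny 3x3 image below is made up for illustration) of how a grayscale image is just a matrix of integers, where reading or writing a pixel is plain array indexing:

```python
import numpy as np

# A 3x3 grayscale "image": 0 is black, 255 is white
img = np.array([[  0, 128, 255],
                [ 64, 192,  32],
                [255,   0, 100]], dtype=np.uint8)

# Reading a pixel: row index first, then column index
print(img[0, 2])   # the top-right pixel -> 255

# Writing a pixel: set the center pixel to white
img[1, 1] = 255
```

The same indexing works on real images loaded with cv2.imread(); a color image simply has a third index for the channel.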
Due to this, we might run into a problem, because other Python packages (e.g. matplotlib) use the RGB color format. This is why it is very important to know how to convert an image from one format to another. Here is one way it can be done.
# Necessary imports
import cv2
import numpy as np
import matplotlib.pyplot as plt
# For Google Colab we use the cv2_imshow() function
from google.colab.patches import cv2_imshow

# We load the image using the cv2.imread() function
# The function loads the image in BGR order
img = cv2.imread("Benedict.jpg", 1)
cv2_imshow(img)
However, because matplotlib uses the RGB color order, the colors in our displayed image will be reversed when we plot it with matplotlib.
Now we have two images. The first one is our original image. The second one (img1) is created by splitting the original image into its 3 channels and then merging them back together in RGB order. Next, we are going to plot both images, first with OpenCV and then with matplotlib.
# We can split our image into its 3 channels (b, g, r)
b, g, r = cv2.split(img)
# Next, we merge the channels in order to build a new image
img1 = cv2.merge([r, g, b])
So now matplotlib displays img1 properly, but OpenCV shows it with reversed colors. Luckily, it is easy to inspect visually whether the colors are displayed correctly.
What is a video?
First, let’s see what a video is. It is actually a sequence of images that gives the appearance of motion. In other words, a video can be seen as a collection of images (frames).
In video processing we can also change and modify colors. In the following example we will split every color frame into three color channels. They are stored as matrices with the same height and width as our original video. Then, by changing and modifying their values, we can create a fun visual effect.
# Creating the VideoCapture object
cap = cv2.VideoCapture("Video.mp4")
ret, frame = cap.read()

# Define the codec and create the VideoWriter object
# Note: the frame size must match the size passed to VideoWriter (640, 360)
fourcc = cv2.VideoWriter_fourcc('M', 'P', '4', 'V')
out = cv2.VideoWriter('output2.mp4', fourcc, 10, (640, 360))

# Splitting the frame into its channels (b, g, r)
(b, g, r) = cv2.split(frame)

for i in range(100):
    # In every iteration increase the blue channel pixel values by 1
    b = b + 1
    frame = cv2.merge([b, g, r])
    out.write(frame)

out.release()
cap.release()
How to write text on the image?
Writing text on images is essential for many image processing applications and projects. So let’s learn how to do this.
First, we need to define several parameters, including our font type. Once we do that, we call the function cv2.putText(). We pass it the text string, the starting point (the origin is the bottom-left corner of the text string), the font scale, the color, the thickness and the line type, and we are ready to go. The last provided parameter is the line type; there are three available line types in OpenCV (cv2.LINE_4, cv2.LINE_8, cv2.LINE_AA).
Now, let’s have a little fun and try to recreate our Datahacker.rs logo.
# Creating our image
img = np.zeros((400, 400, 3), dtype="uint8")
img[:] = (255, 255, 255)

# Creating the "Datahacker" logo
cv2.rectangle(img, (76, 76), (324, 300), (0, 0, 0), 2)
font = cv2.FONT_ITALIC
font2 = cv2.FONT_HERSHEY_PLAIN
cv2.putText(img, "H A C K E R", (100, 350), font, 1, (0, 0, 0), 2, cv2.LINE_AA)
cv2.putText(img, "D", (120, 162), font2, 5, (0, 0, 0), 5, cv2.LINE_AA)
cv2.putText(img, "T", (120, 264), font2, 5, (0, 0, 0), 5, cv2.LINE_AA)
cv2.putText(img, "A", (240, 164), font2, 5, (0, 0, 0), 5, cv2.LINE_AA)
cv2.putText(img, "A", (240, 264), font2, 5, (0, 0, 0), 5, cv2.LINE_AA)
cv2_imshow(img)
If we want to create a GIF out of this logo, we will use the following code:
# Creating the image
img = np.zeros((400, 400, 3), dtype="uint8")
img[:] = (255, 255, 255)
plt.imshow(img)
cv2.rectangle(img, (76, 76), (324, 300), (0, 0, 0), 2)
font = cv2.FONT_ITALIC
font2 = cv2.FONT_HERSHEY_PLAIN

cv2.putText(img, "D", (120, 162), font2, 5, (0, 0, 0), 5, cv2.LINE_AA)
plt.axis("off")
# Plotting the image that we have created
plt.imshow(img)
# Saving the image that we have created
# Dots per inch (dpi) is the resolution that we chose
plt.savefig("d_001.jpg", dpi=800)

cv2.putText(img, "A", (240, 164), font2, 5, (0, 0, 0), 5, cv2.LINE_AA)
plt.axis("off")
plt.imshow(img)
plt.savefig("d_002.jpg", dpi=800)

cv2.putText(img, "T", (120, 264), font2, 5, (0, 0, 0), 5, cv2.LINE_AA)
plt.axis("off")
plt.imshow(img)
plt.savefig("d_003.jpg", dpi=800)

cv2.putText(img, "A", (240, 264), font2, 5, (0, 0, 0), 5, cv2.LINE_AA)
plt.axis("off")
plt.imshow(img)
plt.savefig("d_004.jpg", dpi=800)

cv2.putText(img, "H A C ", (100, 350), font, 1, (0, 0, 0), 2, cv2.LINE_AA)
plt.axis("off")
plt.imshow(img)
plt.savefig("d_005.jpg", dpi=800)

cv2.putText(img, "H A C K E R", (100, 350), font, 1, (0, 0, 0), 2, cv2.LINE_AA)
plt.axis("off")
plt.imshow(img)
You can find the code for creating the GIF animation on the Datahacker.rs blog, in the post How to draw lines, rectangles, circles and write text on images with OpenCV in Python.
How to flip an image with OpenCV?
So, we have learned how to apply some basic operations in OpenCV, and now it is time to dive deeper into our first image processing technique. These techniques are fundamental tools for almost all areas of computer vision. In OpenCV we can use transformations like image resizing, image translation, image rotation and image flipping.
Let’s explore one of them, a transformation called flipping. This means that we are going to flip our image around the x- or y-axis. It is very simple: we just call the cv2.flip() function and provide one argument, a value that determines the axis around which we will flip our image. A value of 1 indicates that we are going to flip our image around the y-axis (horizontal flipping). On the other hand, a value of 0 indicates that we are going to flip the image around the x-axis (vertical flipping). If we want to flip the image around both axes, we use a negative value (e.g. -1).
The matrix that performs this operation in Linear Algebra is called a reflection matrix. For flipping operations in Python this matrix is not required, but it is good to know what it looks like.
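As a sketch, a reflection around the y-axis (horizontal flip) can be written as a 2x2 matrix applied to the pixel coordinates, taking the image center as the origin:

$$ \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} $$

Flipping around the x-axis instead negates the bottom-right entry (so $y' = -y$), and flipping around both axes negates both diagonal entries.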
# Flipping the image around the y-axis
flipped = cv2.flip(img, 1)
cv2_imshow(flipped)

# Flipping the image around the x-axis
flipped = cv2.flip(img, 0)
cv2_imshow(flipped)

# Flipping the image around both axes
flipped = cv2.flip(img, -1)
cv2_imshow(flipped)
How to make sepia, emboss filter or other Instagram filters?
When we want to blur or sharpen our image, we need to apply a linear filter. There are several types of filters that we often use in image processing. In this post we will show how to create very interesting filter effects that are widely popular on social media networks such as Instagram.
The process for creating a sepia effect includes the following steps. First, we split our image into its red, green and blue channels. Then each channel is multiplied by certain coefficients, producing new, updated values for red, green and blue. Once this is finished, we create our final output image by merging the new red, new green and new blue channels. Finally, we plot our image and we can see the sepia effect that we have created. It resembles the old brown pigment that was widely used by photographers in the early days of photography.
# Splitting the channels
(b, g, r) = cv2.split(img1)

# Applying the sepia coefficients to each channel
r_new = r * 0.393 + g * 0.769 + b * 0.189
g_new = r * 0.349 + g * 0.686 + b * 0.168
b_new = r * 0.272 + g * 0.534 + b * 0.131

# Clipping the values to [0, 255] and converting back to uint8
img_new = cv2.merge([b_new, g_new, r_new])
img_new = np.clip(img_new, 0, 255).astype(np.uint8)
cv2_imshow(img_new)
Here we will illustrate another interesting filtering method, called emboss. Notice that after filtering our image we get a difference image with very low pixel intensity values, which means the output would appear almost black. Therefore, we add a constant of 128 and obtain a resulting image in more grayish colors. This is how we can create an emboss effect.
How to remove noise with morphological transformations?
When we want to remove noise in images, we can use morphological transformations. There are two basic morphological transformations, called dilation and erosion. They are used in image processing within various applications. In addition, once we learn these two basic morphological operations, we can combine them to create additional operations like opening and closing.
Dilation is exactly what it sounds like: an addition (expansion) of the bright pixels of the object in a given image. So, how can we dilate or expand the image? We slide a kernel across the input image and, at each position, store the local maximum under the kernel. The kernel is defined with respect to the anchor point, which is usually placed at the central pixel.
In the Figure below, we have our input image (left image) and the kernel (middle image). We take our kernel and run it across the whole image in order to calculate a local maximum for each position of the kernel. We store this local maximum in the output image.
So, let’s see how we can implement this in OpenCV. As the first step, we load our input image and threshold it in order to create a binary image. In this post you can find a detailed explanation of thresholding. Also, if you like the illustrations in this post, check out the book called “The hundred-page Computer Vision OpenCV book”. This book will help you master Computer Vision very quickly.
# Loading an input image and performing thresholding
img = cv2.imread('Billiards balls 1.jpg', cv2.IMREAD_GRAYSCALE)
_, mask = cv2.threshold(img, 230, 255, cv2.THRESH_BINARY_INV)

# Creating a 3x3 kernel
kernel = np.ones((3, 3), np.uint8)

# Performing dilation on the mask
dilation = cv2.dilate(mask, kernel)

# Plotting the images
titles = ["image", "mask", "dilation"]
images = [img, mask, dilation]
for i in range(3):
    plt.subplot(1, 3, i + 1)
    plt.imshow(images[i], "gray")
    plt.title(titles[i])
    plt.xticks([])
    plt.yticks([])
plt.show()
How to detect faces in OpenCV?
If you have any type of camera that does face detection, it is probably using a Haar feature-based cascade classifier for object detection. Now we are going to learn what these Haar cascade classifiers are and how to use them to detect faces, eyes and smiles.
There are several problems with face detection that we need to solve. We are often dealing with a high-resolution image, we do not know the size of the face in the image, and we do not know how many faces there are in the image. Moreover, we need to consider different ethnic and age groups, people with beards, or people wearing glasses. So, when it comes to face detection, it is very difficult to obtain accurate and quick results.
But thanks to Viola and Jones, this is not a big problem anymore. They came up with the Haar Cascade (Viola-Jones algorithm), a machine learning object detection algorithm that can be used to identify objects in images or videos. It consists of many simple features, called Haar features, which are used to determine whether the object (a face, eyes) is present in the image/video or not.
Now, let us see how we can implement Haar cascade classifiers using Python.
Before you start programming, be sure to download the following three files from the GitHub directory of Haar cascades, and load them into your Python script.
# Loading the image
img = cv2.imread("emily_clark.jpg")

In the following line of code we create the face_cascade classifier.

face_cascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")
Next, we need to convert our image into grayscale, because Haar cascades work only on grayscale images. So, we are going to detect faces, eyes and smiles in the grayscale image, but we will draw rectangles around the detected faces in the color image.
As the first step, we will detect the face. To extract the coordinates of the rectangle that we are going to draw around the detected face, we create an object called faces, in which we store the detected faces. The function detectMultiScale() returns, for each detection, four values: x and y are the coordinates of the top-left corner, and w and h are the width and height of the rectangle. This function requires several arguments. The first one is the gray image, the input image on which we detect faces. The second argument is the scale factor, which tells us how much the image size is reduced at each image scale. The third and last argument is the minimal number of neighbors; this parameter specifies how many neighbors each candidate rectangle should have.
# Creating the object faces
faces = face_cascade.detectMultiScale(gray, 1.1, 10)

# Drawing a rectangle around the face
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 3)

cv2_imshow(img)
How to detect facial landmarks?
Face-swapping pictures are an extremely popular trend on social media. Snapchat, Cupace and MSQRD are probably the most widely used apps with a face-swapping option. In a few seconds you can easily swap your face with your friend’s face or add some funny features. However, although face swapping seems very simple, it is not an easy task. Now you may wonder how those apps can perform such advanced face swapping.
To perform face swapping, we cannot just crop one face and replace it with another. What we need to do is localize the key points that describe the unique location of each facial component in an image (eyes, nose, eyebrows, mouth, jawline etc.). To do this, we need a shape prediction method that identifies important facial structures. In our code we are going to implement a method developed in 2014 by two Computer Vision researchers from Sweden, Kazemi and Sullivan, called One Millisecond Face Alignment with an Ensemble of Regression Trees. This detector finds facial landmarks very quickly and accurately. To better understand this method, have a look at the following image.
In this picture you can see a training set of 68 labeled facial points with specific coordinates that surround certain parts of the face.
To detect facial landmarks, we are going to use the dlib library. First, we will detect the face in the input image. Then, we will use the same method to detect the facial landmarks. This is what our final result will look like.
We can clearly see that the small red circles are mapped to the specific facial features on the face (eyes, nose, mouth, eyebrows, jawline).
How to align faces with OpenCV?
Face alignment is an important step that we need to master before we start working on more complicated image processing tasks in Python. Face alignment can be seen as the process of transforming different sets of points from input images (input coordinate systems) into one coordinate system. We can call this coordinate system the output coordinate system and define it as our stationary reference frame. Our goal is to warp and transform all input coordinates and align them with the output coordinates. For this purpose, we will apply three basic affine transformations: rotation, translation and scaling. In this way, we can transform facial landmarks from the input coordinate systems to the output coordinate system.
After detecting the face and eyes with Haar cascade classifiers, we need to draw a line between the center points of the two eyes. But before we do that, we need to calculate the coordinates of the central point of each eye rectangle. For better visualization, take a look at the following example.
The next step is to draw a horizontal line and calculate the angle between it and the line that connects the two central points of the eyes. Our goal is to rotate the image based on this angle. To calculate the angle, we first need to find the lengths of the two legs of the right triangle formed by the eye points. Then we can find the required angle as the arctangent of their ratio.
Now we can finally rotate our image by the angle theta.
Now we need to scale our image, using the distance between the eyes as a reference. But first we need to calculate this distance. We have already calculated the lengths of the two legs of the right triangle, so we can use the Pythagorean theorem to compute the distance between the eyes, as it is the hypotenuse. We do the same for all the other pictures processed with this code. After that, we calculate the ratio of these distances and scale our images based on that ratio.
As you can see we obtained excellent results.
After reading this post, you have gained basic knowledge that will help you in more complicated image processing tasks. If you want to learn Computer Vision with full understanding of complicated code and mathematical formulas check out this book which will give you all the answers that you need.
This is a guest post written by Prof. Dr. Vladimir Matic, Strahinja Zivkovic and Strahinja Stefanovic from the dataHacker.rs blog.