OpenCV Tutorial

Implement Color Quantization Using k-means Clustering in OpenCV

Color quantization using k-means clustering

In this subsection, we will apply the k-means clustering algorithm to the problem of color quantization, which can be defined as the process of reducing the number of colors in an image. Color quantization is critical for displaying images on certain devices that can only display a limited number of colors (commonly due to memory restrictions). Therefore, a trade-off between similarity to the original image and the reduction in the number of colors is usually necessary. This trade-off is established by setting the K parameter properly, as we will see in the next examples.

In the script, we perform the k-means clustering algorithm to carry out color quantization. In this case, each element of the data is composed of three features, which correspond to the B, G, and R values of each pixel of the image. Therefore, the key step is to transform the image into data this way:

```python
data = np.float32(image).reshape((-1, 3))
```

Here, image is the image we previously loaded. In this script, we performed the clustering procedure using several values of K (3, 5, 10, 20, and 40) in order to see how the resulting image changes. For example, if we want the resulting image to have only 3 colors (K = 3), we must perform the following steps:

1. Load the BGR image:

```python
img = cv2.imread('landscape_1.jpg')
```

2. Perform color quantization using the color_quantization() function:

```python
color_3 = color_quantization(img, 3)
```

3. Show both images in order to see the results.
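The reshape step can be checked on a tiny synthetic image: each pixel becomes one row with three features (the B, G, and R values). A minimal sketch:

```python
import numpy as np

# A synthetic 2 x 2 BGR "image"; each pixel holds three channel values.
image = np.array([[[255, 0, 0], [0, 255, 0]],
                  [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)

# One row per pixel, one column per channel:
data = np.float32(image).reshape((-1, 3))

print(data.shape)   # (4, 3)
print(data.dtype)   # float32
```

The 2 x 2 x 3 image becomes a (4, 3) array, which is the sample layout cv2.kmeans() expects.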
The color_quantization() function performs the color quantization procedure:

```python
def color_quantization(image, k):
    """Performs color quantization using K-means clustering algorithm"""

    # Transform image into 'data':
    data = np.float32(image).reshape((-1, 3))
    # print(data.shape)

    # Define the algorithm termination criteria (the maximum number of
    # iterations and/or the required accuracy). In this case, the maximum
    # number of iterations is set to 20 and epsilon = 1.0:
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)

    # Apply K-means clustering algorithm:
    ret, label, center = cv2.kmeans(data, k, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)

    # At this point, we can build the image with k colors.
    # Convert center to uint8:
    center = np.uint8(center)

    # Replace the value of each pixel with its corresponding center value:
    result = center[label.flatten()]
    result = result.reshape(image.shape)
    return result
```

In the previous function, the key point is to make use of the cv2.kmeans() method. Finally, we can build the image with k colors by replacing each pixel value with its corresponding center value. The output of this script can be seen in the next screenshot.

The previous script can be extended to include an interesting functionality, which shows the number of pixels assigned to each center value. The color_quantization() function has been modified to include this functionality: […]
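One way to implement the pixel-counting extension mentioned above is np.unique on the label array that cv2.kmeans() returns. The exact code of the modified function is not shown here, so this is a minimal sketch with a simulated label array:

```python
import numpy as np

# Simulated output of cv2.kmeans for k = 3: one cluster index per pixel.
label = np.array([0, 0, 1, 2, 2, 2, 1, 0, 2])

# Count how many pixels were assigned to each center:
values, counts = np.unique(label, return_counts=True)
for v, c in zip(values, counts):
    print("center {}: {} pixels".format(v, c))
# center 0: 3 pixels
# center 1: 2 pixels
# center 2: 4 pixels
```

With a real image, label.flatten() would simply be passed to np.unique() in the same way.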

Understanding k-means Clustering in OpenCV: A Beginner Guide

k-means clustering

OpenCV provides the cv2.kmeans() function, which implements a k-means clustering algorithm that finds the centers of clusters and groups the input samples around the clusters. The objective of the k-means clustering algorithm is to partition (or cluster) n samples into K clusters, where each sample belongs to the cluster with the nearest mean. The signature of the cv2.kmeans() function is as follows:

```python
retval, bestLabels, centers = cv2.kmeans(data, K, bestLabels, criteria, attempts, flags[, centers])
```

data represents the input data for clustering. It should be of np.float32 data type, and each feature should be placed in a single column. K specifies the number of clusters required at the end. The algorithm termination criteria are specified with the criteria parameter, which sets the maximum number of iterations and/or the desired accuracy. When these criteria are satisfied, the algorithm terminates. criteria is a tuple of three parameters, type, max_iter, and epsilon:

- type: This is the type of termination criteria. It has three flags:
  - cv2.TERM_CRITERIA_EPS: The algorithm stops when the specified accuracy, epsilon, is reached.
  - cv2.TERM_CRITERIA_MAX_ITER: The algorithm stops when the specified number of iterations, max_iter, is reached.
  - cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER: The algorithm stops when either of the two conditions is met.
- max_iter: This is the maximum number of iterations.
- epsilon: This is the required accuracy.

An example of criteria can be the following:

```python
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
```

In this case, the maximum number of iterations is set to 20 (max_iter = 20) and the desired accuracy is 1.0 (epsilon = 1.0). The attempts parameter specifies the number of times the algorithm is executed using different initial labelings. The algorithm returns the labels that yield the best compactness. The flags parameter specifies how the initial centers are taken.
The cv2.KMEANS_RANDOM_CENTERS flag selects random initial centers in each attempt. The cv2.KMEANS_PP_CENTERS flag uses the k-means++ center initialization proposed by Arthur and Vassilvitskii (see k-means++: The Advantages of Careful Seeding (2007)).

cv2.kmeans() returns the following:

- retval: The compactness, that is, the sum of the squared distances from each point to its corresponding center
- bestLabels: An integer array that stores the cluster index for each sample
- centers: An array that contains the center of each cluster

In this section, we will see two examples of how to use the k-means clustering algorithm in OpenCV. In the first example, an intuitive understanding of k-means clustering is expected to be achieved while, in the second example, k-means clustering will be applied to the problem of color quantization.

Understanding k-means clustering

In this example, we are going to cluster a set of 2D points using the k-means clustering algorithm. This set of 2D points can be seen as a collection of objects that has been described using two features. This set of 2D points can be created and visualized with the script. The output of this script can be seen in the next screenshot. The set consists of 150 points, created in this way:

```python
data = np.float32(np.vstack((np.random.randint(0, 40, (50, 2)),
                             np.random.randint(30, 70, (50, 2)),
                             np.random.randint(60, 100, (50, 2)))))  # the third range is assumed; the original snippet is truncated here
```

This will represent the data for clustering. As previously mentioned, it should be of np.float32 type, and each feature should be placed in a single column. […]
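To see what cv2.kmeans() computes internally, here is a minimal pure-NumPy version of the same algorithm (Lloyd's iterations). This is an illustration only, not OpenCV's implementation; to keep it deterministic, the initial centers are passed in explicitly, whereas OpenCV picks them itself according to the flags parameter:

```python
import numpy as np

def kmeans_numpy(data, centers, iterations=20):
    """Minimal k-means (Lloyd's algorithm): assign each sample to the
    nearest center, then recompute each center as the mean of its samples."""
    centers = centers.copy()
    for _ in range(iterations):
        # Distance from every sample to every center, shape (n_samples, k):
        dists = np.linalg.norm(data[:, None] - centers[None, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(len(centers)):
            if np.any(labels == j):
                centers[j] = data[labels == j].mean(axis=0)
    return labels, centers

# Two well-separated 2D clusters, 50 points each:
rng = np.random.default_rng(0)
data = np.float32(np.vstack((rng.integers(0, 20, (50, 2)),
                             rng.integers(80, 100, (50, 2)))))
initial = np.float32([[10, 10], [90, 90]])
labels, centers = kmeans_numpy(data, initial)

# Samples from the same group end up in the same cluster:
print(np.unique(labels[:50]), np.unique(labels[50:]))  # [0] [1]
```

The compactness that cv2.kmeans() returns as retval would be the sum of the squared distances from each sample to its assigned center.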

An Introduction to QR Code Detection in OpenCV

QR code detection

To complete this chapter, we are going to learn how to detect QR codes in images. This way, QR codes can also be used as markers for our augmented reality applications. The detectAndDecode() method of cv2.QRCodeDetector both detects and decodes a QR code in the given image, which can be grayscale or color (BGR). This method returns the following:

- The data associated with the QR code (an empty string if no QR code is found)
- An array of vertices of the found QR code (it can be empty if the QR code is not found)
- The rectified and binarized QR code

In the script, we make use of the aforementioned method to detect and decode QR codes. The key points are commented on next. First, the image is loaded, as follows:

```python
image = cv2.imread("qrcode_rotate_45_image.png")
```

Next, we create the QR code detector with the following code:

```python
qr_code_detector = cv2.QRCodeDetector()
```

Then, we apply the detectAndDecode() method, as follows:

```python
data, bbox, rectified_qr_code = qr_code_detector.detectAndDecode(image)
```

We check whether the QR code is found before decoding the data and show the detection by using the show_qr_detection() function:

```python
if len(data) > 0:
    print("Decoded Data : {}".format(data))
    show_qr_detection(image, bbox)
```

The show_qr_detection() function draws both the lines and the corners of the detected QR code:

```python
def show_qr_detection(img, pts):
    """Draws both the lines and corners based on the array of vertices of the found QR code"""
    pts = np.int32(pts).reshape(-1, 2)
    for j in range(pts.shape[0]):
        cv2.line(img, tuple(pts[j]), tuple(pts[(j + 1) % pts.shape[0]]), (255, 0, 0), 5)
    for j in range(pts.shape[0]):, tuple(pts[j]), 10, (255, 0, 255), -1)
```

The output of the script can be seen in the next screenshot. In the preceding screenshot, you can see the rectified and binarized QR code (left), and the detected marker (right), with a blue border and magenta square points highlighting the detection.
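The modular index (j + 1) % pts.shape[0] in show_qr_detection() is what closes the polygon: the last vertex connects back to the first. This can be checked without any drawing by listing the index pairs the line-drawing loop visits:

```python
# For a 4-vertex QR detection, the line-drawing loop joins these pairs:
n = 4
pairs = [(j, (j + 1) % n) for j in range(n)]
print(pairs)  # [(0, 1), (1, 2), (2, 3), (3, 0)]
```

The final pair, (3, 0), is the edge that closes the border around the detected QR code.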

Recognizing Handwritten Digits Using K-nearest Neighbor in OpenCV

Recognizing handwritten digits using k-nearest neighbor

We are going to see how to perform handwritten digit recognition using the kNN classifier. We will start with a basic script that achieves an acceptable accuracy, and we will modify it to increase its performance.

In these scripts, the training data is composed of handwritten digits. Instead of providing many individual images, OpenCV ships one big image with the handwritten digits inside. This image has a size of 2,000 x 1,000 pixels, and each digit is 20 x 20 pixels. Therefore, we have a total of 5,000 digits (100 x 50).

In the script, we perform our first approach to recognizing digits using the kNN classifier. In this first approach, we use the raw pixel values as features. This way, each descriptor will have a size of 400 (20 x 20). The first step is to load all the digits from the big image and to assign the corresponding label to each digit. This is performed with the load_digits_and_labels() function:

```python
digits, labels = load_digits_and_labels('digits.png')
```

The code for the load_digits_and_labels() function is as follows:

```python
def load_digits_and_labels(big_image):
    """Returns all the digits from the 'big' image and creates the corresponding labels for each image"""

    # Load the 'big' image containing all the digits:
    digits_img = cv2.imread(big_image, 0)

    # Get all the digit images from the 'big' image
    # (note the integer division, so the split counts are whole numbers):
    number_rows = digits_img.shape[1] // SIZE_IMAGE
    rows = np.vsplit(digits_img, digits_img.shape[0] // SIZE_IMAGE)

    digits = []
    for row in rows:
        row_cells = np.hsplit(row, number_rows)
        for digit in row_cells:
            digits.append(digit)
    digits = np.array(digits)

    # Create the labels for each image:
    labels = np.repeat(np.arange(NUMBER_CLASSES), len(digits) // NUMBER_CLASSES)
    return digits, labels
```

In the previous function, we first load the 'big' image and, afterwards, we get all the digits inside it. The last step of the previous function is to create the labels for each of the digits.
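The splitting logic in load_digits_and_labels() can be verified on a small synthetic array. A hypothetical 40 x 60 "big image" with 20 x 20 cells should yield six digit images, and the labeling step with two classes (instead of ten) works the same way:

```python
import numpy as np

SIZE_IMAGE = 20
big = np.zeros((40, 60), dtype=np.uint8)  # stand-in for digits.png

number_rows = big.shape[1] // SIZE_IMAGE            # cells per row: 3
rows = np.vsplit(big, big.shape[0] // SIZE_IMAGE)   # 2 horizontal bands

digits = []
for row in rows:
    for digit in np.hsplit(row, number_rows):
        digits.append(digit)
digits = np.array(digits)

print(digits.shape)  # (6, 20, 20)

# Labels are created exactly as in the original, here with 2 classes:
labels = np.repeat(np.arange(2), len(digits) // 2)
print(labels)  # [0 0 0 1 1 1]
```

With the real 2,000 x 1,000 image, the same code produces 5,000 cells of 20 x 20 pixels and 500 labels per digit class.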
The next step performed in the script is to compute the descriptors for each image. In this case, the raw pixels are the feature descriptors:

```python
# Compute the descriptors for all the images.
# In this case, the raw pixels are the feature descriptors:
raw_descriptors = []
for img in digits:
    raw_descriptors.append(np.float32(raw_pixels(img)))
raw_descriptors = np.squeeze(raw_descriptors)
```

At this point, we split the data into training and testing sets (50% for each). Therefore, 2,500 digits will be used to train the classifier, and 2,500 digits will be used to test the trained classifier:

```python
partition = int(0.5 * len(raw_descriptors))
raw_descriptors_train, raw_descriptors_test = np.split(raw_descriptors, [partition])
labels_train, labels_test = np.split(labels, [partition])
```

Now, we can train the kNN model using the knn.train() method and test it using the get_accuracy() function:

```python
# Train the KNN model:
print('Training KNN model - raw pixels as features')
```

[…]
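The 50/50 split with np.split, and an accuracy computation of the kind get_accuracy() presumably performs (its code is not shown here, so the accuracy line is an assumption), can be sketched on tiny stand-in data:

```python
import numpy as np

# Hypothetical stand-ins for the 5,000 descriptors and labels:
descriptors = np.arange(10, dtype=np.float32).reshape(10, 1)
labels = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# Split into training and testing sets (50% each):
partition = int(0.5 * len(descriptors))
desc_train, desc_test = np.split(descriptors, [partition])
labels_train, labels_test = np.split(labels, [partition])

print(len(desc_train), len(desc_test))  # 5 5

# Accuracy is simply the fraction of correct predictions (hypothetical
# predictions used here in place of knn.findNearest() output):
predictions = np.array([1, 1, 0, 1, 1])
accuracy = np.mean(predictions == labels_test) * 100
print(accuracy)  # 80.0
```

np.split with a single index [partition] returns the two halves, so the first 2,500 digits train the classifier and the remaining 2,500 test it.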

Understanding K-nearest Neighbours (kNN) in OpenCV

k-nearest neighbor

k-nearest neighbours (kNN) is considered one of the simplest algorithms in the category of supervised learning. kNN can be used for both classification and regression problems. In the training phase, kNN stores both the feature vectors and the class labels of all of the training samples. In the classification phase, an unlabeled vector (a query or test vector in the same multidimensional feature space as the training examples) is assigned the class label that is most frequent among the k training samples nearest to it, where k is a user-defined constant. This can be seen graphically in the next diagram.

In the diagram, if k = 3, the green circle (the unlabeled test sample) will be classified as a triangle because there are two triangles and only one square inside the inner circle. If k = 5, the green circle will be classified as a square because there are three squares and only two triangles inside the dashed-line circle.

In OpenCV, the first step to work with this classifier is to create it. The method creates an empty kNN classifier, which should then be trained using the train() method, providing both the data and the labels. Finally, the findNearest() method is used to find the neighbors. The signature of this method is as follows:

```python
retval, results, neighborResponses, dist = cv2.ml_KNearest.findNearest(samples, k[, results[, neighborResponses[, dist]]])
```

Here, samples is the input samples stored by rows, k sets the number of nearest neighbors (it should be greater than one), results stores the predictions for each input sample, neighborResponses stores the corresponding neighbors, and dist stores the distances from the input samples to the corresponding neighbors.

In this section, we will see two examples in order to see how to use the kNN algorithm in OpenCV.
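The majority-vote rule described above (the k = 3 versus k = 5 example from the diagram) can be reproduced with a few lines of plain Python. The coordinates here are hypothetical, chosen so that the three nearest points are two triangles and one square, while the five nearest add two more squares:

```python
from collections import Counter

# (x, y, class): hypothetical training points around the query at (0, 0).
points = [(1, 0, "triangle"), (0, 1, "triangle"), (1, 1, "square"),
          (2, 1, "square"), (2, 2, "square"), (3, 3, "triangle")]
query = (0, 0)

def knn_classify(points, query, k):
    """Sort by squared distance to the query and vote among the k nearest."""
    by_dist = sorted(points, key=lambda p: (p[0] - query[0]) ** 2 +
                                           (p[1] - query[1]) ** 2)
    votes = Counter(label for _, _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

print(knn_classify(points, query, 3))  # triangle
print(knn_classify(points, query, 5))  # square
```

The same sample flips class when k changes, which is exactly why choosing k properly matters.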
In the first example, an intuitive understanding of kNN is expected to be achieved, while in the second example, kNN will be applied to the problem of handwritten digit recognition.

Understanding k-nearest neighbors

The script carries out a simple introduction to kNN, where a set of points is randomly created and assigned a label (0 or 1). Label 0 will represent red triangles, while label 1 will represent blue squares. We will use the kNN algorithm to classify a sample point based on its k nearest neighbors. Hence, the first step is to create both the set of points with the corresponding labels and the sample point to classify:

```python
# The data is composed of 16 points:
data = np.random.randint(0, 100, (16, 2)).astype(np.float32)

# We create the labels (0: red, 1: blue) for each of the 16 points:
labels = np.random.randint(0, 2, (16, 1)).astype(np.float32)

# Create the sample point to be classified:
sample = np.random.randint(0, 100, (1, 2)).astype(np.float32)
```

The next step is to create the kNN classifier, train the classifier, and find the k nearest neighbors:

```python
# k-NN creation:
knn =

# k-NN training:
knn.train(data, cv2.ml.ROW_SAMPLE, labels)

# k-NN find nearest:
k = 3
ret, results, neighbours, dist = knn.findNearest(sample, k)

# Print results:
print("result: {}".format(results))
print("neighbours: {}".format(neighbours))
print("distance: {}".format(dist))
```

[…]

Implement Face Recognition with face_recognition Package: A Beginner Guide

face_recognition

Face recognition with face_recognition uses dlib functionality for both encoding the faces and calculating the distances between the encoded faces. Therefore, you do not need to code the face_encodings() and compare_faces() functions yourself, but just make use of them.

The script shows you how to create the 128D descriptor, making use of the face_recognition.face_encodings() function:

```python
# Load image:
image = cv2.imread("jared_1.jpg")

# Convert image from BGR (OpenCV format) to RGB (face_recognition format):
image = image[:, :, ::-1]

# Calculate the encodings for every face of the image:
encodings = face_recognition.face_encodings(image)

# Show the first encoding:
print(encodings[0])
```

To see how to compare faces using face_recognition, the script has been coded. The code is as follows:

```python
# Load known images (remember that these images are loaded in RGB order):
known_image_1 = face_recognition.load_image_file("jared_1.jpg")
known_image_2 = face_recognition.load_image_file("jared_2.jpg")
known_image_3 = face_recognition.load_image_file("jared_3.jpg")
known_image_4 = face_recognition.load_image_file("obama.jpg")

# Create names for each loaded image:
names = ["jared_1.jpg", "jared_2.jpg", "jared_3.jpg", "obama.jpg"]

# Load unknown image (this image is going to be compared against all the previously loaded images):
unknown_image = face_recognition.load_image_file("jared_4.jpg")

# Calculate the encodings for each of the images:
known_image_1_encoding = face_recognition.face_encodings(known_image_1)[0]
known_image_2_encoding = face_recognition.face_encodings(known_image_2)[0]
known_image_3_encoding = face_recognition.face_encodings(known_image_3)[0]
known_image_4_encoding = face_recognition.face_encodings(known_image_4)[0]
known_encodings = [known_image_1_encoding, known_image_2_encoding,
                   known_image_3_encoding, known_image_4_encoding]

unknown_encoding = face_recognition.face_encodings(unknown_image)[0]

# Compare the faces:
results = face_recognition.compare_faces(known_encodings, unknown_encoding)

# Print the results:
print(results)
```

The results obtained are [True, True, True, False]. Therefore, the first three loaded images ("jared_1.jpg", "jared_2.jpg", and "jared_3.jpg") are considered to be the same person as the unknown image ("jared_4.jpg"), while the fourth loaded image ("obama.jpg") is considered to be a different person.
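Internally, face_recognition.compare_faces() reduces to a Euclidean distance test against a tolerance (0.6 by default). A minimal NumPy sketch of that comparison, using tiny synthetic "encodings" (real ones are 128D):

```python
import numpy as np

def compare_faces_sketch(known_encodings, unknown_encoding, tolerance=0.6):
    """True where the Euclidean distance to the unknown encoding is within
    the tolerance (mirrors face_recognition's comparison logic)."""
    distances = np.linalg.norm(np.array(known_encodings) - unknown_encoding,
                               axis=1)
    return [bool(d <= tolerance) for d in distances]

# Tiny 3D stand-ins for 128D descriptors:
known = [np.array([0.10, 0.20, 0.3]),   # close to the unknown
         np.array([0.15, 0.20, 0.3]),   # close
         np.array([0.90, 0.90, 0.9])]   # far away
unknown = np.array([0.1, 0.25, 0.3])

print(compare_faces_sketch(known, unknown))  # [True, True, False]
```

A lower tolerance makes the comparison stricter; the package also exposes the raw distances through face_recognition.face_distance().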

Implement Face Recognition with dlib Library: A Beginner Guide

Face recognition with dlib

Dlib offers a high-quality face recognition algorithm based on deep learning, with state-of-the-art accuracy. More specifically, the model has an accuracy of 99.38% on the Labeled Faces in the Wild database. The implementation of this algorithm is based on the ResNet-34 network proposed in the paper Deep Residual Learning for Image Recognition (2016), which was trained using three million faces. The created model (21.4 MB) can be downloaded from the dlib website (dlib_face_recognition_resnet_model_v1.dat.bz2).

This network is trained in a way that generates a 128-dimensional (128D) descriptor, used to quantify the face. The training step is performed using triplets. A single triplet training example is composed of three images, two of which correspond to the same person. The network generates the 128D descriptor for each of the images, slightly modifying the neural network weights in order to make the two vectors that correspond to the same person closer and the feature vector of the other person further away. The triplet loss function formalizes this: it tries to push the 128D descriptors of two images of the same person closer together, while pulling the 128D descriptors of two images of different people further apart.

This process is repeated millions of times for millions of images of thousands of different people and, finally, the network is able to generate a 128D descriptor for each face. So, the final 128D descriptor is a good encoding for the following reasons:

- The generated 128D descriptors of two images of the same person are quite similar to each other.
- The generated 128D descriptors of two images of different people are very different.

Therefore, making use of the dlib functionality, we can use a pre-trained model to map a face into a 128D descriptor. Afterward, we can use these feature vectors to perform face recognition. The script shows how to calculate the 128D descriptor used to quantify the face.
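The triplet objective described above can be written down directly. Here is a schematic NumPy version; the margin value and the 2D vectors are illustrative assumptions, not dlib's actual training configuration:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss that pushes same-person descriptors together and
    different-person descriptors apart by at least the margin."""
    d_pos = np.linalg.norm(anchor - positive)   # same person: should be small
    d_neg = np.linalg.norm(anchor - negative)   # different person: should be large
    return max(d_pos - d_neg + margin, 0.0)

anchor = np.array([0.0, 0.0])
positive = np.array([0.1, 0.0])   # same person, already close
negative = np.array([1.0, 1.0])   # different person, already far

print(triplet_loss(anchor, positive, negative))  # 0.0 (triplet already satisfied)
```

During training, a non-zero loss nudges the network weights so that d_pos shrinks and d_neg grows; a zero loss means the triplet already satisfies the margin.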
The process is quite simple, as shown in the following code:

```python
# Load image:
image = cv2.imread("jared_1.jpg")

# Convert image from BGR (OpenCV format) to RGB (dlib format):
rgb = image[:, :, ::-1]

# Calculate the encodings for every face of the image:
encodings = face_encodings(rgb)

# Show the first encoding:
print(encodings[0])
```

As you can guess, the face_encodings() function returns the 128D descriptor for each face in the image:

```python
pose_predictor_5_point = dlib.shape_predictor("shape_predictor_5_face_landmarks.dat")
face_encoder = dlib.face_recognition_model_v1("dlib_face_recognition_resnet_model_v1.dat")
detector = dlib.get_frontal_face_detector()

def face_encodings(face_image, number_of_times_to_upsample=1, num_jitters=1):
    """Returns the 128D descriptor for each face in the image"""

    # Detect faces:
    face_locations = detector(face_image, number_of_times_to_upsample)
    # Detect landmarks:
    raw_landmarks = [pose_predictor_5_point(face_image, face_location)
                     for face_location in face_locations]
    # Calculate the face encoding for every detected face using the detected landmarks for each one:
    return [np.array(face_encoder.compute_face_descriptor(face_image, raw_landmark_set, num_jitters))
            for raw_landmark_set in raw_landmarks]
```

As you can see, the key point is to calculate the face encoding for every detected […]

Implement Face Recognition with OpenCV: A Beginner Guide

Face recognition with OpenCV

OpenCV provides support for performing face recognition (see the cv2.face.FaceRecognizer documentation). Indeed, OpenCV provides three different implementations to use:

- Eigenfaces
- Fisherfaces
- Local Binary Patterns Histograms (LBPH)

These implementations perform the recognition in different ways. However, you can use any of them by changing only the way the recognizers are created. More specifically, to create these recognizers, the following code is necessary:

```python
face_recognizer = cv2.face.LBPHFaceRecognizer_create()
face_recognizer = cv2.face.EigenFaceRecognizer_create()
face_recognizer = cv2.face.FisherFaceRecognizer_create()
```

Once created, and independently of the specific internal algorithm OpenCV uses to perform the face recognition, the two key methods, train() and predict(), should be used to perform both the training and the testing of the face recognition system. The way we use these methods is independent of the recognizer created, so it is very easy to try all three recognizers and select the one that offers the best performance for a specific task.

Having said that, LBPH should provide better results than the other two methods when recognizing images in the wild, where different environments and lighting conditions are usually involved. Additionally, the LBPH face recognizer supports the update() method, with which you can update the face recognizer given new data. This functionality is not available for the Eigenfaces and Fisherfaces methods.

In order to train the recognizer, the train() method should be called:

```python
face_recognizer.train(faces, labels)
```

The cv2.face_FaceRecognizer.train(src, labels) method trains the specific face recognizer, where src corresponds to the training set of images (faces), and the labels parameter sets the corresponding label for each image in the training set.
To recognize a new face, the predict() method should be called:

```python
label, confidence = face_recognizer.predict(face)
```

The cv2.face_FaceRecognizer.predict(src) method outputs (predicts) the recognition of the new src image by returning the predicted label and the associated confidence.

Finally, OpenCV also provides the write() and read() methods to save the created model and to load a previously created model, respectively. For both methods, the filename parameter sets the name of the model to save or load:

```python
cv2.face_FaceRecognizer.write(filename)
</ filename)
```

As mentioned, the LBPH face recognizer can be updated using the update() method:

```python
cv2.face_FaceRecognizer.update(src, labels)
```

Here, src and labels set the new training examples that are going to be used to update the LBPH recognizer.
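The train()/predict() interface pattern can be illustrated with a toy nearest-mean recognizer. This is not any of OpenCV's three algorithms, just a hypothetical class with the same interface shape, so the workflow is visible without the cv2.face module:

```python
import numpy as np

class ToyRecognizer:
    """Stores one mean vector per label; predict() returns the label of the
    nearest mean plus the distance to it as a 'confidence' value."""

    def train(self, faces, labels):
        self.means = {l: np.mean([f for f, fl in zip(faces, labels) if fl == l],
                                 axis=0)
                      for l in sorted(set(labels))}

    def predict(self, face):
        dists = {l: np.linalg.norm(face - m) for l, m in self.means.items()}
        label = min(dists, key=dists.get)
        return label, dists[label]

# Two 'people', each described by tiny 2D feature vectors:
faces = [np.array([0.0, 0.0]), np.array([0.2, 0.0]),
         np.array([1.0, 1.0]), np.array([1.2, 1.0])]
labels = [0, 0, 1, 1]

rec = ToyRecognizer()
rec.train(faces, labels)
label, confidence = rec.predict(np.array([0.1, 0.1]))
print(label)  # 0
```

As with the real recognizers, lower confidence values here mean a closer (better) match, and swapping in a different class with the same two methods would leave the calling code unchanged.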

Implement Face Tracking with the dlib DCF-based Tracker: A Beginner Guide

Face tracking with the dlib DCF-based tracker

In the script, we perform face tracking using the dlib frontal face detector for initialization and the dlib DCF-based tracker DSST for face tracking. In order to initialize the correlation tracker, we execute the following command:

```python
tracker = dlib.correlation_tracker()
```

This initializes the tracker with the default values (filter_size = 6, num_scale_levels = 5, scale_window_size = 23, regularizer_space = 0.001, nu_space = 0.025, regularizer_scale = 0.001, nu_scale = 0.025, and scale_pyramid_alpha = 1.020). Higher values of filter_size and num_scale_levels increase tracking accuracy, but require more computational power, increasing CPU processing. The recommended values for filter_size are 5, 6, and 7, and for num_scale_levels, 4, 5, and 6.

To begin tracking, the tracker.start_track() method is used. In this case, we perform face detection first and, if successful, we pass the position of the face to this method, as follows:

```python
if tracking_face is False:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Try to detect a face to initialize the tracker:
    rects = detector(gray, 0)
    # Check if we can start tracking (if we detected a face):
    if len(rects) > 0:
        # Start tracking:
        tracker.start_track(frame, rects[0])
        tracking_face = True
```

This way, the object tracker will start tracking what is inside the bounding box, which, in this case, is the detected face. Additionally, to update the position of the tracked object, the tracker.update() method is called:

```python
tracker.update(frame)
```

This method updates the tracker and returns the peak-to-side-lobe ratio, which is a metric that measures how confident the tracker is. Larger values of this metric indicate high confidence. This metric can be used to reinitialize the tracker with frontal face detection.
To get the position of the tracked object, the tracker.get_position() method is called:

```python
pos = tracker.get_position()
```

This method returns the position of the object being tracked. Finally, we can draw the predicted position of the face:

```python
cv2.rectangle(frame, (int(pos.left()), int(,
              (int(pos.right()), int(pos.bottom())), (0, 255, 0), 3)  # color and thickness assumed
```

In this script, we have also coded the option to reinitialize the tracker if the number 1 is pressed. If this number is pressed, we reinitialize the tracker by trying to detect a frontal face again. To clarify how this script works, two screenshots are included. In the first screenshot, the tracking algorithm is waiting until a frontal face detection is performed to initialize the tracking. In the second screenshot, the tracking algorithm is tracking a previously detected face. At any point, you can press the number 1 in order to reinitialize the tracking.
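The overall detect-then-track control flow of the script can be sketched with stand-in detector and tracker objects. The stubs below are hypothetical placeholders; the real script uses dlib's frontal face detector and correlation_tracker in their place:

```python
def track_frames(frames, detect, tracker_update):
    """For each frame: if not tracking yet, try to detect a face and start
    tracking; otherwise, update the tracker. Returns what happened per frame."""
    tracking_face = False
    history = []
    for frame in frames:
        if not tracking_face:
            rects = detect(frame)
            if rects:
                tracking_face = True       # tracker.start_track() would go here
                history.append("start")
            else:
                history.append("waiting")
        else:
            tracker_update(frame)          # tracker.update(frame) would go here
            history.append("tracking")
    return history

# Stub detector: a face is only "detected" from the third frame on:
detect = lambda frame: [(0, 0, 20, 20)] if frame >= 2 else []
print(track_frames(range(5), detect, lambda frame: None))
# ['waiting', 'waiting', 'start', 'tracking', 'tracking']
```

Pressing 1 in the real script simply resets tracking_face to False, which sends the loop back into the detection branch on the next frame.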
