Recently, I got an idea to analyze a specific type of graphic. Flags. I wanted to create a simple dataset of what colors the flags are made up of and in what proportions (colors) they occur. It sounds like a data science task, so I did it using Python.
I wrote a program where you specify a directory, and it gives you a JSON with information about each image file found in this directory. In this post, I will focus only on the image analysis itself.
To do it I needed two libraries:
First, I had to get the images from the directory.
import os
directory = "{your_directory_with_images}"
for dirpath, _, filenames in os.walk(directory):
filenames_to_process = len(filenames)
for f in filenames:
img_path = os.path.abspath(os.path.join(dirpath, f))
filename = os.path.basename(f)
colors = self.get_colors(img_path)
file_dict = self.create_image_entry(colors, filename)
collection.append(file_dict)
return collection
All the magic happens in the get_colors
function.
def get_colors(img_path):
img = Image.open(img_path).convert("RGBA")
return extcolors.extract_from_image(img, tolerance=33, limit=10)
We need to preprocess the image to a format that is supported by the extcolors
library. Every file in the directory needs to be converted using the open
method from the Image
class (PIL library). Then the file will be processed using the extract_from_image
function from the extcolors
library. Here we pass three arguments: image file, tolerance, and limit. First one is mandatory, next two are optional. tolerance
is used to group colors and give you a better visual representation. Even If you use a high quality image with a large resolution, there might a small artifacts that can mess up the results. I experimented a bit with this, and the 33 worked the best in my scenario. I also limited the extracted colors to 10. I didn’t want to calculate colors that had a tiny representation in the image.
Here’s the example result of one of the processed images:
([((255, 255, 255), 2427354), ((0, 53, 128), 1576486)], 4003840)
The result consists of two elements: array of tuples with colors and number of pixels, sum of pixels of which the image is composed. In the example above, we can see that the image has two colors and consists of 4003840 pixels.
My next step was to prepare a JSON with results.
def create_image_entry(colors, file_name):
total_pixels = colors[1]
image_dict = {'file_name': file_name}
image_colors = [{'colorCode': str(index), 'percent': round(color / total_pixels * 100)} for index, color in
colors[0] if round(color / total_pixels * 100) >= 1]
image_dict['img_colors'] = image_colors
return image_dict
For each image, I create a dictionary with two keys: file_name
and img_colors
.
The latter is an array of objects, and each of them has a color code in RGB format and a calculated percent value. This indicator shows how much of the image a color occupies. Below is the result JSON with all the information I wanted. In this case, the analysis concerns the Finland flag.
[
{
"file_name": "fi.png",
"img_colors": [
{"colorCode": "(255, 255, 255)", "percent": 61},
{"colorCode": "(0, 53, 128)", "percent": 39}]
}
]
Now it was easy to download all country flags, put them in one directory, and run the script for each of them. I made a website with a visualization of my results.