OpenCV: Find columns in Arabic journals (Python)-Python Tutorial-php.cn

Table of Contents

Correct answer

Home

Backend Development

Python Tutorial

OpenCV: Find columns in Arabic journals (Python)

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB

Feb 22, 2024 pm 12:52 PM

overflow

OpenCV: Find columns in Arabic journals (Python)

Question content

I am new to opencv and new to python. I tried to piece together code I found online to solve my research problem. I have an Arabic diary from 1870 that has hundreds of pages, each page contains two columns and has a thick black border. I want to extract two columns as image files so that I can run ocr on them individually while ignoring the header and footer. Here is an example page:

Page 3

I have ten pages of raw prints as separate png files. I wrote the following script to handle each one. It works as expected in 2 of the 10 pages, but fails to generate the columns in the other 8 pages. I don't understand all the functions well enough to know where I could use these values, or if my entire approach is misguided - I think the best way to learn is to ask the community how you would solve this problem.

import cv2

def cutpage(fname, pnum):
    image = cv2.imread(fname)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (7,7), 0)
    thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 13))
    dilate = cv2.dilate(thresh, kernel, iterations=1)
    dilatename = "temp/dilate" + str(pnum) + ".png"
    cv2.imwrite(dilatename, dilate)
    cnts = cv2.findContours(dilate, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    cnts = cnts[0] if len(cnts) == 2 else cnts[1]
    cnts = sorted(cnts, key=lambda x: cv2.boundingRect(x)[0])

    fullpage=1
    column=1
    for c in cnts:
        x, y, w, h = cv2.boundingRect(c)
        if h > 300 and w > 20:
            if (h/w)<2.5:
                print("Found full page: ", x, y, w, h)
                filename = "temp/p" + str(pnum) + "-full" + str(fullpage) + ".png"
                fullpage+=1
            else:
                print("Found column: ", x, y, w, h)
                filename = "temp/p" + str(pnum) + "-col" + str(column) + ".png"
                column+=1
            roi = image[y:y+h, x:x+w]
            cv2.imwrite(filename, roi)
    return (column-1)
        
for nr in range(10):
    filename = "p"+str(nr)+".png"
    print("Checking page", nr)
    diditwork = cutpage(filename, nr)
    print("Found", diditwork, "columns")

Copy after login

Following the tutorial, I created a binary inversion of blur and dilation so that it could identify different rectangular areas by large white areas. I also saved a copy of each extended version so I could see what it looked like, here's the page above after processing:

Page 3 has been enlarged

The "for c in cnts" loop should find large rectangular areas in the image. If the aspect ratio is less than 2.5 I get a full page (without header and footer, which works fine), if the aspect ratio is greater than this I know it's a column and it saves this e.g. temp/ p2-col2.png

I get some nice full pages without headers and footers, that is, just large black borders, but not chopped into columns. In 2 pages out of 10 I got what I wanted, which is:

Success Column on Page 2

Since I sometimes get the desired results, there must be something working, but I don't know how to improve it further.

edit:

Here are more page examples:

Correct answer

I tried something without any expansion because I wanted to see if I could just use the middle line as a "separator" . This is the code:

im = cv2.cvtcolor(cv2.imread("arabic.png"), cv2.color_bgr2rgb) # read im as rgb for better plots
gray = cv2.cvtcolor(im, cv2.color_rgb2gray) # convert to gray
_, threshold = cv2.threshold(gray, 250, 255, cv2.thresh_binary_inv) # inverse thresholding
contours, _ = cv2.findcontours(threshold, cv2.retr_external, cv2.chain_approx_none) # find contours
sortedcontours = sorted(contours, key = cv2.contourarea, reverse=true) # sort according to area, descending
bigbox = sortedcontours[0] # get the contour of the big box
middleline = sortedcontours[1] # get the contour of the vertical line
xmiddleline, _, _, _ = cv2.boundingrect(middleline) # get x coordinate of middleline
leftboxcontour = np.array([point for point in bigbox if point[0, 0] < xmiddleline]) # assign left of line as points from the big contour
rightboxcontour = np.array([point for point in bigbox if point[0, 0] >= xmiddleline]) # assigh right of line as points from the big contour
leftboxx, leftboxy, leftboxw, leftboxh = cv2.boundingrect(leftboxcontour) # get properties of box on left
rightboxx, rightboxy, rightboxw, rightboxh = cv2.boundingrect(rightboxcontour) # get properties of box on right
leftboxcrop = im[leftboxy:leftboxy + leftboxh, leftboxx:leftboxx + leftboxw] # crop left 
rightboxcrop = im[rightboxy:rightboxy + rightboxh, rightboxx:rightboxx + rightboxw] # crop right
# maybe do you assertations about aspect ratio??
cv2.imwrite("right.png", rightboxcrop) # save image
cv2.imwrite("left.png", leftboxcrop) # save image

Copy after login

I'm not using any assertions about aspect ratio, so maybe this is still something you need to do..

Basically, the most important lines in this method are generating left and right contours based on x-coordinates. This is the final result I get:

There are still some black parts on the edges, but that shouldn't be a problem for ocr.

FYI: I'm using the following packages in jupyter:

import cv2
import numpy as np
%matplotlib notebook
import matplotlib.pyplot as plt

Copy after login

v2.0: Implemented using only large box detection:

So I did some dilation and the big box was easily detectable. I use a horizontal kernel to ensure that the vertical lines of the large box are always thick enough to be detected. However, I cannot solve the problem with the middle line as it is very thin... Nonetheless, here is the code for the above method:

im = cv2.cvtcolor(cv2.imread("1.png"), cv2.color_bgr2rgb) # read im as rgb for better plots
gray = cv2.cvtcolor(im, cv2.color_rgb2gray) # convert to gray
gray[gray<255] = 0 # added some contrast to make it either completly black or white
_, threshold = cv2.threshold(gray, 250, 255, cv2.thresh_binary_inv) # inverse thresholding
thresholddilated = cv2.dilate(threshold, np.ones((1,10)), iterations = 1) # dilate horizontally
contours, _ = cv2.findcontours(thresholddilated, cv2.retr_external, cv2.chain_approx_none) # find contours
sortedcontours = sorted(contours, key = cv2.contourarea, reverse=true) # sort according to area, descending
x, y, w, h = cv2.boundingrect(sortedcontours[0]) # get the bounding rect properties of the contour
left = im[y:y+h, x:x+int(w/2)+10].copy() # generate left, i included 10 pix from the right just in case
right = im[y:y+h, int(w/2)-10:w].copy() # and right, i included 10 pix from the left just in case
fig, ax = plt.subplots(nrows = 2, ncols = 3) # plotting...
ax[0,0].axis("off")
ax[0,1].imshow(im)
ax[0,1].axis("off")
ax[0,2].axis("off")
ax[1,0].imshow(left)
ax[1,0].axis("off")
ax[1,1].axis("off")
ax[1,2].imshow(right)
ax[1,2].axis("off")

Copy after login

These are the results, you can notice it's not perfect, but again, since your target is ocr, this shouldn't be a problem.

Please tell me if this works, if not I will rack my brain to find a better solution...

v3.0: A better way to get straighter images, which will improve the quality of ocr.

Inspired by my other answer here: answer. It makes sense to straighten the image so that the ocr has better results. Therefore, I used a four-point transform on the detected outer frame. This will straighten the image slightly and make the text more horizontal. This is the code:

im = cv2.cvtcolor(cv2.imread("2.png"), cv2.color_bgr2rgb) # read im as rgb for better plots
gray = cv2.cvtcolor(im, cv2.color_rgb2gray) # convert to gray
gray[gray<255] = 0 # added some contrast to make it either completly black or white
_, threshold = cv2.threshold(gray, 250, 255, cv2.thresh_binary_inv) # inverse thresholding
thresholddilated = cv2.dilate(threshold, np.ones((1,10)), iterations = 1) # dilate horizontally
contours, _ = cv2.findcontours(thresholddilated, cv2.retr_external, cv2.chain_approx_none) # find contours
largest_contour = max(contours, key = cv2.contourarea) # get largest contour
hull = cv2.convexhull(largest_contour) # get the hull
epsilon = 0.02 * cv2.arclength(largest_contour, true) # epsilon
pts1 = np.float32(cv2.approxpolydp(hull, epsilon, true).reshape(-1, 2)) # get the points
result = four_point_transform(im, pts1) # using imutils
height, width = result.shape[:2] # get the dimensions of the transformed image
left = result[:, 0:int(width/2)].copy() # from the beginning to half the width
right = result[:, int(width/2): width].copy() # from half the width till the end
fig, ax = plt.subplots(nrows = 2, ncols = 3) # plotting...
ax[0,0].axis("off")
ax[0,1].imshow(result)
ax[0,1].axvline(width/2)
ax[0,1].axis("off")
ax[0,2].axis("off")
ax[1,0].imshow(left)
ax[1,0].axis("off")
ax[1,1].axis("off")
ax[1,2].imshow(right)
ax[1,2].axis("off")

Copy after login

Has the following packages:

import cv2
import numpy as np
%matplotlib notebook
import matplotlib.pyplot as plt
from imutils.perspective import four_point_transform

Copy after login

As you can see from the code, this is a better approach, you can force the image to be centered and horizontal thanks to the four point transform. Furthermore, there is no need to include some overlap since the images are well separated. Here is an example for your reference:

The above is the detailed content of OpenCV: Find columns in Arabic journals (Python). For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

What's New in Windows 11 KB5054979 & How to Fix Update Issues

3 weeks ago By DDD

How to fix KB5055523 fails to install in Windows 11?

2 weeks ago By DDD

InZoi: How To Apply To School And University

4 weeks ago By DDD

How to fix KB5055518 fails to install in Windows 10?

2 weeks ago By DDD

Where to find the Site Office Key in Atomfall

4 weeks ago By DDD

Hot Tools

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Where is the login entrance for gmail email?

7865

Java Tutorial

1649

CakePHP Tutorial

1407

Laravel Tutorial

1301

PHP Tutorial

1243

Related knowledge

Is H5 page production a front-end development? Apr 05, 2025 pm 11:42 PM

Yes, H5 page production is an important implementation method for front-end development, involving core technologies such as HTML, CSS and JavaScript. Developers build dynamic and powerful H5 pages by cleverly combining these technologies, such as using the <canvas> tag to draw graphics or using JavaScript to control interaction behavior.

How to customize the resize symbol through CSS and make it uniform with the background color? Apr 05, 2025 pm 02:30 PM

The method of customizing resize symbols in CSS is unified with background colors. In daily development, we often encounter situations where we need to customize user interface details, such as adjusting...

Why are the inline-block elements misaligned? How to solve this problem? Apr 04, 2025 pm 10:39 PM

Regarding the reasons and solutions for misaligned display of inline-block elements. When writing web page layout, we often encounter some seemingly strange display problems. Compare...

The latest price of Bitcoin in 2018-2024 USD Feb 15, 2025 pm 07:12 PM

Real-time Bitcoin USD Price Factors that affect Bitcoin price Indicators for predicting future Bitcoin prices Here are some key information about the price of Bitcoin in 2018-2024:

How to use the clip-path attribute of CSS to achieve the 45-degree curve effect of segmenter? Apr 04, 2025 pm 11:45 PM

How to achieve the 45-degree curve effect of segmenter? In the process of implementing the segmenter, how to make the right border turn into a 45-degree curve when clicking the left button, and the point...

How to achieve segmentation effect with 45 degree curve border? Apr 04, 2025 pm 11:48 PM

Tips for Implementing Segmenter Effects In user interface design, segmenter is a common navigation element, especially in mobile applications and responsive web pages. ...

How to control the top and end of pages in browser printing settings through JavaScript or CSS? Apr 05, 2025 pm 10:39 PM

How to use JavaScript or CSS to control the top and end of the page in the browser's printing settings. In the browser's printing settings, there is an option to control whether the display is...

How to compatible with multi-line overflow omission on mobile terminal? Apr 05, 2025 pm 10:36 PM

Compatibility issues of multi-row overflow on mobile terminal omitted on different devices When developing mobile applications using Vue 2.0, you often encounter the need to overflow text...

See all articles