Efficient text detection and extraction Method/Framework

Hi friends, I am interested in video content analysis and would like some ideas on text detection and extraction from video, somewhat similar to video OCR. (Note: "text" here means overlay text, not scene text.) The approach below is based on the observation that transient colors exist between inserted text and its adjacent background.

Overview of the scenario: [IMG]

We propose a new overlay text detection and extraction method using the transition region between the overlay text and the background.
First, we generate the transition map based on our observation that there exist transient colors between overlay text and its adjacent background. Then the overlay text regions are roughly detected by computing the density of transition pixels and the consistency of texture around the transition pixels. The detected overlay text regions are localized accurately using the projection of transition map with an improved color-based thresholding method to extract text strings correctly.

Since the change of intensity at the boundary of overlay text may be small in a low-contrast image, the modified saturation is first introduced as a weight value to determine effectively whether a pixel lies within a transition region. This relies on the fact that overlay text is in the form of overlay graphics. The modified saturation is defined as follows:

S(x, y) = 1 - (3 / (R + G + B)) * min(R, G, B)    (1)
~S(x, y) = S(x, y) / max(S(x, y))
max(S(x, y)) = 2 * (0.5 - ~I(x, y)), if ~I(x, y) > 0.5
max(S(x, y)) = ~I(x, y), otherwise    (2)
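
As a rough illustration, the modified saturation of a single pixel could be computed as in the sketch below. Several things here are assumptions rather than part of the method description: the function name modified_saturation, 8-bit RGB input, intensity taken as the mean of R, G and B, and the handling of pixels where the bound is zero. Note also that the bound in (2) as posted goes negative for ~I > 0.5, so the sketch substitutes the usual double-cone bound (2 * ~I up to 0.5, 2 * (1 - ~I) above it); that substitution is an assumption too.

```python
def modified_saturation(r, g, b):
    """Modified saturation ~S of one 8-bit RGB pixel (illustrative sketch)."""
    total = r + g + b
    if total == 0:
        return 0.0  # pure black: treated as achromatic (assumption)

    # Eq. (1): plain HSI saturation.
    s = 1.0 - 3.0 * min(r, g, b) / total

    # Intensity normalized to [0, 1], assumed to be the mean of R, G, B.
    i_norm = total / (3.0 * 255.0)

    # ASSUMED double-cone bound instead of the posted form of (2):
    # 2 * I below mid-grey, 2 * (1 - I) above it.
    max_s = 2.0 * (1.0 - i_norm) if i_norm > 0.5 else 2.0 * i_norm
    if max_s == 0.0:
        return 0.0  # pure white: treated as achromatic (assumption)

    return s / max_s


# A washed-out red and a mid-grey pixel.
print(modified_saturation(200, 100, 100))
print(modified_saturation(128, 128, 128))
```

Grey pixels come out as exactly zero, which is why the weight built from ~S needs the added 1 later on. The ratio is not clamped, so strongly saturated dark colors can exceed 1.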

Here S(x, y) and max(S(x, y)) denote the saturation value and the maximum saturation value at the corresponding intensity level, respectively, and ~I(x, y) denotes the intensity at (x, y), normalized to [0, 1]. Based on the conical HSI color model, the maximum saturation is normalized according to whether ~I(x, y) is above or below 0.5, as in (2). The transition can thus be defined by combining the change of intensity with the modified saturation as follows:

DL(x, y) = (1 + dSL(x, y)) * |I(x-1, y) - I(x, y)|
DH(x, y) = (1 + dSH(x, y)) * |I(x, y) - I(x+1, y)|
where dSL(x, y) = |~S(x-1, y) - ~S(x, y)| and
dSH(x, y) = |~S(x, y) - ~S(x+1, y)|    (3)
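
To make (3) concrete, here is a minimal sketch that evaluates DL and DH at one column of a single image row. The function name transition_measures and the list-based row layout are only for illustration, and the sketch assumes I is the unnormalized 0..255 intensity while ~S is the modified saturation already computed per pixel.

```python
def transition_measures(intensity, sat, x):
    """Return (DL, DH) at column x of one row, following Eq. (3).

    intensity: per-pixel intensities of the row (assumed 0..255).
    sat:       per-pixel modified saturation ~S of the same row.
    x:         column index with valid neighbours at x - 1 and x + 1.
    """
    d_sl = abs(sat[x - 1] - sat[x])      # saturation change on the left
    d_sh = abs(sat[x] - sat[x + 1])      # saturation change on the right
    d_l = (1.0 + d_sl) * abs(intensity[x - 1] - intensity[x])
    d_h = (1.0 + d_sh) * abs(intensity[x] - intensity[x + 1])
    return d_l, d_h


# Dark, grey background on the left; brighter, coloured text on the right.
row_i = [40, 40, 200, 200]
row_s = [0.0, 0.0, 0.5, 0.5]
print(transition_measures(row_i, row_s, 1))  # (0.0, 240.0)
```

At column 1 the left neighbour is identical, so DL is 0, while the intensity jump of 160 on the right is scaled by 1 + 0.5 to give DH = 240.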

Since the weights dSH(x, y) and dSL(x, y) can be zero for achromatic overlay text and background, we add 1 to the weight in (3). If a pixel satisfies the logarithmical change constraint given in (4), the three consecutive pixels centered on the current pixel are marked as transition pixels, and the transition map is generated:

T(x, y) = 1, if DH(x, y) > DL(x, y) + TH
T(x, y) = 0, otherwise    (4)

The threshold TH is empirically set to 80 in consideration of the logarithmical change.
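
The thresholding step in (4) can be sketched for one image row as follows. This is only an illustration under stated assumptions: the name transition_map_row is hypothetical, DL and DH are taken as precomputed per-column lists, and pixels at the row border are simply clipped when marking the three-pixel neighbourhood.

```python
def transition_map_row(d_l, d_h, th=80):
    """Build one row of the transition map T from per-column DL and DH.

    A column x passes Eq. (4) when DH(x) > DL(x) + TH; the three
    consecutive pixels centred on x are then marked as transition pixels.
    """
    n = len(d_l)
    t = [0] * n
    for x in range(n):
        if d_h[x] > d_l[x] + th:
            for k in (x - 1, x, x + 1):
                if 0 <= k < n:          # clip at the row border (assumption)
                    t[k] = 1
    return t


# Only column 1 satisfies DH > DL + 80, so columns 0..2 are marked.
d_l = [0.0, 0.0, 240.0, 0.0, 0.0]
d_h = [0.0, 240.0, 0.0, 0.0, 0.0]
print(transition_map_row(d_l, d_h))  # [1, 1, 1, 0, 0]
```

Running this over every row gives the full transition map, which is what the later density and texture-consistency checks would operate on.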