Binarization of Ancient Document Images based on Multipeak Histogram Assumption 
	Fitri Arnia, Khairul Munadi 
	
			
		Abstract 
		
		In document binarization, text is segmented from the background. This step is important because the binarization outcome determines the success rate of optical character recognition (OCR). Ancient documents are commonly noisy, which makes binarization more difficult: the noise degrades binarization performance and, consequently, the OCR rate. This paper proposes a new binarization approach based on the assumption that the histogram of a noisy document image has multiple peaks. The proposed method comprises three steps: histogram calculation, histogram smoothing, and tracking of the first valley in the smoothed histogram to determine the binarization threshold. In our simulations, we used a set of ancient Jawi document images with natural noise. The set is composed of 24 document tiles containing two noise types: show-through and uneven background. To measure performance, we designed and implemented a point compilation scheme. On average, the proposed method performed better than the Otsu method, obtaining a total point score of 7.5 versus 4.5 for the Otsu method. Our results show that as long as the histogram fulfills the multipeak assumption, the proposed method performs satisfactorily. 
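The three steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the smoothing filter or the exact valley rule, so the moving-average smoothing, the window size `window=5`, and the rule "climb to the first peak from the dark end, then descend to the first valley after it" are hypothetical choices. The sketch also assumes 8-bit grayscale pixels with dark text, so the text forms the first (darkest) peak and the noisy background (show-through, uneven background) forms the later peaks.

```python
from typing import List, Optional, Sequence


def gray_histogram(pixels: Sequence[int], levels: int = 256) -> List[int]:
    """Step 1: count how many pixels have each gray level (0 = black)."""
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1
    return hist


def smooth_histogram(hist: Sequence[float], window: int = 5) -> List[float]:
    """Step 2 (assumed filter): centered moving average; edges use a shorter window."""
    half = window // 2
    n = len(hist)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        smoothed.append(sum(hist[lo:hi]) / (hi - lo))
    return smoothed


def first_valley_threshold(hist: Sequence[float]) -> Optional[int]:
    """Step 3 (assumed rule): scan from the dark end up to the first peak, then
    down to the first valley after it. Returns None if the histogram never rises
    again, i.e. the multipeak assumption does not hold."""
    n = len(hist)
    i = 0
    # Climb to the first peak (assumed to be the dark text peak).
    while i + 1 < n and hist[i + 1] >= hist[i]:
        i += 1
    # Descend until the histogram starts rising toward the next (background) peak.
    while i + 1 < n and hist[i + 1] <= hist[i]:
        i += 1
    return i if i + 1 < n else None


def binarize(pixels: Sequence[int], threshold: int) -> List[int]:
    """Pixels at or below the threshold become text (0); the rest become background (255)."""
    return [0 if p <= threshold else 255 for p in pixels]


if __name__ == "__main__":
    # Hypothetical synthetic tile: a text peak at gray level 40 plus two
    # background peaks at 180 and 225 mimicking show-through / uneven background.
    def count(g: int) -> int:
        return (max(0, 30 - abs(g - 40)) + max(0, 50 - abs(g - 180))
                + max(0, 40 - abs(g - 225)))

    pixels = [g for g in range(256) for _ in range(count(g))]
    t = first_valley_threshold(smooth_histogram(gray_histogram(pixels)))
    print("threshold:", t)
    print("text pixels:", binarize(pixels, t).count(0))
```

Smoothing matters here because small ripples in a raw histogram would otherwise be mistaken for the first valley. When the smoothed histogram has no valley after the first peak, the sketch returns `None` rather than a threshold, reflecting the abstract's point that the method relies on the multipeak assumption.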
 
	
			
		Keywords 
		
		multipeak histogram; image binarization; global thresholding; OCR; noisy document
		
		 
	
				
			
	
	
							
		
		DOI: http://doi.org/10.12928/telkomnika.v15i3.5105
	 
				
		This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
	
TELKOMNIKA Telecommunication, Computing, Electronics and Control, ISSN: 1693-6930, e-ISSN: 2302-9293, Universitas Ahmad Dahlan, 4th Campus, +62 274 564604