Defence Science Journal, Vol. 64, No. 4, July 2014, pp. 350-357, DOI : 10.14429/dsj.64.4503
© 2014, DESIDOC
Received 18 April 2013, revised 20 April 2014, online published 21 July 2014
A Quaternionic Wavelet Transform-based Approach for Object Recognition
R. Ahila Priyadharshini* and S. Arivazhagan
*Mepco Schlenk Engineering College, Sivakasi-626 005, India
*E-mail: ahilaprem@gmail.com
Recognizing objects in complex natural scenes is a challenging task, as the object may be occluded and may vary in shape, position and size. In this paper, a method to recognize objects from different categories of images using the quaternionic wavelet transform (QWT) is presented. This transform separates the information contained in the image better than a traditional discrete wavelet transform and provides a multiscale image analysis whose coefficients are 2D analytic, with one near-shift-invariant magnitude and three phases. Two of the phases encode local image shifts and the third contains texture information. In object recognition, it is often necessary to classify objects that occupy only a limited part of the image. Hence, to identify local features in specific image regions, patches are extracted around the interest points detected in the original image using a wavelet-based interest point detector. QWT magnitude and phase features are computed for every patch. These features are then used to train and test a supervised SVM classifier. To compare the performance of local features with global features, the transform is also applied to the entire image and global features are derived. The performance of the QWT is compared with that of the discrete wavelet transform (DWT) and the dual tree discrete wavelet transform (DTDWT). The observations reveal that the QWT outperforms the DWT and the shift-invariant DTDWT, giving a lower equal error rate. The experimental evaluation is done using the complex Graz databases.
Keywords: Object recognition, salient point detector, patch, quaternionic wavelet transform
Object recognition is the task of retrieving information that is not directly apparent in the perceived images. Humans recognize multiple objects in images with little effort, despite their different positions, sizes and appearances, but this is still a challenging problem for computer vision systems. Object recognition research aims to endow a machine with this human capacity. Object recognition has applications in many areas such as image panoramas, image watermarking, global robot localization, face detection, optical character recognition, content-based image indexing, automated vehicle parking systems, and visual positioning and tracking1.
Global features describe the image as a whole and are less successful in recognition. Salient points are the points that maximize the discrimination between objects. Salient point detection plays an important role in content-based image retrieval, where it is used to represent the local properties of the image. Since classic corner detectors are not well suited to natural images, a detector based on the wavelet transform, which represents both global and local variations, can be used to detect the salient points2,3. Schmid & Mohr4 proposed local gray value invariants for image retrieval, which are automatically extracted at the detected salient points. Weber5, et al. proposed applying the K-means clustering algorithm at Förstner points for object recognition.
To recognize objects reliably under varying circumstances such as different scales, rotation and translation, the chosen features should be invariant with respect to these aspects. In complex images, the information provided by global features is not sufficient, so they are not well suited. Local features such as patches are better suited to complex images, because they represent restricted regions of an image. Teynor6, et al. computed gray values, Haar integral gray invariants and the scale invariant feature transform (SIFT) for image patches around the interest points for visual object class recognition. To address the scale differences of the objects, the patches have to be extracted at different scales. Moreover, occlusions can easily be handled by these patches7.
The dual-tree quaternion wavelet transform (QWT) is a new multiscale analysis tool for geometric image features. The QWT is a near shift-invariant tight frame representation whose coefficients have a magnitude and three phases: two phases encode local image shifts while the third contains image texture information. The QWT is based on an alternative theory for the 2-D Hilbert transform and can be computed using a dual-tree filter bank with linear computational complexity. To demonstrate the properties of the QWT’s coherent magnitude/phase representation, Chan8, et al. developed an efficient and accurate algorithm for estimating the local geometrical structure of images, and a multiscale algorithm for estimating the disparity between a pair of images that is promising for image registration and flow estimation applications. The QWT represents the local structures in images coherently, based on a strong 2D signal processing theory, and places its redundancy in a local phase rather than in directionality9. Soulard9, et al. used the QWT for texture classification and obtained better classification accuracy. Sathyabama10, et al. classified rotated, scaled and translated texture images using two-stage log-polar quaternion wavelet energy signatures. Yin11, et al. applied the QWT to image denoising.
The main aim of this paper is to study the recently introduced quaternionic wavelet transform and to apply it to the generic object recognition task. Only a small amount of work has previously been done using the QWT, for applications such as texture classification, image disparity estimation, color texture segmentation and image denoising. Here, the QWT is used to recognize various object categories from the complex Graz databases. The QWT’s potential and its practical superiority over the standard DWT in a comparative object recognition task are also presented. Further, the performance of the QWT is compared with that of the DTDWT, which has approximate shift invariance, directional selectivity and limited redundancy. Moreover, the exhaustive experiments conducted on these databases give performance that compares favorably with other recent works reported in the literature.
The first step is to detect the salient points using the wavelet-based salient point detector. The salient points are found in regions of high variance. Patches are then extracted around each of the detected salient points. For every patch, QWT, DWT and DTDWT features are computed separately and classified using an SVM classifier, and their performance is compared. To compare the performance of local features with global features, the QWT, DWT and DTDWT are also applied to the entire image, and statistical features are derived and classified. The block diagram of the proposed method is shown in Fig. 1.
The standard DWT suffers from three drawbacks: oscillations, shift variance and the lack of a phase notion.
To overcome these drawbacks, analytic signal modeling is embedded into the wavelet framework. A 1D analytic wavelet analysis with a 2-times redundant perfect reconstruction filter bank is achieved by the dual tree complex wavelet transform (CWT)12. The CWT uses complex wavelet basis functions whose real and imaginary parts form a 1D Hilbert transform pair. Since the real and imaginary wavelets are in quadrature, the CWT coefficient magnitudes are almost shift invariant with low redundancy. However, the 2D CWT produces an ambiguous phase that fails to describe local features efficiently9.
The quaternionic wavelet transform is a generalization of the CWT that provides a richer scale-space analysis for 2D signals than the DWT. The DWT coefficients are real, whereas the QWT coefficients are quaternion valued, i.e., 4-vectors made of one magnitude and a 3-angle phase. The QWT therefore separates the information better and describes the image content more clearly. The QWT is based on the quaternionic Fourier transform (QFT) and the quaternionic analytic signal14, which extend well-known signal theory concepts to 2D by embedding them into the quaternion algebra H. The complex algebra C is suited only to 1D signals, whereas the quaternion algebra describes 2D signals well.
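A quaternion is a generalization of a complex number with three imaginary units (i, j, k) that follow the rules $i^2 = j^2 = k^2 = -1$ and $ij = -ji = k$, and can be written as $q = a + bi + cj + dk$. In polar form, $q = |q|\,e^{i\varphi} e^{k\psi} e^{j\theta}$, it is defined by one modulus $|q|$ and three angles $(\varphi, \theta, \psi)$, called the phase, which are given in Eqns. (1), (2) and (3) respectively9.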
(1)
(2)
(3)
with $|q| = 1$.
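The phase angles can also be recovered numerically from the four components of a quaternion. The following is a minimal sketch, assuming the factor ordering $q = |q|\,e^{i\varphi} e^{k\psi} e^{j\theta}$ of Bülow14 and normalizing $q$ to unit modulus first. The function names are illustrative, and the singular case $\psi = \pm\pi/4$ is not handled.

```python
import math


def from_polar(mod, phi, theta, psi):
    """Build q = mod * e^{i phi} e^{k psi} e^{j theta} as (a, b, c, d)."""
    cp, sp = math.cos(phi), math.sin(phi)
    ct, st = math.cos(theta), math.sin(theta)
    cs, ss = math.cos(psi), math.sin(psi)
    a = cp * cs * ct + sp * ss * st
    b = sp * cs * ct - cp * ss * st
    c = cp * cs * st - sp * ss * ct
    d = cp * ss * ct + sp * cs * st
    return (mod * a, mod * b, mod * c, mod * d)


def to_polar(a, b, c, d):
    """Return (modulus, phi, theta, psi) for q = a + bi + cj + dk."""
    mod = math.sqrt(a * a + b * b + c * c + d * d)
    if mod == 0.0:
        return 0.0, 0.0, 0.0, 0.0
    a, b, c, d = a / mod, b / mod, c / mod, d / mod  # unit quaternion
    # Clamp to [-1, 1] to protect asin against rounding errors.
    s = max(-1.0, min(1.0, 2.0 * (b * c - a * d)))
    psi = -0.5 * math.asin(s)
    phi = 0.5 * math.atan2(2.0 * (a * b + c * d), a * a - b * b + c * c - d * d)
    theta = 0.5 * math.atan2(2.0 * (a * c + b * d), a * a + b * b - c * c - d * d)
    # The half-angle leaves a sign ambiguity: if the rebuilt quaternion
    # is -q instead of q, shift phi by pi.
    ra, rb, rc, rd = from_polar(1.0, phi, theta, psi)
    if ra * a + rb * b + rc * c + rd * d < 0.0:
        phi = phi - math.pi if phi >= 0.0 else phi + math.pi
    return mod, phi, theta, psi
```

A round trip such as `to_polar(*from_polar(2.0, 2.5, -0.2, 0.1))` should return the original modulus and angles up to rounding.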
The quaternionic analytic signal associated with a 2D function is defined by means of its partial and total Hilbert transforms (HT) along the horizontal and vertical directions, and is given in Eqn. (4)
(4)
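As a hedged reconstruction, the definition of the quaternionic analytic signal given by Bülow14 can be written as follows. Here $\mathcal{H}_x$ and $\mathcal{H}_y$ are notation assumed for the partial Hilbert transforms along the horizontal and vertical directions, and their composition is the total Hilbert transform.

```latex
f_A(x, y) = f(x, y) + i\,\mathcal{H}_x f(x, y) + j\,\mathcal{H}_y f(x, y) + k\,\mathcal{H}_x \mathcal{H}_y f(x, y)
```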
In the QWT, the mother wavelet is a quaternionic 2D analytic filter, which yields coefficients that are analytic and whose magnitude, contrary to the DWT, is near shift-invariant16. It thus inherits the local magnitude-phase analysis of the very useful 1D analytic signal. The usual interpretation of the magnitude remains analogous to 1D, as it specifies the relative presence of a feature, whereas the local phase is represented by three angles carrying a complete description of this 2D feature.
(5)
Mathematically, 2D HTs of separable functions (i.e., $f(x, y) = f_1(x) f_2(y)$) are equivalent to 1D HTs along rows and/or columns. Considering a 1D Hilbert pair of wavelets $(\psi_h, \psi_g = \mathcal{H}\psi_h)$ and scaling functions $(\phi_h, \phi_g = \mathcal{H}\phi_h)$, the analytic 2D wavelets are written in terms of separable products. The 1D HT operators along the x and y coordinates are denoted by $\mathcal{H}_x$ and $\mathcal{H}_y$, which gives Eqn. (6):
(6)
The four components in each quaternion wavelet basis are given in Eqn.(7)
(7)
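Since each quaternion wavelet consists of a real separable wavelet plus three copies obtained by 1D Hilbert transforms along either or both coordinates, the diagonal wavelet can be sketched as below. The symbols $\psi_h$ and $\psi_g$ (a 1D Hilbert pair of wavelets) and the superscript $D$ are notation assumed here. The other sub bands would follow by replacing the wavelet pair with the corresponding scaling-function pair along one or both coordinates.

```latex
\psi^{D}(x, y) = \psi_h(x)\,\psi_h(y) + i\,\psi_g(x)\,\psi_h(y) + j\,\psi_h(x)\,\psi_g(y) + k\,\psi_g(x)\,\psi_g(y)
```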
Each sub band of the QWT can be viewed as the analytic signal associated with a narrow band of the image. The QWT magnitude represents features at any spatial position in each sub band and the three phases describe the structure of these features9.
3.1 Dual Tree Structure
The QWT uses the dual-tree algorithm15, a filter bank implementation that uses a Hilbert pair as a complex 1D wavelet, allowing shift invariance and analytic coefficients. This overcomes the problem of the undecimated filter bank: the undecimated DWT is shift invariant, but it is not a tight frame and has too high a redundancy. Two complementary 1D filter sets, odd and even, lead to four 2D filter banks. The outputs of the four 2D filter banks constitute one 4-valued quaternionic wavelet decomposition, which embeds the structural information into a local phase concept rather than an oriented separation16.
Each quaternion wavelet consists of a standard DWT tensor wavelet plus three additional real wavelets obtained by 1-D Hilbert transforms along either or both coordinates. Figure 2 shows the decomposition and reconstruction of the quaternion wavelet transform11, where h0 and h1 are the low-pass and high-pass filters of the real wavelet, g0 and g1 are the low-pass and high-pass filters corresponding to the Hilbert transforms of h0 and h1, respectively, and the remaining filters in the figure are the corresponding synthesis filters.
The transformation of input data into a set of features is called feature extraction. The reduced information, i.e., the set of features rather than the full-size input, is used to recognize various complex images with better accuracy. The QWT yields 4-vectors made of one magnitude and a 3-angle phase. QWT coefficients are formed by combining the wavelet coefficients of the same sub band from the output of each filter bank using quaternion algebra. In gray scale images, for every sub band, three HT pairs are formed in the horizontal, vertical and diagonal directions. Each sub band and its corresponding HT pairs form a quaternion wavelet10. Figure 3 shows the two-level decomposition of a sample image from the Graz 01 database using the quaternionic wavelet transform.
Here, the combination of QWT magnitude and phase features is used for object recognition. QWT magnitude features are analogous to DWT features. Among the three phases, the third phase ($\psi$) is considered because it completes the structural information and conveys the texture information, whereas the first two ($\varphi$ and $\theta$) simply describe the spatial shifts of structures9.
4.1 QWT Magnitude-based Features
Statistical properties of the wavelet coefficients characterize the image well and lead to better image classification. In this paper, statistical features such as mean and standard deviation are derived from the wavelet coefficients of each sub band of QWT magnitude using Eqns. (8) and (9).
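Mean
$$\mu = \frac{1}{N}\sum_{k=1}^{N} x_k \qquad (8)$$
Standard deviation
$$\sigma = \sqrt{\frac{1}{N}\sum_{k=1}^{N} (x_k - \mu)^2} \qquad (9)$$
where $N$ is the number of pixels in the sub band and $x_k$ are the wavelet coefficients in that sub band.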
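The two magnitude features can be computed per sub band as in the sketch below. It assumes that the sub band is available as a 2D list of QWT magnitude coefficients and that the standard deviation is normalized by N rather than N - 1. The function name is illustrative.

```python
import math


def subband_mean_std(coeffs):
    """Return (mean, standard deviation) of one sub band.

    `coeffs` is a 2D list of magnitude coefficients. Dividing the
    variance by N (population form) is an assumption of this sketch.
    """
    values = [v for row in coeffs for v in row]
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, math.sqrt(variance)
```

For example, the sub band `[[1.0, 3.0], [5.0, 7.0]]` has mean 4 and variance 5.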
4.2 QWT Phase-based Feature
The standard deviation of the wavelet coefficients of each sub band of the QWT phase $\psi$ is calculated. To improve robustness, the standard deviation can be weighted by the QWT magnitude. A high magnitude indicates a strong presence of a feature, while a low value means ‘no feature’ and also gives a numerically unstable phase. The measure is therefore more representative when the structure of low-magnitude features is not taken into account9. The weight function $W$ is the magnitude of the QWT coefficients, normalized so that its sum within the sub band is one, and is integrated into the standard deviation formula as given in Eqn. (10).
Weighted standard deviation
$$\sigma_W = \sqrt{\sum_{k=1}^{N} W_k\,(\psi_k - \mu_W)^2} \qquad (10)$$
where $\mu_W = \sum_{k=1}^{N} W_k\,\psi_k$ is the weighted mean of the sub band.
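The magnitude-weighted phase feature can be sketched as follows. The code assumes that the phase and magnitude of one sub band are given as 2D lists of equal shape, and it returns zero for a sub band whose magnitude is zero everywhere, which is a choice made here for illustration. The function name is hypothetical.

```python
import math


def weighted_phase_std(phase, magnitude):
    """Standard deviation of the phase, weighted by normalized magnitude."""
    phases = [p for row in phase for p in row]
    mags = [m for row in magnitude for m in row]
    total = sum(mags)
    if total == 0.0:
        return 0.0  # no feature present anywhere in the sub band
    weights = [m / total for m in mags]  # weights sum to one
    mean_w = sum(w * p for w, p in zip(weights, phases))
    var_w = sum(w * (p - mean_w) ** 2 for w, p in zip(weights, phases))
    return math.sqrt(var_w)
```

With equal magnitudes this reduces to the ordinary standard deviation, and a coefficient with zero magnitude does not influence the result.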
Here experiments are conducted with the images of complex Graz databases for the two class problem. Both object images and background images are used for training and testing. The task is to determine whether an object is present or not in a given image.
The Graz database comprises two complex databases, Graz 01 and Graz 02, which contain objects in different poses with cluttered backgrounds. There are 373 bike images, 460 person images, 210 images with both a bike and a person, and 270 mixed background images in the Graz 01 database17. Likewise, the Graz 02 database18 contains 365 bike images, 420 car images and 311 person images, which are used as positive images, and 380 compound background images, which are used as negative images. In the Graz 02 database, the objects show extreme variability in pose, orientation and lighting, and different degrees of occlusion. All the images in the Graz databases are color images. For the experiments, the images are converted into gray scale images. Sample images of the Graz 01 and Graz 02 databases are shown in Figs. 4 and 5, respectively.
To compute local features, 200 salient points are first detected in every image of the Graz databases using the wavelet-based salient point detector3, and patches of size 32 × 32 are extracted around the detected salient points.
The salient points are not confined to corners, but show variations that happen at different resolutions in the images2. The algorithm for detecting relevant salient points using the Haar wavelet transform is as follows:
- Calculate the wavelet representation of the image for all scales $j = 1/2, \ldots, 2^{-J_{\max}}$ and spatial orientations $d = 1, 2, 3$, where $J_{\max} = \log_2[\min(m, n)]$ and $m$ and $n$ are the width and height of the image.
- For each wavelet coefficient, find the maximum child coefficient.
- Track it recursively in finer resolutions.
- At the finest resolution (½), set the saliency value of the tracked pixel to the sum of the wavelet coefficients tracked.
- Choose the most prominent points based on the saliency value.
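The steps above can be sketched in plain Python as follows. This is a simplified illustration, not the detector of Loupias3, et al. as published: tracking starts only from the coefficients of the coarsest scale, the saliency is assigned to the top-left pixel of the 2 × 2 block reached at the finest scale, and the image sides are assumed to be divisible by 2 to the power of `levels`. All function names are hypothetical.

```python
def haar_level(img):
    """One 2D Haar analysis step on a 2D list with even height and width.

    Returns the half-size approximation and the three detail bands.
    """
    rows, cols = len(img) // 2, len(img[0]) // 2
    approx = [[0.0] * cols for _ in range(rows)]
    bands = [[[0.0] * cols for _ in range(rows)] for _ in range(3)]
    for y in range(rows):
        for x in range(cols):
            p, q = img[2 * y][2 * x], img[2 * y][2 * x + 1]
            r, s = img[2 * y + 1][2 * x], img[2 * y + 1][2 * x + 1]
            approx[y][x] = (p + q + r + s) / 2.0
            bands[0][y][x] = (p - q + r - s) / 2.0
            bands[1][y][x] = (p + q - r - s) / 2.0
            bands[2][y][x] = (p - q - r + s) / 2.0
    return approx, bands


def salient_points(img, levels, count):
    """Return the `count` most salient pixels as (row, column) tuples."""
    details, approx = [], img
    for _ in range(levels):
        approx, bands = haar_level(approx)
        details.append(bands)  # details[0] holds the finest scale
    saliency = {}
    for d in range(3):  # three spatial orientations
        coarse = details[-1][d]
        for y0 in range(len(coarse)):
            for x0 in range(len(coarse[0])):
                y, x = y0, x0
                total = abs(coarse[y][x])
                # Follow the largest child coefficient down to the finest scale.
                for lev in range(levels - 2, -1, -1):
                    band = details[lev][d]
                    children = [(2 * y + dy, 2 * x + dx)
                                for dy in (0, 1) for dx in (0, 1)]
                    y, x = max(children, key=lambda c: abs(band[c[0]][c[1]]))
                    total += abs(band[y][x])
                pixel = (2 * y, 2 * x)
                saliency[pixel] = saliency.get(pixel, 0.0) + total
    ranked = sorted(saliency, key=saliency.get, reverse=True)
    return ranked[:count]
```

In the setting described here, `count` would be 200 and a 32 × 32 patch would then be cropped around each returned pixel.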
The most prominent salient points detected in sample images of the Graz database are shown in Fig. 6.
For every patch, QWT magnitude and phase features are computed. An L-level decomposition using the QWT or the DWT provides 3L high-frequency sub bands and one low-frequency sub band. In the proposed method, a 3-level decomposition is used, so every patch is decomposed into 9 high-frequency sub bands using the QWT. For every sub band, two QWT magnitude features (mean and standard deviation) and one phase feature (the weighted standard deviation of the phase) are computed. This results in 27 features per patch. The features thus obtained are given to an SVM classifier19 in order to recognize objects. The kernel used is the radial basis function (RBF) kernel. The performance of the QWT is compared with that of two other transforms: the standard discrete wavelet transform (DWT) and the 2D dual tree discrete wavelet transform (DTDWT), which has approximate shift invariance, directional selectivity, limited redundancy and a computational efficiency similar to that of the DWT. A three-level decomposition of a patch using the DWT results in 9 high-frequency sub bands. For every sub band, the mean and standard deviation are computed, which results in 18 features per patch. For the DTDWT, there are six high-frequency sub bands at each level, instead of the three of the DWT, and two low-frequency sub bands, which are iteratively decomposed up to the desired level within each branch. For a 3-level decomposition, there are 18 high-frequency sub bands. Computing the mean and standard deviation for each sub band results in 36 features per patch. The QWT, DWT and DTDWT features are given separately to the SVM classifier for classification.
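The feature vector lengths quoted above follow from a simple product of levels, high-frequency sub bands per level and features per sub band. The short check below uses a helper name chosen only for illustration.

```python
def feature_count(levels, bands_per_level, features_per_band):
    """Length of the per-patch feature vector."""
    return levels * bands_per_level * features_per_band


qwt_features = feature_count(3, 3, 3)    # mean, std, weighted phase std
dwt_features = feature_count(3, 3, 2)    # mean, std
dtdwt_features = feature_count(3, 6, 2)  # mean, std on six bands per level
print(qwt_features, dwt_features, dtdwt_features)  # prints "27 18 36"
```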
In order to compare the performance of our method with others, the numbers of images used for training and testing are the same as in20, i.e., 100/50 images are used for training/testing in the Graz 01 database and 150/75 images in the Graz 02 database. Both positive and negative images are used for training and testing. The training and testing images are selected randomly and do not overlap. The experiments are repeated five times.
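One way to draw such a random, non-overlapping split for a single class is sketched below. The function name, the seeding scheme and the use of integer image indices are assumptions made for illustration; the same procedure would be applied to the negative images.

```python
import random


def random_split(items, n_train, n_test, rng):
    """Draw disjoint training and testing subsets without replacement."""
    chosen = rng.sample(items, n_train + n_test)
    return chosen[:n_train], chosen[n_train:]


# Five repetitions for the 373 bike images of Graz 01 (100 train, 50 test).
bike_ids = list(range(373))
runs = [random_split(bike_ids, 100, 50, random.Random(seed)) for seed in range(5)]
```

Sampling both subsets in one draw guarantees that no image is used for both training and testing within a run.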
The performance of local features is compared with that of global features. To extract global features, the QWT, DWT and DTDWT are applied to the entire image with a 3-level decomposition, and statistical features are computed for every sub band. For extracting global features, all the images in the Graz databases are resized to 256 × 256. Figure 7 shows the ROC curves obtained for the three object categories of the Graz 01 database using QWT, DWT and DTDWT global features. In the Graz 01 database, 100 positive images and 100 negative images are used for training. For testing, 50 new positive images and 50 new negative images are considered.
From Fig. 7, it is evident that a better ROC curve is obtained for all categories using QWT features. The better ROC curve is the one with the larger area under the curve (AUC). The equal error rate is the point on a ROC curve where the false accept rate and the false reject rate are equal. The ROC equal error rate gives a convenient trade-off value between the true positives and false positives. Figure 8 shows the ROC curves obtained for the three object categories of the Graz 02 database using QWT, DWT and DTDWT global features.
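The equal error rate can be estimated from classifier scores as in the sketch below. It assumes that a higher score means "object present", sweeps the observed scores as thresholds, and returns the average of the two error rates at the threshold where they are closest, since an exact crossing may not exist on a finite test set. The function name is illustrative.

```python
def equal_error_rate(pos_scores, neg_scores):
    """Approximate EER from scores of positive and negative test images."""
    best_gap, best_eer = None, None
    for t in sorted(set(pos_scores) | set(neg_scores)):
        # False accept rate: negatives scored at or above the threshold.
        far = sum(s >= t for s in neg_scores) / len(neg_scores)
        # False reject rate: positives scored below the threshold.
        frr = sum(s < t for s in pos_scores) / len(pos_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2.0
    return best_eer
```

A lower value indicates better separation between object and background images.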
In the Graz 02 database, 150 positive images and 150 negative images are used for training. For testing, 75 new positive images and 75 new negative images are considered. From Fig. 8, it is evident that a better ROC curve is obtained for all object categories using QWT features. The ROC equal error rates (EER) calculated for all categories of the Graz 01 and Graz 02 databases using local (patch) and global (image) features are shown in Fig. 9.
From Fig. 9, it is evident that the local features (patch) perform well compared to the global features (image). For Graz 01, the QWT gives a lower EER than the DWT and DTDWT for both local and global features, as shown in Fig. 9(a). For the Graz 02 database also, the QWT gives better performance than the DWT and DTDWT, as shown in Fig. 9(b). Table 1 compares the results of the proposed method on the Graz databases with those reported in the literature.
Opelt20, et al. performed experiments using several region extraction methods, such as similarity-measure segmentation, an affine interest point detector and a difference-of-Gaussian key point detector, together with description methods such as moment invariants, basic moments, the scale invariant feature transform (SIFT) and intensity values. Object categorization is finally done by a modified AdaBoost algorithm.
Table 1. Comparison of experimental results on the Graz database (equal error rates [%]).
In another approach, patches of size 16 × 16 are extracted over a dense regular grid with a spacing of 8 pixels, and SIFT features are calculated and classified using an SVM21. Boiman22, et al. proposed a trivial Naive-Bayes Nearest-Neighbor (NBNN) classifier, which employs nearest-neighbor distances in the space of the local image descriptors. Pixel-level object categorization has been done using the integral histograms of oriented gradients (IHOG) descriptor23, where bag-of-features histograms are computed with the IHOG descriptors and categorized using a support vector machine. A genetic algorithm approach has been used to select more effective features in diverse object recognition tasks24.
In our work, the equal error rate for the Graz databases is calculated for five runs and the averaged value is presented in Table 1. The lowest equal error rate is shown in bold. For the Graz 01 database, our results are better than the others, whereas for the car and person categories of the Graz 02 database, our performance is slightly lower than that of the others.
A new wavelet-based object recognition method using the QWT, which offers a magnitude and phase analysis, is proposed. The proposed method recognizes objects by computing features for every patch (local features) extracted around the wavelet-based salient points detected in complex images, and also features derived from the entire image (global features). It is concluded that local features outperform global features. The features extracted using the QWT perform better on the challenging Graz databases than those of the DTDWT, another approximately shift-invariant transform.
Authors express sincere thanks to the Principal and Head of the Department of Electronics and Communication Engineering, Mepco Schlenk Engineering College, Sivakasi, for providing all the facilities and support to carry out this research work. Authors also thank Prof. Raphael Soulard, XLIM-SIC Laboratory, University of Poitiers, France for his support in implementing the algorithm of QWT.
1. Keysers, D. Modeling of image variability for recognition. RWTH Aachen University, Germany, 2006, PhD dissertation.
2. Sebe, N. & Lew, M.S. Comparing salient point detectors. Pat. Recog. Letters, 2003, 24, 89–96.
3. Loupias, E.; Sebe, N.; Bres, S. & Jolion, J.M. Wavelet-based salient points for image retrieval. In Proceedings of the International Conference on Image Processing, 2000, 2, pp. 518-521.
4. Schmid, C. & Mohr, R. Local gray value invariants for image retrieval. IEEE Trans. Pat. Anal. Machine Intelligence, 1997, 19(5), 530-534.
5. Weber, M.; Welling, M. & Perona, P. Unsupervised learning of models for recognition. In Proceedings of the Sixth European Conference of Computer Vision, 2000, pp.18–32.
6. Teynor, A.; Rahtu, E.; Setia, L. & Burkhardt, H. Properties of patch based approaches for the recognition of visual object classes. In Proceedings of DAGM, 2006, pp. 284-293.
7. Deselaers, T.; Keysers, D. & Ney, H. Improving a discriminative approach to object recognition using image patches. In Pattern Recognition, Lecture Notes in Computer Science, 2005, 3663, 326-333.
8. Chan, W.L.; Choi, H. & Baraniuk, R.G. Coherent multiscale image processing using dual-tree quaternion wavelets. IEEE Trans. Image Process., 2008, 17(7), 1069–1082.
9. Soulard, R. & Carré, P. Quaternionic wavelets for texture classification. Pat. Recog. Letters, 2011, 32, 1669–1678.
10. Sathyabama, B.; Chitra, P.; GayathriDevi, V.; Raju, S. & Abhaikumar, V. Quaternion wavelets based rotation, scale and translation invariant texture classification and retrieval. J. Sci. Industrial Res., 2011, 70, 256-263.
11. Yin, M.; Liu, W.; Shui, J. & Wu, J. Quaternion wavelet analysis and application in image denoising. Math. Problems Eng., 2012, Article ID 493976, 21 p.
12. Selesnick, I.W.; Baraniuk, R.G. & Kingsbury, N.G. The dual-tree complex wavelet transform. IEEE Signal Process. Mag., 2005, 22(6), 123-151.
13. Bayro-Corrochano, E. The theory and use of the quaternion wavelet transform. J. Math. Imaging Vision, 2006, 24(1), 19–35.
14. Bülow, T. Hyper complex spectral signal representation for the processing and analysis of images. Institute of Computer Science and Applied Mathematics, Christian-Albrechts-University of Kiel, Germany, 1999, Technical Report-9903.
15. Kingsbury, N.G. Complex wavelets for shift invariant analysis and filtering of signals. Appl. Comput. Harmonic Anal., 2001, 10, 234-253.
16. Soulard, R. & Carré, P. Quaternionic wavelets for texture classification. In 35th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2010), Dallas, United States, 2010.
17. http://www.emt.tugraz.at/~pinz/data/Graz-01/ (Accessed on 30 Jan. 2012)
18. http://www.emt.tugraz.at/~pinz/data/Graz-02/ (Accessed on 4 Feb. 2012)
19. Ma, J.; Zhao, Y. & Ahalt, S. OSU SVM Classifier Matlab Toolbox (ver 3.00), http://eewww.eng.ohiostate.edu/~maj/osu_svm/, 2002. (Accessed on 2 Oct. 2010)
20. Opelt, A.; Pinz, A.; Fussenegger, M. & Auer, P. Generic object recognition with boosting. IEEE Trans. Pat. Anal. Machine Intelligence, 2006, 28(3), 416-431.
21. Lazebnik, S.; Schmid, C. & Ponce, J. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR, 2006, 2, pp. 2169–2178.
22. Boiman, O.; Shechtman, E. & Irani, M. In defense of nearest-neighbor based image classification. In CVPR, 2008, pp. 1-8.
23. Aldavert, D.; Ramisa, A.; Toledo, R. & Lopez de Mantaras, R. Efficient object pixel-level categorization using bag of features. In Advances in Visual Computing, Lecture Notes in Computer Science, 2009, 5875, 44-54.
24. Ghodrati, M.; Khaligh-Razavi, S.-M.; Ebrahimpour, R.; Rajaei, K. & Pooyan, M. How can selection of biologically inspired features improve the performance of a robust object recognition model? PLoS ONE, 2012, 7(2), e32357.