Content-based Image Retrieval by Information Theoretic Measure

Content-based image retrieval focuses on intuitive and efficient methods for retrieving images from databases based on the content of the images. A new entropy function that serves as a measure of information content in an image termed as ‘an information theoretic measure’ is devised in this paper. Among the various query paradigms, query by example (QBE) is adopted to set a query image for retrieval from a large image database. In this paper, colour and texture features are extracted using the new entropy function and the dominant colour is considered as a visual feature for a particular set of images. Thus colour and texture features constitute the two-dimensional feature vector for indexing the images. The low dimensionality of the feature vector speeds up the atomic query. Indices in a large database system help retrieve the images relevant to the query image without looking at every image in the database. The entropy values of colour and texture and the dominant colour are considered for measuring the similarity. The utility of the proposed image retrieval system based on the information theoretic measures is demonstrated on a benchmark dataset.

Keywords:    Image retrieval,   fuzzy features,   descriptors,   entropy,   indexing

The use of entropy in an image retrieval system is not as popular as compared to other methods which utilize wavelets or colour, texture, and shape descriptors. But the potentiality of entropy as a descriptor cannot be ignored and in the recent past, many researchers have started exploring the possibility of using entropy in different domains. Entropy has been found effectively useful in image indexing4,16,11,18 and in similarity measures12,16,20.

The term entropy as a scientific concept was first used in thermodynamics by Clausius17. Its probabilistic interpretation in the context of statistical mechanics is attributed to Boltzmann19. Shannon1 has used this concept to describe the properties of long sequences of symbols, and applied the results to a number of basic problems in coding theory and data transmission. Later the definition of entropy was extended to the field of information theory. The entropy of a system as defined by Shannon1 gives a measure of uncertainty about the actual structure of the image. Shannon’s definition based on the information gain from an event is inversely proportional to its probability of occurrence.

The entropy of an image is used for different applications in image processing. The interpretation of entropy in an image depends on how an event is defined and also the definition of its posterior probability. In general, gray level is considered as an event and colour histogram as its probability density function. In addition to this, it is assumed that gray levels are statistically independent. Pun5 and Kapur6, et al. have used the Shannon’s concept to define the entropy of an image assuming that an image is entirely represented by its gray level histogram only. Leung7, et al. have attempted to isolate an object from the background by using the Gray-scale Image Entropy. The entropy (Shannon’s) of the histogram may be taken as a measure of information content in an image; such entropy is also called global information measure of the image. A low value of entropy indicates the skewness of the distribution of gray values, while a high value may be taken as an indicator of nearly uniform distribution of gray values. Definitely the histogram and the global entropy are not dependent on the spatial distribution of gray values in the image. The co-occurrence matrix2 captures the spatial details of an image to some extent. The entropy of the co-occurrence matrix gives another measure of image information known as local entropy or second order entropy. Likewise the conditional entropy of a partitioned image can also be defined.

In texture analysis, an important approach to region description is to quantify its texture content. The randomness is the basic property of texture. By exploiting this fact, entropy as a statistical descriptor of texture could measure the variability in the image.

Pal and Pal8 define a new entropy function based on the exponential behavior of information gain and applied co-occurrence based entropy methods for image thresholding. Zachary16 attempts to use entropy as a visual feature of an image and showed how effectively entropy can be employed for indexing and also as a similarity measure of images in an image retrieval system. In IKONA11, a region-based image retrieval system, human faces are identified and preprocessed using the entropy map which assigns a saliency for each pixel in the face. This saliency is expressed as the entropy of a local gray level distribution in a region around each pixel.

A new entropy function is presented in this paper. This entropy function is so designed that it can be efficiently used for image analysis. The properties and proofs of the proposed entropy function are presented. This new function is compared with the well known entropy functions. Two entropy based features with one feature representing the colour information and the other representing the texture information are derived from an image. Apart from this, another visual descriptor of the image called the dominant colour is described. These entropy based features are further used in the multi dimensional indexing technique. An interim result set is created using the indices of images for improving the performance of the retrieval system. The results of image retrieval will be compared with those of the classical model proposed by Swain and Ballard15.

Most of the entropy functions are not suitable for representing the information in a fuzzy set. These include Shannon1, Renyi9 and Pal10 et al. entropy functions. These entropy functions are generalised by introducing a polynomial in the exponential gain function. The proposed entropy function is shown to satisfy the basic properties of entropy and then Pal and Pal’s entropy function is proved to be the special case of this function.

The function involved in the entropy need not be a membership function; it could be any feature. It may be noted that when we use a membership function, the unknown parameters of this membership function will parameterize the entropy function indirectly. However, the choice of a suitable membership function is not easy. Hence, the main motivation behind development of the entropy function is our concern to represent the information/uncertainty contained in a fuzzy set.

Here, authors are mainly concerned with a single fuzzy set. However in a fuzzy rule usually many fuzzy sets are encountered, but this case will be addressed in the future work. The definition of Pal and Pal10 entropy functionis now extended considering the exponential behavior of the gain function. This will pave the way to devise a new entropy function for representing the information in a fuzzy set.

2.1 Definition and Properties

The information gain corresponding to the occurrence of the ith event is defined as

I( p i )= e (a p i 3 +b p i 2 +c p i +d) (1)

for all probabilities pi∊[0,1], H Sn 1 ( P n )= i=1 n p i log p i and a, b, c and d are the real-valued parameters.

The entropy is expressed by

H=E(I( p i ))= i=1 n p i I( p i ) (2)

Note that the above entropy function is an expectation of the information gain function. Some of the important properties associated with this function are stated now.

Property 1:

I( p i )= e (a p i 3 +b p i 2 +c p i +d) is a continuous function for all pi∊[0,1]

Property 2:

I(pi) is bounded

Property 3:

With the increase in pi, I(pi) decreases.

Property 4:

H(P)= p i *I( p i ) is a continuous function for all pi∊[0,1] real valued a, b, c and d parameters.

Property 5:

If p1 = p2 = ... = pn = 1/n then H(P) is an increasing function of n.

Property 6:

H(P)= i=1 n p i * e (a p i 3 +b p i 2 +c p i +d)

where, 0 ≤ pi ≤ 1 and p i =1 is a concave function.

Property 7:

Consider a partition A = [A1, A2, … An] with the probabilities, pi = Pr[Ai] and assume that p1<p2. If p1 is increased and p2 is decreased by an equal amount, say δ (δ>0 and δ ≤ (p2p1)/2), then the entropy increases.

Property 8:

Entropy is maximum when all pi’s are equal. In other words, H(P) ≤ H(1/n, 1/n, n,...,1/n)

Property 9:

Entropy is minimum if and only if all pi’s except one are zeros and that single pi is equal to 1.

Property 10:

Consider the partition of the event space as A = [A1, A2, … An] and the probability pi = Pr(Ai). If a new partition B is formed by subdividing one of the sets of A, then H(B) ≥ H(A).

The proofs of some of the above properties are consigned to Appendix ‘A’.

2.1.1. Normalised Entropy

The normalised entropy HN can be defined as

H N =(H e (a+b+c+d) )/λ (3)

where, The constant λ= e ( a n 3 + b n 2 + c n +d) e (a+b+c+d) ; a, b, c and d are real valued parameters; and n is the number of events in the probabilistic experiment (or the number of states in the system).

The normalised entropy satisfies all the properties of an entropy function.

2.1.2. Conditional Entropy

Consider two partitions A = [A1, A2, … An] and B = [B1, B2, … Bm] and let us define that the product of two partitions, A = [Ai] and B = [Bj] is a partition whose elements are all intersections AiBj and the product of partitions is denoted by A.B = [AiBj]. Let pij be the probability of the event AiBj, i.e., pij = Pr [AiBj] and the marginal probabilities pi = Pr[Ai] is defined as,

p i = j=1 m p ij

Similarly the marginal probability, qj = Pr[Bj] can be enumerated as,

q j = i=1 n p ij

The conditional entropy of Ai given that Bj has occurred is denoted by

Pr[ A i | B j ]=Pr[ A i B j ]/Pr[ B j ]= p ij / q j = p i|j (4)


Pr[ B j | A i ]=Pr[ A i B j ]/Pr[ A i ]= p ij / p i = q j|i (5)

Therefore the entropy of a partition A, given that Bj has occurred is given by H[A|Bj] as:

H[A| B j ]= i=1 n p i|j e [a p i|j 3 +b p i|j 2 +c p i|j +d]

Thus the conditional entropy of A assuming B is therefore:

H[A|B]= j=1 m q j H[A| B j ]= j=1 m i=1 n q j p i|j e [a p i|j 3 +b p i|j 2 +c p i|j +d] (6)


H[B|A]= i=1 n j=1 m p i q j|i e [a q j|i 3 +b q j|i 2 +c q j|i +d] (7)

Now the entropy of the product of the partitions, A.B is easily found to be

H[A.B]= i=1 n j=1 m p ij e [a p ij 3 +b p ij 2 +c p ij +d] (8)

2.2 Comparison with other Entropy Functions

In the last few decades treatment of uncertainty is one of the concerns in the research circles. Shannon’s entropy1 is the pioneering work on the information measure. This entropy function is defined in the domain of probability with n-states and in that the information gain is inversely related to its probability of occurrence.

H Sn 1 ( P n )= i=1 n p i log p i (9)

Renyi9 has extended the definition of Shannon’s entropy to an incomplete probability distribution. The Renyi’s entropy of order α is of the form

H Rn 1 ( P n ,α)= 1 1α log [ i=1 n p i α i=1 n p i ] , α>0,α1 (10)

It is noted that as α tends to 1, Renyi’s entropy matches with Shannon’s entropy. In contrast to Shannon’s entropy, Pal and Pal’s entropy function considers the information gain as an exponential function: e(1–pi)

H PPn 1 ( P n )= i=1 n p i e (1 p i ) (11)

The normalised Pal and Pal’s Entropy is given by

H Nor 1 ( P n )= H PPn 1 ( P n )1 ( e 1 1 n 1) (12)

Following the above entropy function, an exponential entropy function is devised with an eye on incorporating information into it. A significant feature of this entropy function is that it captures the inherent uncertainty in the fuzzy sets.

The uncertainty in a fuzzy set is non-statistical in nature and it plays an important role in fuzzy image processing. Moreover, the use of a polynomial in the exponential function yields the gain function bestowed with four tunable parameters. The parameters of the new entropy function provide controls on the information gain and proper tuning of these parameters by way of optimisation leads to the correct measure of uncertainty.

Recall the new entropy function defined above as

H= i=1 n p i e (a p i 3 +b p i 2 +c p i +d) (13)

and its corresponding normalised entropy is as follows

H N =(H e (a+b+c+d) )/( e ( a n 3 + b n 2 + c n +d) e (a+b+c+d) ) (14)

This entropy function behaves like Pal and Pal’s entropy function for a = 0, b = 0, c = 1 and d = -1. The curves in Figs 1 and 2 show Shannon’s entropy, Normalised Pal and Pal’s entropy and Normalised new entropy for a two-state system for the two sets of parameter values. It may be observed that though the new entropy function behaves like the Shannon’s and Pal and Pal’s entropy functions yet it expands and contracts for varying values of the parameters.

2.3 Possible Applications

  • Biometrics: In finger print identification, we can model the minutiae points by finding their distances with respect to a reference point which may be selected as the one with the highest curvature. In the palm print, we can model the points on the primary lines (there are mainly three such lines) and use this information for authentication.
  • Medical Image processing: Breast cancer and brain tumor detection can be easily attempted by the entropy based modeling. In both these problems, issues in the affected organs undergo textural changes on the infringement of malignancy. The tumour regions can be extracted by applying the entropy function in a shifting window. The tissues in the affected portions naturally show entropy values different from those of the benign tissues for a chosen set of coefficients of the entropy function.
  • Cryptography: The power of the proposed entropy can be best utilized in the encryption of secrete keys for secure transmission. In the above we have cited a few problems but one can harvest many fields should the ingenuity and imagination permit. However, in this paper its application to image retrieval will be discussed by way of indexing.

Many visual features have been explored in the literature on content-based image retrieval (CBIR) for the purpose of depicting colour, texture, shape and other properties of an image. Combining the various features usually achieves better performance in retrieval. Although a large number of features could represent the image very accurately, the inherent problem with this approach is the shortage of storage capacity for large image database. Hence an efficient multidimensional indexing technique is required for dimension reduction.

The objective of this section is to reduce the dimensionality of the feature space and to improve the performance of the retrieval process. For this, the entropy-based features are extracted from the images. The entropy of an image, by definition, is the measure of information content in the image8. As will be seen, the entropy function maps an n-dimensional vector to a single real number (i.e. one dimensional space) and so it can be regarded as a dimension reduction operator.

The extracted features from an image are stored to serve as an index of that image. Since the expensive management of storage and comparison time is less significant than the retrieval accuracy, only three concise and precise features are used to describe the contents of the image. These are colour entropy, texture entropy and dominant colour of the image.

3.1 Entropy-based Image Features

The entropy of a system as defined by Shannon1 gives a measure of uncertainty about its actual structure. Shannon’s definition based on the information gain from an event is inversely proportional to its probability of occurrence. Pun5 and Kapur6, et al. have used Shannon’s concept to define entropy of an image assuming that the image is entirely represented by its gray level histogram only. Unlike the logarithmic behavior of Shannon’s entropy, the gain function in our entropy definition is of exponential nature as discussed in Section II. Two entropy based image features – colour entropy and texture entropy are now presented.

3.1.1 Colour Entropy

The RGB colour space is chosen to represent the image. Let F = {f(x, y)}M × N be an image of size M × N where f(x, y) is the colour vector (r, g, b) in the RGB space at (x, y) point and N(r, g, b) is the frequency of the colour vector (r, g, b).

Then, r g b N (r,g,b) =M×N

In a natural image, it has been observed that out of 2563 different colour levels, a small fraction of different colour levels are actually used. So instead of considering all colour levels, the colour levels of an image are quantized adaptively. To achieve this, the available colour levels are clustered into colour bins.

The number of bins is not preset; it depends on the distribution of colour in that particular image. The above algorithm adaptively clusters all the colour levels into bins. Pun5 and Kapur6, et al. use the gray-level histogram to represent the image; here the histogram is extended to the colour bins. It is generally assumed that the distribution of colours across an image follows the uniform distribution, i.e. each colour has a 1/(M × N) probability where the image size is M × N. The algorithm for forming bins is given Algorithm 1.

Let,Z = {z1, z2, … zn} be the set of events that corresponds to the colour bins. Let us consider the probability of the colour bin zi as p(zi) = (Number of pixels in the colour bin Zi)/ M × N. In this context, the colour entropy is taken in the normalised form, HN, Colour given by

H N,Color =( H Color e (a+b+c+d) )/λ H Color = i=1 n p( z i ) e [{ a * p 3 ( z i )+ b * p 2 ( z i )+ c * p( z i )+d}] (15)


where, λ=[ e ( a n 3 + b n 2 + c n +d) e (a+b+c+d) ] and n is the number of colour bins.

3.1.2 Entropy Optimisation

The entropy optimisation is resorted to estimate the four tunable parameters: a,b,c and d. The derivatives of HN with respect to a,b,c and d are obtained from:

H N a = Λ 1 + [ i=1 n p 4 ( z i )* e {a p 3 ( z i )+b p 2 ( z i )+cp( z i )+d} ]+ H 1 ] H 2 H 1 H N b = Λ 2 + [ i=1 n p 3 ( z i )* e {a p 3 ( z i )+b p 2 ( z i )+cp( z i )+d} + H 1 ] H 2 H 1 H N c = Λ 3 + [ i=1 n p 2 ( z i )* e {a p 3 ( z i )+b p 2 ( z i )+cp( z i )+d} + H 1 ] H 2 H 1 H N d = Λ 4 + [ i=1 n p( z i )* e {a p 3 ( z i )+b p 2 ( z i )+cp( z i )+d} + H 1 ] H 2 H 1 (16)


Λ 1 = [ i=1 n p( z i )* e {a p 3 ( z i )+b p 2 ( z i )+cp( z i )+d} H 1 ] ( H 2 H 1 ) 2 [ H 2 n 3 H 1 ] Λ 2 = [ i=1 n p( z i )* e {a p 3 ( z i )+b p 2 ( z i )+cp( z i )+d} H 1 ] ( H 2 H 1 ) 2 ( H 2 n 2 H 1 ) Λ 3 = [ i=1 n p( z i )* e {a p 3 ( z i )+b p 2 ( z i )+cp( z i )+d} H 1 ] ( H 2 H 1 ) 2 ( H 2 n H 1 ) Λ 4 = [ i=1 n p( z i )* e {a p 3 ( z i )+b p 2 ( z i )+cp( z i )+d} H 1 ] ( H 2 H 1 ) 2 ( H 2 1 H 1 )

and H 1 = e (a+b+c+d) , H 2 = e ( a n 3 + b n 2 + c n +d)

Then the parameters are updated using gradient descent learning as

a New = a Old ε 4 (i) H N a b New = b Old ε 4 (i) H N b c New = c Old ε 4 (i) H N c d New = d Old ε 4 (i) H N d (17)

where the learning rates ε(i). As this learning has convergence problems we propose reinforcement learning.

3.1.3 Reinforcement Learning

The reinforcement learning requires some re-use policy of how to use the past information.

Exploitation: In this we exploit the Reuse policy which requires integrating knowledge of the past policy into the current learning process. Here we need to bias the exploratory process of the new policy with the past one. We have used here the sigmoid function for ε in which cumulative of the past errors is biased by the term k2 and the slope or gain of the function is changed by the term k1.

Exploration: In the earlier work k1 and k2 were incremented by constant, but they are updated here by random numbers thus boosting exploration. The proposed11 reinforcement learning gets stuck up in the local minima. This has been modified to incorporate evolutionary feature27. In this each parameter is updated by a number of updating laws called a population each having a pair of random numbers for k1 and k2. The law yielding the minimum is considered as the global solution. Because of this it overcomes the drawback of local minima by taking into account both exploitation and exploration strategies.

If n is the population size, then we make use of the Reinforced learning law given by Hanmandlu and Murthy27

ε l (i)= e ( k 1l (i) err(j,i)+ k 2l (i)) (18)

where err(j,i)= H N (j,i) H N new (j1,i)=Δ H N (j,i) ; l=1,1,..,4, i =1 to n and j is the current iteration.

The initial values of k1l (i) and k2l (i) are taken both as random numbers. We use the following policy to adjust the values of k1l (i) and k2l (i). This consists of the following steps:

If err(j,i) is increasing, then ε(i) must decrease. So k1l(i) should increase.

  1. If err(j,i) is decreasing, then ε(i) need not change. So k2l (i) should increase.

  2. If err(j,i) is constant, then ε(i) should not change. So k1l (i) and k2l (i) are not changed.

If HN(j,i) is not differentiable then parameter of interest, say a, is updated as follows:

a(i+1)=a(i)ε(i) H N ( a old (i)+p) H N ( a old (i)) p =x(i)+ε(i) Δf p (19)

where, p = 0.0001.

The evolution is bestowed by the following procedure: We create a population of solution processes (particles) for each parameter. Let this population be denoted as n (taken to be 10). The function values of all particles HN(j,i) are arranged in the ascending order. After that half of the populations with high values of function are eliminated as they are unfit to propagate to new generation. Thereafter reproduction is carried out by the rest half of the population. This procedure continued until we reach the global minimum of the function.

3.1.4 Colour Entropy Results

In this sub-section, colour entropy values are computed for different natural images. The parameters a, b, c, d are initialized to one and then the entropy optimisation technique is applied to find the normalised entropy value termed as colour entropy. Table 1 shows the colour entropy values and the corresponding optimised parameter values.

3.2 Texture Entropy

The ambiguities in texture arising due to the fuzzy nature of image function allow us to devise fuzzy texture features. So rather than using the colour histogram to represent an image, the fuzzy features that capture the fuzziness in the texture property are extracted about the neighborhood of each pixel. Since texture is region based, arrangement of image functions (i.e., intensities) of pixels in a local region, say, a window is made use of in order to characterize the texture using the Gaussian type membership function. The cumulative response about the central pixel from this window replaces the pixel intensity giving rise to the texture image when taken over the entire image.

3.2.1 Extraction of Fuzzy Features

To convert the spatial domain image into the fuzzy domain, the spatial arrangement of gray levels of pixels over a window is utilised. The fuzzy property can be expressed in terms of a membership function. A membership function to this effect is represented by the Gaussian type function.

μ (k,l) (i,j)=exp[ {(x(i,j)x(k,l))/τ} 2 ] (20)

where, x (i, j) is the gray level of the pixel at the (i,j)th position and τ is the fuzzifier which is specified to be the window size (τ is taken as 5)

We note that

μ (i,j) (k,l)=1 if x(i,j)=x(k,l) (21)

To consider the response from the neighboring pixels, the cumulative response of (i, j)th pixel is obtained as

y(i,j)= k,l μ (k,l) (i,j)*x(k,l)/ k,l μ (k,l) (i,j) (22)

This is the defuzzified response of the (i,j)th pixel over the window of size 5. This process is repeated for all pixels in the image resulting in a texture feature image consisting of all the defuzzified values. For convenience of notation, the matrix formed by y(i,j) is designated as the response matrix, Y which in turn would represent the texture feature image.

Let Z = {z1, z2, … zn} be the set of distinct responses of Y. Let us consider the probability of the texture response zi as p(zi) = (Number of pixels having the texture response zi)/ M × N. On the heels of the normalised colour entropy, here comes the normalised texture entropy HN,Texture as

H N,Texture =( H Texture e (a+b+c+d) )/λ With  H Texture = i=1 n p( z i )* e {a* p 3 ( z i )+ b * p 2 ( z i )+ c * p( z i )+d} (23)

where, λ=[ e ( a n 3 + b n 2 + c n +d) e (a+b+c+d) ] and n is the number of distinct responses.

In the entropy function (22, 23), a, b, c and d are four tunable parameters with 0≤HN,Texture ≤1. Here the texture entropy is optimised to estimate these parameters using the optimisation technique described earlier.

In order to compute the texture entropy values for different natural images, the values of the parameters - a, b, c, d are initially set at 1 and then the normalised entropy value is calculated using the entropy optimisation technique. The texture entropy values and the corresponding optimised parameter values are shown in Table 2.

3.2.2 Visual Image Feature

It may be noted that the images might have the similar statistical features yet they are visually different. The choice of an appropriate visual feature plays an important role in classifying the visually similar images. There are several visual features – colour, texture, shape, contrast, coarseness, normalised area of an object etc. proposed by earlier researchers13, 14. In the present work, we make use of another visual feature – dominant colour of an image.

Dominant colour of an image is the particular colour level that has the highest frequency in the image. It is a perceptual property of an image. As the CIE Lab colour space is perceptually uniform, the dominant colour is transformed to the lab colour space consisting of luminance component L and chrominance components C = {a, b}. Table 3 shows the RGB values and the CIE Lab values of the dominant colour and the percentage area covered by this colour.

The images of sea and sunset category, shown in Table 3, have distinct dominant colour, which is necessarily an important visual feature. The images of the same category will have nearly the same dominant colour, therefore in the retrieval system, CIE Lab values of the dominant colour could be used to classify similar images.

It is possible that the images of different categories have the same dominant colour, so the dominant colour is appropriate to be one of the visual features for classification.

3.2.3 Indexing through Feature Vector

As the image collection is getting larger and larger, the retrieval speed is becoming a bottleneck. Hence effective high dimensional indexing techniques need to be explored. The high dimensionality of feature vectors is normally of the order of 102. Applying the dimension reduction on the feature vectors, an embedded dimension much lower than the initial dimension though still higher for linear ordering is obtained. Therefore a multi-dimensional indexing scheme is required to index the reduced embedded dimensional feature vectors. From this discussion it follows that the multi-dimensional indexing owes its allegiance to the high dimensionality of the feature space.

As there are only three features, viz., colour entropy, texture entropy and dominant colour used in this work, the dimension of feature space is too small to apply any dimension reduction algorithm. An entropy-based two dimensional indexing is employed here using both colour and texture entropy values. In our method, the dimension of the feature vectors is not dynamic; i.e., it remains fixed as there is no need for any sophisticated tree-data structure for multidimensional indexing.

A feature vector of an image indexed by a pair of entropy values of colour and texture becomes a point in the two-dimensional feature plane whose axes are colour entropy and texture entropy.

The concept of interim result is invoked here. The interim result set is a subset of image search space containing images semantically similar to the query image. The above indexing scheme would improve the retrieval performance by creating an interim result set. For a particular query image, the colour entropy and texture entropy, say (qColourEnt, qTextureEnt) can be easily calculated. Now here is the algorithm for creating an interim result set (Algorithm 2).

In the above procedure, the choice of ε is very crucial, because the size of the interim result set is directly proportional to the value of ε. It is important to note that the value of ε should depend on the size of the feature space and also on the entropy values of colour and texture of the query image.

Let ξ color 2 and ξ Texture 2 represent the second order moments of entropy values of colour entropy and texture of the query image respectively. The expressions for the statistical descriptors are as follows:

ξ Color = 1 n i=1 n (cEn t i qColorEnt) 2 (24)

ξ Color = 1 n i=1 n (tEn t i qTextureEnt) 2 (25)

where, n is the size of feature space, cEnti and tEnti denote the colour entropy and texture entropy of ith image respectively.

An expression for ε is coined by

ε= k*Max( ξ Color , ξ Texture) (26)

where, k is any positive real number.

The interim result set contains all the feature points having a distance of less than or equal to ε from the query image. One of the purposes of creating the interim result set is to reduce the search space without sacrificing the correctness of results of the retrieval process.

3.2.4 Experimental Results

A set of 9907 images from low resolution web-crawled miscellaneous database26 used in WBIIS25 is our benchmark dataset for generating the query results. Next the interim result set for different query images is created. The recall for the interim result set denotes the percentage of correct retrieved images out of all correct images in the image database. The ratio of the size of the interim result set to the original size of the image database gives the reduction factor. Therefore a large recall value with high reduction factor indicates the efficiency of the algorithm.

It may be observed from Table 4 that the reduction factor increases by decreasing the value of ε. For lower value of ε naturally recall also decreases. The higher recall value indicates the correctness of the system; hence the reduction factor is increased so as to keep the recall value higher. Here are the results for other query images on the same image database.

Any search engine, text-based or otherwise, is plagued by the problem of un-related matches. Often in the case of text based search engines, this hitch arises from the use of ambiguous keywords, such as bank, interest. Content-based image retrieval system allows a user to set query by an example image21, which ideally removes the ambiguity in setting up of queries. But the success of the image retrieval depends on the extraction of image features and measurement of similarity of query image with the images in a large data collection. In continuation of the feature extraction and the effective indexing methodology in Section 3, the distance measures between any two images in the interim result set are the necessary ingredients for comparison. Furthermore, the performance of the retrieval process will be compared with the traditional colour histogram matching15.

4.1 Image Distance Measure

To represent an image the required are the entropy based feature descriptors: colour entropy and texture entropy and a visual colour feature – dominant colour. These feature descriptors allow us to define a distance metric that closely matches the human perception. The idea is that the similarity between the two images should be measured in terms of not only the closeness of colour and texture distributions but also the closeness of the dominant colour of the image.

Consider two images, a query image IQ and a target image IT to measure the distance. The dominant colours of both the images are denoted by CQ(Lq, aq, bq) and CT(Lt, at, bt) and the other two entropy based features – colour entropy and texture entropy, for query image and target image by ColourEntQ, ColourEntT and TexEntQ, TexEntT respectively. Then the distance between the images is computed as a linear combination of two L2 distances. The first L2 distance is measured between the two dominant colours; the lower value of this distance indicates the visual closeness in terms of colour ignoring the information in the image, which is given by the entropy. So, the distance between the dominant colours is:

D 1 = ( L q L t ) 2 + ( a q a t ) 2 + ( b q b t ) 2 (27)

The second L2 distance is measured between the entropy based feature vectors (colour entropy, texture entropy). The entropy is a real number which indicates the amount of information contained in an image in terms of colour distribution and textural pattern. The lower value of second distance indicates the similarity of images in terms of content.

D 2 = (ColorEn t T ColorEn t Q ) 2 + (TexEn t T TexEn t Q ) 2 (28)

The distance between two images IQ and IT is defined as

Dist( I Q , I T )=α* D 1 +(1α)*100* D 2 (29)

where, 0 ≤ α ≤ 1

The value of D2 lies between 0 to 1 and it is also very low in comparison to D1. To scale both the distances at the same level, a weight of 100 is assigned to D2. The value of α in the distance measure formula can be interpreted as the relative importance between the dominant colour and the information content. The range of α varies from 0 to 1. The higher value of α assigns more importance to the dominant colour than the information content in the retrieval of similar images. In sunset image, dominant colour has more importance, so the user could set the value of α more than 0.5 (which stands for equal importance) in forming the query for the image retrieval. It is important to note that the background colour or dominant colour might be irrelevant in many images but this would mislead the retrieval process. The value of α should be set to 0 for the queries where the dominant colour does not have any importance.

Table 8 shows that the distances between the similar images are comparatively less than the distances between the different categorical images. In this table, the distances between the query image and sample images are measured for α = 0.5.

4.2 Image Retrieval by Querying

A search engine whether text-based or otherwise is prone to unwanted matches. There are several reasons to get incorrect matches; one of them is due to the use of ambiguous keywords like interest, bank, etc., and another due to the use of inappropriate words to describe the desired images. In the proposed content-based image retrieval (CBIR) system, the user can set the query by an example from the query image set. In our system, the natural and real image databases are categorized into different classes of images, namely, animal, aircraft, architecture, landscape, sea shores, vehicles, flowers, human activities, etc. A small set of query images is presented in Table 9.

4.3 Colour Histogram Matching

The axes used for the histograms are the three opponent colour axes, assigned as follows:

r g =rg (30)

b y =2×brg (31)

w b =r+g+b (32)

Here r, g, and b represent red, green, and blue signals, respectively. The rg, by, and wb axes are analogous to the opponent colour axes used by the human visual system22. They are used here simply to allow the intensity (wb) axis to be more coarsely sampled than the other two, because the intensity axis is more sensitive to the variation in lighting from shadows and distance from the light source. The wb axis is divided into 8 sections while the rg and by axes are each divided into 16 sections, for a total of 2048 bins. As the total intensity limits the colour differences possible, only a fraction of them can actually receive counts. To clarify this point, suppose that the camera outputs a maximum number of intensity levels, M on each channel. Letting wb = 0(r = g = b = 0) or wb = 3M (r = g = b =M), both axes by and wb turn out to be 0. Thus the maximum intensity level restricts the available axes.

Several measures have been proposed for the dissimilarity between two histograms, H = {hi} and K = {ki}. The bin-by-bin dissimilarity is determined by comparing the contents of the corresponding histogram bins, i.e. hi and ki for all i but not hiand kjfor ij. The dissimilarity between the two histograms is determined as a sum of all pair-wise comparisons and it implies a binary ground distance with a threshold depending on the bin size.

Minkowski-Form Distance: This distance is commonly defined as:

d Lr (H,K)= ( i | h i k i | r ) 1/r (33)

The L1 distance is often used for computing dissimilarity between colour images15.

Kullback-Leibler Divergence and Jeffrey Divergence: The Kullback-Leibler divergence appears as:

d KL (H,K)= i h i log h i k i (34)

From the information theory point of view, the K-L divergence measures how inefficient it could be to code one histogram using the other as the code-book23. However, the K-L divergence is non-symmetric and is sensitive to histogram binning. The empirically derived Jeffrey divergence is a modification of the K-L divergence that is numerically stable and robust with respect to noise and the size of histogram bins24. The Jeffrey divergence is determined from:

d J (H,K)= i ( h i log h i m i + k i log k i m i ) (35)

χ2 statistics:

d χ 2 (H,K)= i ( h i k i ) 2 m i (36)

where, m i = h i + k i 2

Given a query image with histogram H, each database image with histogram K receives a measure of dissimilarity for the query image. It is then easy to rank the database images based on their dissimilarity measures and return the best matches.

The experimental configuration for a process as subjective as computing similarity between the images must be carefully set up to gauge the results with other methods and we need to remove any perceptual biases of an experimenter. To minimize the human subjectivity, the random samples of different sizes are used and also a large sample space is taken as the image database. The large image database rules out the possibility of having dominancy of any particular category of images.

Our retrieval system is tested using 9907 images from the low resolution database26 as mentioned above. This image collection contains different category of images like cars, roses, mountains, patterns, animal, landscape, seaside, flowers, human activities, etc. To provide numerical results, 8 sample images are taken randomly selected from four categories prior to the manual determination of the correctness of retried images. Each category contains at least 100 relevant images. A retrieved image is considered a match if it belongs to the same category of the query image.

The query results obtained by our entropy-based matching are compared with those obtained from colour histogram matching using different dissimilarity measures – χ2 statistics, Jeffrey divergence and L1 norm. Our content based image retrieval system is developed in Visual C++ 6.0 as an offline system on Intel Celeron machine with 1.40GHz processor and 256 MB RAM. The average retrieval time is 1 second per 1000 images. The process of image retrieval involves the following steps:

  1. Create the interim result set using the proposed indexing scheme.
  2. Extract feature values (colour entropy, texture entropy and dominant colour) for the query image.
  3. Calculate the image distance of the query image with all images in the interim result set.
  4. Rank the images according to the distance measured from the query image.
  5. Set the highest rank of the retrieved image which has the minimum distance with respect to the query image.

The results of retrieval by our entropy-based system are compared with those from colour histogram matching15. Testing is carried out on a set of query images from the benchmark image26 database of size 9907.

It may be observed from Fig. 4 that the performance of the proposed system on the car query (car Id: 1639) outperforms Jeffrey divergence, χ2 statistics and L1 norm, but the results from Fig. 5 show that entropy based matching is comparable with others. The entropy-based method retrieves the correct images using only 3 features – colour entropy, texture entropy and dominant colour thus enhancing the processing speed of the system with a reduced number of feature values.

The performance of other query images of the categories: Rose, seaside and pattern are now discussed.

Figures 10 and 11 both show the performance of the image retrieval for query images from the pattern category. The value of α is set to 0 to rule out the dominant colour information. The entropy based method retrieves the images purely based on entropy values and it is found to perform reasonably better than other histogram matching methods for the pattern category thus demonstrating its ability to retrieve semantically similar images.

It may be noted from the results of experiments that the proposed system performs better than the histogram matching for the query images, which possess distinct features in terms of texture or colour. In the rose, seaside category images, dominant colour and colour distribution are the prominent features and in these cases the proposed system performs reasonably well, but our proposed CBIR system has stooped to the subdued performance in the face of histogram matching for some cases. If the pattern images have the distinct texture features, then our system comes out with significantly better performance as shown in Figs 10 and 11.

A new entropy function that is aimed at representing the information in a fuzzy set is presented along with some important properties. The proposed entropy function has four tunable parameters that can be estimated by optimising the entropy function itself. For obtaining the global solution reinforcement learning along with population based approaches is used.

Two types of entropy based image features – colour entropy and texture entropy are utilized in this work. The colour entropy is described in terms of randomness in the distribution of colours in an image. For this, the newly devised entropy function comes handy and resorting to the optimisation of this entropy function yields the optimised colour entropy value. The resulting entropy values uniquely classify the non textured image. But for the textured image first the fuzzy texture features of the image are extracted and then the new entropy function is applied to the features for optimising the entropy values of texture.

Both the entropy based image features are derived in such a manner that they rely only on the colour and texture distributions of an image. In the course of research, the need for visual colour information is realized, consequently the dominant colour value is considered as the third image feature. Despite the efficiency of the RGB colour space, the pixels are transformed from the RGB colour space to the CIELAB colour space because the latter one has the property of perceptual uniformity. Falling in line, the CIELAB value of the dominant colour is also used as the visual feature.

Indices permit the computer in finding the images relevant to a query without looking at every image in the database. The indexing of the colour feature vectors is investigated to speed up the atomic queries. Dimensionality is one of the major concerns in the indexing scheme. The index speed degrades as the dimensionality of the data indexed increases. In the experiments carried out, images are indexed by the entropy values of colour and texture. The uniqueness of the entropy values of an image and their lower dimensionality rule out the possibility of using any sophisticated and computationally taxing multidimensional indexing data structures. The normalised entropy values of colour and texture are found to be more suitable as a pair to index an image. This indexing scheme is found to be effective in creating an interim result set for a query image. The algorithm for the interim result set and the results obtained ensure that the size of the database is reduced considerably without affecting the correctness of the results. The objective of an image retrieval system is to identify images from the database, which are similar to the query image. Similarity or distance measurement between the two images is derived from the empirical estimates of the image features. The three image features – colour entropy, texture entropy and dominant colour in Lab colour space are selected to formulate the similarity measure. Two distances: L2 distance between the dominant colours of two images and L2 distance of between the two colour entropy values and also between the texture entropy values are explored. These two L2 distances are combined in linear form to give the measure of entropy. Table 8 demonstrates that the distance between semantically similar images are significantly low in comparison to the distance between the two dissimilar images. This measure is used to rank the retrieved images by their distances from the query image.

In the proposed retrieval system, the query paradigm is employed for setting up the query as ‘Query by Example’ (QBE). In that WBIIS test image dataset of 9907 colour miscellaneous images is organized into several categories with at least 100 images per category. The features of the images are extracted off-line and the similarity measures along with their ranking are computed at the time of executing the query. The query results so obtained are compared with those from the colour histogram matching using χ2 Statistics, Jeffrey Divergence and L1 norm. The results of our system are very promising in the retrieval of the similar images from the database. Exploring the entropy function for fuzzy modeling of the image retrieval system is the next phase of this work.

1. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J., 1948, 27, 379-423.

2. Jain, K. Fundamental of digital image processing. Prentice Hall of India, 2000.

3. Manjunath, S. & Ma, W.Y. Texture features for browsing and retrieval of image data. IEEE Trans. Pattern Anal. Mach. Intelligence, 1996, 18(8), 837-42.

4. Koskela, M.; Laaksonen, J. & Oja, E. Entropy-based measures for clustering and SOM topology to content-based image indexing and retrieval. In the Proceedings of the 17th International Conference on Pattern Recognition (ICPR’04), 2004, edited by J. Kittler, M. Petrou, and M.S. Nixon, IEEE CS Press, US. 2004, 2, pp. 1005-1008.

5. Pun, T. A new method for gray-level picture thresholding using the entropy of the histogram. Signal Processing, 1980, 2, 223-37.

6. Kapur, J.N.; Sahoo, P.K. & Wong, A.K.C. A new method for gray-level picture thresholding using the entropy of the histogram. Compu. Graph. Vision Image Proc., 1985, 29, 273-85.

7. Leung, K.; Lam, F.K. & To, J.T.P. Entropy-based multiscale image segmentation with edge refinement. In the Proceedings of 1st International Conference on Information Technology and Application (ICITA 2002), 2002, Bathurst, NSW Australia, 24-28 November 2002.

8. Pal, Nikhil R. & Pal, Sankar K. Entropy: A new definition and its applications. Syst. Man Cyber., 1991, 21(5), 1260-270.

9. Renyi, A. On measure of entropy and information. In the 4th Berkeley Symposium of Mathematics Statistics and Probability, Berkeley, California, 1961, 1, pp. 547-61.

10. Pal, N.R. & Pal, S.K. Some Properties of the Exponential Entropy. Information Sciences, 1992, 66, 113-17.

11. Boujemaa, N.; Fauqueur, J.; Ferecatu, M.; Fleuret, F.; Gouet, V.; LeSaux, B. & Sahbi, H. IKONA: interactive specific and generic image retrieval. In International Workshop on Multimedia Content-based Indexing and Retrieval (MMCBIR’2001), France, 2001.

12. Moshfeghi, M.; Saiz, C. & Yu, H. Content-based Retrieval of medical images with relative entropy. In Proceedings of SPIE, 2004, 5371, pp. 259-267.

13. Byoung, K.; Jing, P. & Hyeran, B. Region-based image retrieval using probabilistic feature relevance learning. Pattern Anal. Appl., 2001, 4(2-3), 174-84.

14. Battiato, S.; Gallo, G. & Nicotra, S. Perceptive visual texture classification and retrieval. In the 12th International Conference on Image Analysis and Processing, 2003, Mantova, Italy, 17-19 September 2003, pp. 524-29.

15. Swain, M. & Ballard, D. Colour indexing. Inter. J. Comp. Vision, 1991, 7(1), 11-32.

16. Zachary, J. M. Jr. Louisiana State University, 2000. PhD Dissertation.

17. Yagi, E. Analytical approach to Clausius’s first memoir on mechanical theory of heat (1850). Historia Sci. 1981, 20, 77-94.

18. Wang, J. Z.; Wiederhold, G.; Firschein, O. & Wei, S.X. Content- based image indexing and searching using daubechies’ wavelets. Int. J. Digital Libr., 1997, 1(4), 311-28.

19. Boltzmann, Ludwig (1896, 1898). Lectures on gas theory. Translated by Stephen G. Brush, Berkeley: University of California Press, New York, 1995.

20. Jain, R.; Murthy, S.N.J.; Chen, P.L.J. and Chatterjee, S. Similarity measures for image databases. In Proceedings of SPIE, 1995, 2420, pp. 58-65.

21. Faloutsos, C.; Barber, R.; Flickner, M.; Hafner, J.; Niblack, W.; Petkovic, D. & Equitz, W. Efficient and effective querying by image content. J. Intell. Inform. Syst., 1994, 3(3-4), 231-62.

22. Lennie, E, & D’Zmura, M. Mechanisms of colour vision. CRC Crit. Rev. Neurobiol. 1988, 3, 333-400

23. Cover, T.M. & Thomas, J.A. Elements of information theory. Wiley Series in Telecommunications. John Wiley & Sons, New York, NY, USA. 1991.

24. Puzicha, J.; Hofmann, T. & Buhmann, J. Non-parametric similarity measures for unsupervised texture segmentation and image retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1997. San Juan, Puerto Rico, pp. 267-72.

25. Wang, WBIIS J.Z.; Wiederhold, G.; Firschein, O. & Sha, X.W. Content-Based Image Indexing and Searching Using Daubechies’ Wavelets. Int. J. Digital Libraries, 1998, 1(4), 311-28.

26. Download databases for research comparison (low resolution web-crawled misc database used in WBIIS). (Accessed on: September 2010)

27. Hanmandlu, M. & Murthy, O.V.R. Reinforcement learning in the entropy based recognition of handwritten hindi numerals. In the 10th World Multi-conference on Systemics, Cybernetics and Informatics, 16-19 July 2006, Orlando, Florida, USA.

Dr Madasu Hanmandlu received his MTech (Power systems) from REC Warangal, Jawaharlal Nehru Technological University (JNTU), India, in 1976, and PhD (Control Systems) from Indian Institute of Technology (IIT), Delhi, India, in 1981. Presently working as Professor in Department of Electrical Engineering, IIT, Delhi. His current research interests mainly include: Fuzzy modeling for dynamic systems and applications of fuzzy logic to image processing, document processing, medical imaging, multimodal biometrics, surveillance and intelligent control. He has authored a book on Computer Graphics and published more than 220 publications in both conferences and journals. He has guided 15 PhDs and 100 MTech students.

Dr Anirban Das received MCA from Jawaharlal Nehru University, New Delhi in 1999, MTech (Computer Sci. Engg) from IIT Delhi in 2002 and PhD (Computer Sci.) from Jamia Millia Islamia University, New Delhi in 2008. He is currently working with R Systems International Ltd. as Technical Architect for biometric system development. His research interests include: Image processing, computer vision, pattern recognition, biometric authentication, and artificial intelligence.


Proofs of some important properties of the new entropy function

We will now provide the proofs of some important properties presented in Section II-A.

Proof of Property 2

I( p i ) e d as p i 0andI( p i ) e (a+b+c+d) as p i 1.

We consider k1= ed and k2 = ed(a+b+c+d). Since a, b, c and d are real so k1 and k2 are finite. Hence I (pi) is bounded.

Proof of Property 3

We have k1= ed and k2 = e–(a+b+c+d) as defined in the Property 2. The ratio,

k 1 / k 2 = e d / e (a+b+c+d) = e (a+b+c) >1for(a+b+c)>0 (37)

To prove that I(p) is a decreasing function, we need to show that the derivative of I(p) with respect to p is always negative or zero.

The derivative, d dp I(p)= d dp e (a p 3 +b p 2 +cp+d) = e (a p 3 +b p 2 +cp+d) (3a p 2 +2bp+c) (38)

Definitely for any p, 0≤p≤1, e (a p 3 +b p 2 +cp+d) >0 . Hence,

d dp I(p)= d dp e (a p 3 +b p 2 +cp+d) = e (a p 3 +b p 2 +cp+d) (3a p 2 +2bp+c) (39)

So we can conclude that for a ≥ 0, b ≥ 0 and c ≥ 0, I(p) always decreases for 0≤p≤1.

Proof of Property 5

Consider the case where p1 = p2 = …=pn = 1/n and n≥1. Then

H(P)= i=1 n p i e (a p i 3 +b p i 2 +c p i +d) = i=1 n (1/n) e ( a n 3 + b b 2 + c n +d) = (1/n)h(n) (40)

where, h(n)= e ( a n 3 + b n 2 + c n +d)

To prove that H(P) is an increasing function, it is sufficient to prove that h(n) is an increasing function.

d dn h(n)= d dn e ( a n 3 + b n 2 + c n +d) [ 3a n 4 + 2b n 3 = c n 2 ] e ( a n 3 + b n 2 + c n +d) =(1/ n 4 )(3a+2bn+c n 2 ) e ( a n 3 + b n 2 + c n +d) (41)

For n≥1 and a0,b0,c0 d dn h(n)0 .

Therefore H(P) is an increasing function for a,b,c≥0.

Proof of Property 7

For the sake of simplicity, consider p1<p2. Let us now construct a new partition B = [B1, B2, A3,.. An] with Pr [B1] = p1 + δ, Pr[B2] = p2 – δ and δ>0 and δ<=(p2p1)/2.

Now,H(B)H(A)=H( p 1 +δ, p 2 -δ, p 3 ,pn)- H(p 1 , p 2 , p 3 ,pn) =( p 1 +δ) e [a ( p 1 +δ) 3 +b ( p 1 +δ) 2 +c( p 1 +δ)+d] p 1 e [a p 1 3 +b p 1 2 +c p 1 +d] +( p 2 δ) e [a ( p 2 δ) 3 +b ( p 2 δ) 2 +c( p 2 δ)+d] p 2 e [a p 2 3 +b p 2 2 +c p 2 +d] (42)

Since ϕ(p)=p e [a p 3 +b p 2 +cp+d] is convex, the condition ϕ( p 1 )+ϕ( p 2 )<ϕ( p 1 +δ)ϕ( p 2 δ) is easily satisfied if p1<p1+ δ<p2-δ<p2. Hence, H(B) – H(A) > 0, i.e. H(B) > H(A), thus completing the proof.

Proof of Property 8

The proposed entropy function for the probability distribution P=[p1, p2,…pn] is defined as follows:

H(P)= i=1 n p i e (a p i 3 +b p i 2 +c p i +d) .


H p i = e (a p i 3 +b p i 2 +c p i +d) p i e (a p i 3 +b p i 2 +c p i +d) (3a p i 2 +2b p i +c) = e (a p i 3 +b p i 2 +c p i +d) [13a p i 3 2b p i 2 c p i ) (43)

2 H p i 2 = e (a p i 3 +b p i 2 +c p i +d) (9a p i 2 4b p i c) e (a p i 3 +b p i 2 +c p i +d) (13a p i 3 2b p i 2 c p i )(3a p i 2 +2b p i +c) = e (a p i 3 +b p i 2 +c p i +d) {9 a 2 p i 5 12ab p i 4 (4 b 2 +6ac) p i 3 +(12a4bc) p i 2 +(6b c 2 ) p i +2c} (44)

2 H p i 2 | p i =1/n =(1/ n 5 ) e ( a n 3 + b n 2 + c n +d) {2c n 5 +(6b c 2 ) n 4 +(12a4bc) n 3 (4 b 2 +6ac) n 2 12abn9 a 2 } = (45)


β=(1/ n 5 ) e ( a n 3 + b n 2 + c n +d) {2c n 5 +(6b- c 2 ) n 4 +(12a4bc) n 3 (4 b 2 +6ac) n 2 12abn9 a 2 }


2 H p i p j =0 forijandi,j=1,2,..n

The Hessian matrix is of the form

H=( β 0 0 0 0 β 0 0 0 0 0 β ) (46)

For the point P0 = (1/n, 1/n, … 1/n) to attain the maximum, H | P o should be negative definite. Moreover, H is negative definite if the determinant value of kth principal minor of H has the sign of (-1)k, k = 1, 2,… n.

Therefore, determinant of kth principal minor of H is

( β 0 0 0 0 β 0 0 0 0 0 β )={ (1)kβk βkifkiseven βkifkiseven (47)

Here, a need arises to show that β>0, which implies that

(1/ n 5 ) e ( a n 3 + b n 2 + c n +d) {2c n 2 +(6b c 2 ) n 4 +(12a4bc) n 3 (4 b 2 +6ac) n 2 12abn9 a 2 }>0

This yields

n 3 > (4 b 2 +6ac) n 2 +12abn+9 a 2 2c n 2 +(6b c 2 )n+(12a4bc)

The above is always true if a ≥ 0, b ≥ 0, c ≥ 0 and a, b, c must satisfy (14).

So, H(P) ≤ H(1/n,1/n,...1/n).

Proof of Property 9

Suppose that pi=0 for all i except pk where pk =1. Next, we prove that the entropy H is the minimum by contradiction. Consider that there are at least two non-zero probabilities say pi and pj for the minimum value of H. Now using the Property 7, we can write that

H( p 1 , p 2 ,... p i +δ,..., p j δ, p n )>H( p 1 , p 2 , p 3 ,... p n )

where, δ >0 and δ ≤| pipj | /2

In our case, H(0, 0, …,δ, …, 1-δ, … 0) > H(0, 0,…,1,, 0). This contradicts the fact that H is the minimum. Hence H is minimum only when all pi’s except one are zeros.

Proof of Property 10

The partition A = [A1, A2, ….. An] is changed to B = [Ba Bb A2, ….. An] where A1 is subdivided into Ba and Bb and pa = Pr(Ba), pb = Pr(Bb) and p1 = pa + pb

Now H(A)= i=1 n p i e (a p i 3 +b p i 2 +c p i +d) and let us consider ϕ(p)=p e (a p 3 +b p 2 +cp+d) .

We can write

H(A)ϕ( p a + p b )=H(B)ϕ( p a )ϕ( p b ) H(A)-H(B)=ϕ( p a + p b )ϕ( p a )ϕ( p b ) H(A)-H(B)=( p a + p b ) e [a ( p a + p b ) 3 +b ( p a + p b ) 2 +c( p a + p b )+d] p a e [a p a 3 +b p a 2 +c p a +d] p b e [a p b 3 +b p b 2 +c p b +d] H(A)-H(B)= p a e [a p a 3 +b p a 2 +c p a +d] [ e [3a p a 2 p b +3a p a p b 2 +a p b 3 +2b p a p b +b p b 2 +c p b ] 1]+ p b e [a p b 3 +b p b 2 +c p b +d] [ e [3a p a 2 p b +3a p a p b 2 +a p a 3 +2b p a p b +b p a 2 +c p a ] 1] H(A)-H(B)0

This completes the proof.