Efficient Motion Field Interpolation Method for Wyner-Ziv Video Coding
I Made Oka Widyantara

Wyner-Ziv video coding can reduce video encoding complexity by shifting the motion estimation procedure from the encoder to the decoder. Among the many motion estimation methods, the expectation maximization (EM) algorithm is the most effective one. Unfortunately, the block-based motion estimation used in this algorithm bounds the motion field profile by the granularity of the block size. Nearest-neighbor and bilinear interpolation methods have already been applied in multiview image coding to handle a similar problem. This paper evaluates the performance of both interpolation methods in a transform-domain Wyner-Ziv video codec. Results show that bilinear interpolation is effective only for high motion video sequences. In this scenario, compared to nearest-neighbor interpolation, it achieves bitrate savings of up to 3.29% and 0.2 dB higher PSNR, at the cost of up to 12.30% higher decoding complexity. For low motion video content, the bitrate saving is only up to 0.82%, with almost the same PSNR, while decoding complexity increases by up to 10.32%.


Introduction
Wyner-Ziv video coding (WZVC) is a new paradigm in video coding [1], based on the Slepian-Wolf [2] and Wyner-Ziv [3] information-theoretic results of the 1970s. Those theories proved that separate encoding and joint decoding (distributed compression) can achieve performance similar to joint encoding and joint decoding, as long as correlated side information (SI) is available at the decoder. The model of this approach is shown in Figure 1, where SI is obtained by exploiting the correlation between frames that have already been decoded. SI quality strongly influences the compression efficiency of a WZVC system: in general, better SI leads to better rate-distortion (RD) performance. However, since SI is generated at the decoder with minimal information about the source, accurate SI estimation is a difficult task. Thus, to enhance WZVC performance, the principal investigators of the DISCOVER project identified "finding the best SI at the decoder" as a key task [4].
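As a toy illustration of the Slepian-Wolf result (an assumed binary model, not the video source used in this paper): let X be a uniform bit and let the decoder's side information Y equal X flipped with probability p. Coding X alone needs H(X) = 1 bit, while Slepian-Wolf coding with Y available only at the decoder can approach H(X|Y) = h(p) bits, so better SI (smaller p) means a lower rate. The sketch computes h(p); the function name is illustrative.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) variable, h(p)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


# Assumed toy model: X is a uniform bit and the decoder's SI Y equals X
# flipped with probability p. Then H(X) = 1 bit and H(X|Y) = h(p).
for p in (0.5, 0.2, 0.1, 0.01):
    print(f"p = {p:<4}: H(X|Y) = {binary_entropy(p):.3f} bits (H(X) = 1 bit)")
```

At p = 0.5 the SI is useless and no rate is saved; as p shrinks, the achievable rate drops well below 1 bit.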
Recently, many practical methods to generate SI have been proposed, for example, motion-compensated interpolation (MCI) and motion-compensated extrapolation (MCE), which were adopted in the classical pixel-domain WZVC system [1]. The key idea of this approach is to predict the current frame by motion estimation on previously decoded frames. However, decoded frames carry only limited information. Moreover, since this blind motion estimation does not use any information about the current frame to be encoded, the SI is generally not accurate. The DISCOVER codec follows this WZVC architecture and improves SI generation to increase system performance [5]-[7]. In this codec, the motion field within MCI is estimated bidirectionally (backward and forward motion estimation), followed by a spatial motion smoothing technique that reduces blocking effects.
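For contrast with the EM approach, the following Python sketch shows hard-decision ("blind") full-search block matching of the kind used to build MCI-style SI. It is a simplification: the DISCOVER codec's bidirectional estimation and spatial motion smoothing are not reproduced, and the function name block_match and its parameters (block size k, search range ±m, sum of absolute differences as the matching criterion) are illustrative assumptions.

```python
import numpy as np


def block_match(cur, ref, u, v, k=8, m=5):
    """Full-search block matching (hypothetical helper).

    Returns the shift (dy, dx) within +/-m pixels that minimizes the sum of
    absolute differences (SAD) between the k-by-k block of `cur` with top-left
    pixel (u, v) and the correspondingly shifted block of `ref`.
    """
    h, w = ref.shape
    block = cur[u:u + k, v:v + k].astype(np.int64)
    best, best_sad = (0, 0), None
    for dy in range(-m, m + 1):
        for dx in range(-m, m + 1):
            y, x = u + dy, v + dx
            if y < 0 or x < 0 or y + k > h or x + k > w:
                continue  # candidate block would fall outside the frame
            sad = np.abs(block - ref[y:y + k, x:x + k].astype(np.int64)).sum()
            if best_sad is None or sad < best_sad:
                best, best_sad = (dy, dx), sad
    return best
```

Each block receives a single "best" vector, which is exactly the hard decision that the EM-based method replaces with a probability distribution over shifts.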
More recently, an SI generation method that learns motion vectors during the decoding process using the expectation maximization (EM) algorithm has been proposed for WZVC in [8]. The same method had previously been applied to learn disparity information in distributed coding of random dot stereograms [9] and in distributed grayscale stereo image coding [10], [11]. The EM-based learning method performs disparity compensation and motion field compensation block by block, so the disparity and motion field profiles are bounded by the granularity of the k-by-k blocks. To improve the disparity estimate, a block-to-pixel disparity interpolation method using bilinear interpolation was proposed in [12]. This approach achieves significant bit savings compared to the nearest-neighbor (NN) interpolation applied previously. However, bilinear interpolation increases decoding complexity, since it weights the distributions of more neighboring blocks for each pixel.
This paper evaluates the application of disparity probability interpolation [12] to the WZVC codec [8] by interpolating the motion field distribution from blocks to pixels. Because motion field characteristics differ between video sequences, a WZVC codec can be designed to use different interpolation methods. The main contribution of this paper is to find an efficient block-to-pixel interpolation method for the motion field distribution in the WZVC codec [8], for video sequences with different motion characteristics, in terms of bitrate savings, RD performance, and decoding complexity.
This paper is organized as follows. Section 2 reviews unsupervised forward motion vector learning based on the EM algorithm [8] for the WZVC codec. Section 3 extends this EM algorithm with block-to-pixel interpolation of the motion field distribution. Section 4 evaluates this method in the transform-domain WZVC codec [8], and Section 5 presents the conclusions.

EM-Based Unsupervised Forward Motion Vector Learning for WZVC
The main objective of the EM algorithm is to estimate parameters without prior information or complete observation data. The EM algorithm ensures that the estimate converges to a locally optimal value of the likelihood function. The EM framework used to find forward motion vectors in the WZVC codec is shown in Figure 2. The details of the EM algorithm used in [8] are given below.

Model
In Figure 2, X is the current Wyner-Ziv luminance frame and Ŷ is the previously decoded luminance frame, where X is related to Ŷ through a forward motion field M. The residual of X with respect to the motion-compensated Ŷ is treated as independent Laplacian noise Z. So, the decoder's a posteriori probability distribution of the source X, based on the parameter θ, can be modeled as below:

The E-step updates the motion field distribution with reference to the source model parameters, while the M-step updates the source model parameters with reference to the motion field distribution. Note that P{M|Ŷ,S;θ} is the probability of observing motion M given that it relates X (as parameterized by θ) to Ŷ, and also given the syndrome S.
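The residual model can be sketched as a discretized Laplacian pmf p_Z(z) over all possible differences of two 8-bit values. This is a minimal sketch assuming 256 luminance levels and parameterizing the Laplacian by its variance (for a continuous Laplacian with rate α, the variance is 2/α²); the helper name laplacian_pmf and the index layout are hypothetical.

```python
import numpy as np


def laplacian_pmf(variance: float, levels: int = 256) -> np.ndarray:
    """Discretized zero-mean Laplacian pmf p_Z(z) for z = -(levels-1) .. (levels-1).

    p_Z(z) is stored at index z + (levels - 1).
    """
    alpha = np.sqrt(2.0 / variance)       # continuous Laplacian: variance = 2 / alpha**2
    z = np.arange(-(levels - 1), levels)  # every possible difference of two 8-bit values
    p = np.exp(-alpha * np.abs(z))
    return p / p.sum()                    # renormalize after discretization and truncation


# Example: probability of a residual of +3 under an assumed variance of 30.
pz = laplacian_pmf(variance=30.0)
p_plus3 = pz[3 + 255]
```

Because the pmf is symmetric, it does not matter whether Z is taken as X minus the compensated Ŷ or the reverse.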

E-step Algorithm
The E-step updates the estimated distribution on M; before renormalization it is written as

$$P_{app}^{(t)}\{M\} := P_{app}^{(t-1)}\{M\}\, P\{\hat{Y} \mid M; \theta^{(t-1)}\}$$

To reduce the operational cost caused by the possibly large space of M, the motion field M is estimated as block-by-block motion vectors $M_{u,v}$. For a specified block size k, every k-by-k block of $\theta^{(t-1)}$ is compared to the co-located block of Ŷ as well as to all blocks within a fixed motion search range around it (±m pixels horizontally and vertically). For a block $\theta_{u,v}^{(t-1)}$ with top-left pixel located at (u,v), the distribution on the shift $M_{u,v}$ is updated as below and normalized:

$$P_{app}^{(t)}\{M_{u,v}\} := P_{app}^{(t-1)}\{M_{u,v}\}\, P\{\hat{Y}_{(u,v)+M_{u,v}} \mid M_{u,v}; \theta_{u,v}^{(t-1)}\}$$

where $P\{\hat{Y}_{(u,v)+M_{u,v}} \mid M_{u,v}; \theta_{u,v}^{(t-1)}\}$ is the probability of observing $\hat{Y}_{(u,v)+M_{u,v}}$ given that it was generated through the vector $M_{u,v}$ from $X_{u,v}$ as parameterized by $\theta_{u,v}^{(t-1)}$. This procedure, shown on the left of Figure 3, takes place in the block-based motion estimator.
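A minimal NumPy sketch of the block-wise E-step update follows. It assumes that θ is stored as an (H, W, L) array of per-pixel probabilities over L luminance levels, that the likelihood of a candidate block factorizes over its pixels as the product of Σ_w θ(i,j,w) p_Z(Ŷ(i+dy, j+dx) − w), that p_Z is an array indexed by z + L − 1, and that shifts leaving the frame get zero probability. The function name e_step_block and the array layout are hypothetical; the computation is carried out in the log domain to avoid underflow.

```python
import numpy as np


def e_step_block(theta, y_hat, prior, pz, u, v, k=8, m=5):
    """One E-step update of P_app{M_uv} for the k-by-k block at (u, v).

    theta : (H, W, L) soft estimate, theta[i, j, w] = P{X(i, j) = w}
    y_hat : (H, W) previously decoded frame, integers in [0, L)
    prior : (2m+1, 2m+1) strictly positive P_app^(t-1){M_uv}
    pz    : noise pmf with p_Z(z) stored at index z + L - 1
    Assumes at least one candidate shift stays inside the frame.
    """
    H, W, L = theta.shape
    w = np.arange(L)
    block_theta = theta[u:u + k, v:v + k, :]               # (k, k, L)
    log_lik = np.full(prior.shape, -np.inf)                # -inf = zero likelihood
    for a, dy in enumerate(range(-m, m + 1)):
        for b, dx in enumerate(range(-m, m + 1)):
            y, x = u + dy, v + dx
            if y < 0 or x < 0 or y + k > H or x + k > W:
                continue                                   # shift leaves the frame
            ref = y_hat[y:y + k, x:x + k]                  # (k, k)
            noise = pz[ref[..., None] - w + L - 1]         # p_Z(Y_hat - w) for every w
            per_pixel = (block_theta * noise).sum(axis=-1) # sum over w of theta * p_Z
            log_lik[a, b] = np.log(per_pixel).sum()        # product over pixels, in logs
    log_post = np.log(prior) + log_lik                     # prior times likelihood
    post = np.exp(log_post - log_post.max())               # rescale before exponentiating
    return post / post.sum()                               # renormalize
```

With the search range used in this paper's experiments (m = 5), there are (2·5 + 1)² = 121 candidate shifts per block, which is why the update is carried out block by block rather than over whole-frame motion configurations.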

M-step
ISSN: 1693-6930 TELKOMNIKA Vol. 9, No. 1, April 2011: 191-200

The M-step updates the soft estimate θ by maximizing the likelihood of Ŷ and the syndrome S:

$$\theta^{(t)} := \arg\max_{\theta} \sum_{m} P_{app}^{(t)}\{M = m\}\, P\{\hat{Y}, M = m, S; \theta\}$$

where the summation is over all configurations m of the motion field. Since true maximization is intractable, it is approximated by generating the soft SI $\psi^{(t)}$, followed by one iteration of joint bitplane low-density parity-check (LDPC) decoding to yield $\theta^{(t)}$. The process consists of two stages.

First, in the probability model, the soft SI $\psi_{u,v}^{(t)}$ is created by blending the estimates from each of the blocks $\hat{Y}_{(u,v)+M_{u,v}}$ according to $P_{app}^{(t)}\{M_{u,v}\}$, as shown on the right-hand side of Figure 3. More generally, the probability that the blended SI has value w at pixel (i, j) is

$$\psi^{(t)}(i,j,w) = \sum_{m} P_{app}^{(t)}\{M = m\}\, p_Z\big(w - \hat{Y}_m(i,j)\big)$$

where $p_Z(z)$ is the probability mass function of the independent additive noise Z, and $\hat{Y}_m$ is the previously reconstructed frame compensated through motion configuration m.

Then, the soft SI $\psi^{(t)}$ generated by the probability model is used by the LDPC decoder to generate the soft estimate $\theta^{(t)}$. For this, the decoder applies joint bitplane LDPC decoding to the soft SI $\psi^{(t)}(i,j,w)$ and the syndrome S. For the detailed joint bitplane decoding procedure, please refer to [8].
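The blending step of the probability model can be sketched for a single block. This sketch assumes that Ŷ holds integer luminance values in [0, L), that p_Z is stored as an array indexed by z + L − 1, and that P_app^(t){M_u,v} is given as a (2m+1)×(2m+1) array of shift probabilities; it renormalizes each pixel at the end because the pmf window is truncated to the valid range [0, L). The name soft_si_block is hypothetical, and the LDPC stage is not modeled.

```python
import numpy as np


def soft_si_block(y_hat, p_shift, pz, u, v, k=8, m=5):
    """Soft SI psi(i, j, w) for the k-by-k block with top-left pixel (u, v).

    y_hat   : (H, W) previously decoded frame, integers in [0, L)
    p_shift : (2m+1, 2m+1) probabilities P_app{M_uv} of each candidate shift
    pz      : noise pmf with p_Z(z) stored at index z + L - 1 (length 2L - 1)
    returns : (k, k, L) array; psi[..., w] = P{SI value is w} at each pixel
    """
    H, W = y_hat.shape
    L = (len(pz) + 1) // 2
    w = np.arange(L)
    psi = np.zeros((k, k, L))
    for a, dy in enumerate(range(-m, m + 1)):
        for b, dx in enumerate(range(-m, m + 1)):
            y, x = u + dy, v + dx
            if p_shift[a, b] == 0 or y < 0 or x < 0 or y + k > H or x + k > W:
                continue  # zero weight, or shift outside the frame
            ref = y_hat[y:y + k, x:x + k]                     # motion-compensated block
            psi += p_shift[a, b] * pz[w - ref[..., None] + L - 1]  # weight * p_Z(w - Y_m)
    # Renormalize per pixel: the pmf window is truncated to [0, L)
    return psi / psi.sum(axis=-1, keepdims=True)
```

When P_app^(t){M_u,v} puts all its mass on one shift, ψ reduces to the noise pmf centered on that single compensated block; a spread-out distribution yields a broader, more cautious SI.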

Termination
Iterating between the E-step and the M-step in this way learns the forward motion vectors at the granularity of k-by-k blocks. The decoding algorithm terminates successfully when the hard estimates yield a syndrome equal to S.
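A minimal sketch of this termination test, assuming binary symbols and a plain parity-check matrix H; the rate-adaptive accumulate structure and the mapping from the soft estimate θ to bitplanes used in [8] are not modeled, and the function names are hypothetical.

```python
import numpy as np


def hard_decision(theta):
    """Hard estimate: the most probable symbol at each position of theta (..., L)."""
    return theta.argmax(axis=-1)


def syndrome_matches(H, bits, s):
    """Termination test: the hard-decided bits reproduce the received syndrome S,
    i.e. H @ bits == s over GF(2)."""
    return np.array_equal(H.dot(bits) % 2, np.asarray(s) % 2)
```

If the test fails, the decoder keeps iterating the E-step and M-step on the same syndrome.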

Research Method
As described in Section 2, the E-step and M-step iterations are carried out on k-by-k blocks. This produces a coarse motion field profile with unnatural step-like transitions at block boundaries. These transitions lower the quality of the soft SI ψ^(t), and some false beliefs are propagated in the LDPC decoder, so the hard estimate may produce a syndrome that is not equal to S. To improve the soft SI ψ^(t), the motion field profile M must be improved so that the effect of unnatural step-like transitions at block boundaries is minimized. This improvement is carried out by computing a pixel-based motion field distribution P_app{M(i,j)} to estimate the motion field M near block boundaries. This approach is equivalent to interpolating the motion field distribution from blocks to pixels.
As described in Equation (6) of Section 2, the pixel-based motion field distribution P_app{M(i,j)} is then used as the weighting factor for the probability mass function p_Z(z) of the independent additive noise in the soft SI ψ^(t) generation process. The accuracy of the soft SI ψ^(t) therefore depends mainly on the distribution P_app{M(i,j)}. This improvement is performed in every EM iteration. For this approach, a motion field interpolation process is added after the block-based motion estimator in the WZVC decoder. The framework of the EM algorithm with motion field interpolation in WZVC is shown in Figure 4.
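The decoder loop of Figure 4 can be outlined as a skeleton in which every stage is passed in as a function. All stage names are hypothetical placeholders, and real rate-adaptive LDPC details (accumulated syndromes, requesting more syndrome bits when decoding does not converge) are outside this sketch.

```python
def em_decode(theta0, e_step, interpolate, soft_si, ldpc_iteration, converged,
              max_iters=100):
    """Skeleton of EM decoding with motion field interpolation (cf. Figure 4).

    Hypothetical stage functions supplied by the caller:
      e_step(theta, p_block)     -> P_app^(t){M_uv} for every block
      interpolate(p_block)       -> P_app^(t){M(i, j)} (NN or bilinear)
      soft_si(p_pixel)           -> soft SI psi^(t)
      ldpc_iteration(psi, theta) -> soft estimate theta^(t)
      converged(theta)           -> True if the hard estimates reproduce S
    Returns (theta, iterations_used, success_flag).
    """
    theta, p_block = theta0, None   # p_block=None: e_step falls back to its initial prior
    for t in range(1, max_iters + 1):
        p_block = e_step(theta, p_block)    # block-based motion estimator
        p_pixel = interpolate(p_block)      # added block-to-pixel interpolation
        psi = soft_si(p_pixel)              # probability model
        theta = ldpc_iteration(psi, theta)  # one joint bitplane LDPC iteration
        if converged(theta):
            return theta, t, True
    return theta, max_iters, False
```

Swapping the interpolate argument between an NN and a bilinear implementation is the only change needed to compare the two methods within the same loop.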
In [12], on Wyner-Ziv coding of stereo images with unsupervised learning of disparity, the disparity estimate is improved by applying bilinear interpolation, and the results are compared to the NN interpolation applied in [9], [10], [11]. This approach gave significant bit savings; however, the resulting increase in decoding complexity was not discussed.
In distributed video coding applications, the differing characteristics of the motion field M across video sequences allow a WZVC codec to use different interpolation methods. This paper adopts NN and bilinear interpolation [12] to improve the estimate of the motion field M and compares their performance in the WZVC codec [8] for different input video sequences. The implementation of the interpolation methods of [12] is detailed below.

Nearest-Neighbor (NN) Interpolation
In block-to-pixel motion field interpolation, the NN method uses a single blockwise motion field distribution $P_{app}\{M_{u,v}\}$ for every pixel within the k-by-k block, so all pixels in the block share the same distribution. If $M_{u,v}$ denotes the block-by-block motion field of the block with top-left pixel (u,v) and M(i,j) denotes the pixel-by-pixel motion field, the pixel-by-pixel motion field distribution is defined as

$$P_{app}\{M(i,j)\} := P_{app}\{M_{k\lfloor i/k \rfloor,\, k\lfloor j/k \rfloor}\}$$

where k is the block size.
When blocks in frame X are highly correlated with blocks in the previously decoded frame Ŷ, the motion field estimated by block-based motion estimation is more accurate. In this case, a single blockwise motion field distribution P_app{M_u,v} can be used for every pixel in the block, because all pixels in the block share the same motion information.
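Since every pixel copies the distribution of the block that contains it, NN interpolation amounts to repeating each block's distribution k times along both spatial axes. In the NumPy sketch below, the (Hb, Wb, S) layout, one distribution over S candidate shifts per block, and the function name are assumptions.

```python
import numpy as np


def nn_block_to_pixel(p_block, k):
    """Nearest-neighbor block-to-pixel interpolation (hypothetical helper).

    p_block : (Hb, Wb, S) distribution over S candidate shifts for each block
    returns : (Hb*k, Wb*k, S); pixel (i, j) gets the distribution of block (i//k, j//k)
    """
    return np.repeat(np.repeat(p_block, k, axis=0), k, axis=1)
```

No arithmetic is performed per pixel, which is why NN interpolation adds almost nothing to decoding complexity.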

Bilinear Interpolation
In bilinear interpolation, block-to-pixel motion field interpolation combines four blockwise motion field distributions for every pixel in a k-by-k block. As stated in Equation (8), each distribution has a weighting factor a_1 to a_4, chosen such that the spatially closest block contributes most to the weighted sum. In addition, a_1 through a_4 are properly normalized by the area of the k-by-k block, so that bilinear interpolation creates a convex combination of the motion field probabilities.

In the DCT-based transform-domain codec, we use a block size of 8 × 8 pixels for the block-based motion estimator, the motion field interpolation, and the probability model. To estimate the block-based motion field in the motion estimator and the probability model, we use a motion search range of ±5 pixels horizontally and vertically. Rate control is implemented with rate-adaptive LDPC accumulate codes of regular degree 3 and length 50688 bits as the platform for the joint bitplane system. In these experiments, the EM algorithm at the decoder is initialized with a good value for the variance of the Laplacian noise Z and with experimentally chosen distributions for the motion vectors M_u,v.
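Because Equation (8) is not reproduced here, the following sketch makes an assumption about the weights rather than reproducing the exact formula of [12]: each block's distribution is placed at the block center, and each pixel mixes the four nearest centers with separable bilinear weights a_1 to a_4 that sum to 1. This keeps the two stated properties, namely that closer blocks contribute more and that the result is a convex combination (so each pixel still gets a valid distribution). Pixels near the frame border fall back to the nearest edge blocks; the function name is hypothetical.

```python
import numpy as np


def bilinear_block_to_pixel(p_block, k):
    """Bilinear block-to-pixel interpolation of motion distributions (assumed weights).

    p_block : (Hb, Wb, S) per-block distributions over S candidate shifts
    returns : (Hb*k, Wb*k, S) per-pixel distributions
    """
    Hb, Wb, _ = p_block.shape

    def axis_weights(n_blocks):
        # Pixel position measured in block units, with block centers at 0, 1, 2, ...
        pos = (np.arange(n_blocks * k) + 0.5) / k - 0.5
        pos = np.clip(pos, 0, n_blocks - 1)    # border pixels use the edge block
        lo = np.floor(pos).astype(int)
        hi = np.minimum(lo + 1, n_blocks - 1)
        frac = pos - lo                        # weight given to the "hi" block
        return lo, hi, frac

    r0, r1, fy = axis_weights(Hb)
    c0, c1, fx = axis_weights(Wb)
    fy = fy[:, None, None]
    fx = fx[None, :, None]
    # Weights (1-fy)(1-fx), (1-fy)fx, fy(1-fx), fy*fx play the role of a_1..a_4.
    return ((1 - fy) * (1 - fx) * p_block[r0][:, c0]
            + (1 - fy) * fx * p_block[r0][:, c1]
            + fy * (1 - fx) * p_block[r1][:, c0]
            + fy * fx * p_block[r1][:, c1])
```

Compared with the NN version, each pixel now reads and weights four block distributions instead of copying one, which is consistent with the higher decoding complexity reported for bilinear interpolation.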

Analysis of bitrate savings
Table 1 compares the average bitrate needed to transmit a WZ frame when the WZVC decoder uses NN and bilinear motion field interpolation. In general, bilinear interpolation yields bitrate savings at all quantization factors. However, when decoding complexity is taken into account, only the Foreman and Carphone sequences show significant bitrate savings, of up to 3.29% and 2.73% respectively, at quantization scaling factor Q_0.5. For these sequences, decoding complexity increases by up to 12.30% and 9.74%, respectively. These results show that bilinear interpolation can improve the initial motion field for video sequences with high motion content: the four weighting coefficients make blocks that are spatially closer to the source pixel contribute more to the weighted sum, smoothing the motion field profile.
In contrast, for low motion video sequences, i.e., Container and Hall, the bitrate saving achieved by bilinear interpolation does not justify its complexity. These sequences gain only 0.82% and 0.92%, respectively, while decoding complexity increases by up to 10.32% and 15.35%, respectively, at Q_0.5. So, when decoding complexity is a major concern, NN interpolation is more suitable for improving motion field estimation in the WZVC codec for low motion video sequences.
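The percentages above are read here as relative changes with NN interpolation as the baseline; the paper does not spell out the formula, so this is an assumption. The helper makes the arithmetic explicit; the function names are hypothetical, and the numbers in the usage line are illustrative rather than values from Table 1.

```python
def relative_change(baseline: float, value: float) -> float:
    """Percent change of `value` relative to `baseline` (positive = increase)."""
    return 100.0 * (value - baseline) / baseline


def bilinear_vs_nn(rate_nn, rate_bil, time_nn, time_bil):
    """Bitrate saving and decoding-time increase of bilinear over NN, in percent."""
    saving = -relative_change(rate_nn, rate_bil)    # lower rate -> positive saving
    extra_time = relative_change(time_nn, time_bil) # longer decoding -> positive increase
    return saving, extra_time


# Illustrative inputs only: a 100 -> 96.71 rate drop and a 100 -> 112.30 time rise.
saving, extra_time = bilinear_vs_nn(100.0, 96.71, 100.0, 112.30)
```

Reporting both numbers side by side is what allows the trade-off argument made above: a saving of a fraction of a percent rarely justifies a double-digit increase in decoding time.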

Analysis of Rate-Distortion Performance
This analysis compares the average peak signal-to-noise ratio (PSNR) obtained with both interpolation methods at a fixed bitrate. Figure 7 shows that bilinear interpolation in the transform-domain WZVC codec achieves on average 0.2 dB higher PSNR than NN interpolation for the Foreman and Carphone sequences, while there is no significant difference for the Container and Hall sequences. This implies that NN interpolation is more efficient in the transform-domain WZVC codec for low motion video sequences.
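For 8-bit luminance frames, PSNR is computed as 10·log10(255²/MSE). A minimal sketch follows; averaging per-frame PSNR values over a sequence is an assumption, since the paper does not state whether it averages PSNR or MSE.

```python
import numpy as np


def psnr(reference, decoded, peak=255.0):
    """PSNR in dB between two frames: 10 * log10(peak**2 / MSE)."""
    mse = np.mean((reference.astype(np.float64) - decoded.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)


def average_psnr(ref_frames, dec_frames):
    """Mean of the per-frame PSNR values over a sequence."""
    return float(np.mean([psnr(r, d) for r, d in zip(ref_frames, dec_frames)]))
```

Casting to float64 before subtracting avoids wrap-around when the frames are stored as uint8.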

Decoding complexity
To estimate decoding complexity, the total decoding time for all sequences is measured in seconds. Table 1 shows the total time needed to decode all 96 frames using the JPEG quantization matrix at scales Q_0.5, Q_1, Q_2 and Q_3. In general, the experimental results show that bilinear interpolation makes the decoding process of the transform-domain WZVC codec more complex. Taking decoding complexity into account, bilinear interpolation is more efficient for video sequences with high motion content, such as Foreman and Carphone, whereas for low motion sequences such as Container and Hall, NN interpolation is more efficient to implement.
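Decoding time can be measured with a wall-clock timer around the decoding loop. In the sketch below, decode_frame is a hypothetical stand-in for the decoder; time.perf_counter is the standard-library high-resolution monotonic timer.

```python
import time


def total_decoding_time(decode_frame, frames):
    """Wall-clock seconds spent decoding all `frames` with `decode_frame`."""
    start = time.perf_counter()
    for frame in frames:
        decode_frame(frame)  # hypothetical per-frame decoder call
    return time.perf_counter() - start
```

Running both interpolation variants on the same machine and frames keeps the comparison fair, since absolute times depend on hardware.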

Conclusions
This paper has evaluated motion field interpolation methods in a transform-domain WZVC codec that uses block-based motion field compensation at each EM iteration. The motion field was improved by block-to-pixel interpolation using NN and bilinear interpolation. The performance of both interpolations in the transform-domain WZVC codec was analyzed in terms of bitrate savings, RD performance, and decoding complexity. The experimental results showed that, for video sequences with high motion content, bilinear interpolation gives a more efficient trade-off between bitrate savings, RD performance, and decoding complexity, while for low motion content NN interpolation achieves almost the same bitrate savings and RD performance as bilinear interpolation at a much lower decoding complexity.
In future work, we will implement super-sampling interpolation to further improve transform-domain WZVC codec performance, mainly for video sequences with high motion content.

Figure 4. Motion field interpolation in EM-based unsupervised forward motion vector learning for WZVC

Table 1. Comparison of the transform-domain WZVC codec with different interpolation methods
Efficient Motion Field Interpolation Method for Wyner-Ziv Video …. (I Made Oka Widyantara)