MOTION ESTIMATION IN MPEG-4 VIDEO SEQUENCE USING BLOCK MATCHING ALGORITHM


KISHORE PINNINTI*

Department of Electronics and Communication Engineering, Anil Neerukonda Institute of Technology and Sciences, Sangivalasa, Bheemili (Mandal), Visakhapatnam, Andhra Pradesh, India

kishore_vlsi@yahoo.co.in

P.V. SRIDEVI

Department of Electronics and Communication Engineering, Andhra University College of Engineering (A), Visakhapatnam, Andhra Pradesh, India

pvs6_5@yahoo.co.in

Abstract:

MPEG-4 is currently one of the most prominent multimedia standards, combining natural and synthetic digital video, interactivity and computer graphics. It has a wide variety of applications, such as video conferencing, computer games and mobile phones. All of these applications need portable video communicators, so low-power VLSI implementations are required. In addition to portability, care must be taken over bandwidth limitations. To transmit video sequences effectively over limited bandwidth, the input data must be compressed and coded to fit these limited resources. This paper aims at an efficient estimation of the moving components in a video image sequence, in order to isolate the moving image from the static background. The architecture developed in this paper uses the LMS algorithm to estimate the noise components and the Block Matching Algorithm to detect moving components in the frame sequence. A Huffman decoder is then used to decode the compressed data in the video codec. The proposed design is implemented in VHDL and the results are analyzed on a XILINX Spartan-III.

Keywords: MPEG-4, BMA, XILINX Spartan-III.

1. Introduction

Motion estimation (ME) has proven effective in exploiting the temporal redundancy of video sequences and is therefore a central part of many video compression standards, such as the ISO/IEC standards MPEG-1, MPEG-2 and MPEG-4, and H.261 [1]. All of these video compression schemes rely on a block coding concept, which was extended within the MPEG-4 standardization to support arbitrarily shaped video objects. Motion estimation algorithms have attracted much attention in research and industry, because:

• ME is computationally the most demanding part of the video encoder (60-80% of the total computation time), and this limits performance [1].

• The ME algorithm has a high impact on visual performance for a given bit rate [1].

• The algorithm for the extraction of motion vectors (MV) is not standardized, and so is open to competition [1].


2. Distance Criteria

Block-matching ME algorithms calculate the motion vector by minimizing a cost function, usually a distance metric. Because of the computational complexity of the search algorithms, the search is limited to a window of fixed size. Different matching criteria have been used for the search; however, the minimum mean absolute difference (MAD) or sum of absolute differences (SAD) is the most popular choice for VLSI implementation. Equations 1 and 2 show the minimum MAD/SAD matching criterion [1-3]:

(2)

Here s1(n1,n2,k) is the pixel value at (n1,n2) in frame k, s2(n1+d1,n2+d2,k+1) is the pixel value at (n1+d1,n2+d2) in frame k+1, and d1 and d2 are the horizontal and vertical components of the motion vector, respectively.
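For reference, the MAD criterion and the resulting motion vector are commonly written in the form below. This is the standard textbook form, given here as an assumption rather than as the exact expressions used in this work; the block dimensions N1 x N2 and the set B of pixel positions in the block are symbols introduced only for illustration. The SAD criterion is the same sum without the 1/(N1 N2) normalization.

```latex
\mathrm{MAD}(d_1,d_2) = \frac{1}{N_1 N_2} \sum_{(n_1,n_2)\in B}
  \left| s_1(n_1,n_2,k) - s_2(n_1+d_1,\, n_2+d_2,\, k+1) \right|

[d_1, d_2]^{T} = \arg\min_{(d_1,d_2)} \mathrm{MAD}(d_1,d_2)
```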

3. Block Matching Algorithm

Motion estimation and compensation are widely used in video compression because they reduce the temporal redundancy between frames. Most of the motion estimation algorithms developed so far are block-based techniques, called block-matching algorithms (BMA). In this technique, the current frame is divided into blocks of fixed size, and each block is compared with candidate blocks in the reference frame within the search area. The most widely used approach is the full search BMA (FSBMA), which examines all candidate blocks within the search area in the reference frame to obtain a motion vector (MV). The MV is the displacement, in the horizontal and vertical directions, between the block in the current frame and the best-matched block in the reference frame. The motion estimation algorithm is performed with a search area of variable size that depends on the block type, ranging from an 8x8 block to the complete frame. Video sequences for low bit-rate video coding applications such as videophone and video conferencing have restricted motion characteristics.
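The full-search procedure described above can be sketched as follows. This is a minimal illustration rather than the VHDL design of this paper: it assumes frames are stored as lists of rows of integer pixel values, uses SAD as the matching criterion, and the names `sad` and `full_search`, the default search range of 4 pixels, and the tie-break toward the smallest displacement are all assumptions made for the example.

```python
def sad(cur, ref, top, left, dy, dx, block):
    """SAD between the block at (top, left) in `cur` and the block
    displaced by (dy, dx) in `ref`."""
    return sum(
        abs(cur[top + r][left + c] - ref[top + dy + r][left + dx + c])
        for r in range(block)
        for c in range(block)
    )


def full_search(cur, ref, top, left, block=8, search=4):
    """Examine every candidate displacement inside the search window and
    return ((dy, dx), cost) for the best match."""
    height, width = len(ref), len(ref[0])
    best_key, best_mv, best_cost = None, None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + block > height or x + block > width:
                continue
            cost = sad(cur, ref, top, left, dy, dx, block)
            # Lower cost wins; ties go to the smaller displacement (assumption).
            key = (cost, abs(dy) + abs(dx))
            if best_key is None or key < best_key:
                best_key, best_mv, best_cost = key, (dy, dx), cost
    return best_mv, best_cost
```

Calling `full_search(cur, ref, 8, 8)` on two 24x24 frames returns the displacement of the centre block together with its SAD cost.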

A block in a specific region of the previous frame usually belongs to the same region at that position in the current frame; for example, a block in the background region of the previous frame is likely to lie in the background region of the current frame. Changing blocks are those that switch from the background to the active region or vice versa; the other labels mean that the block type is the same in successive frames. In all video sequences, the percentage of background blocks in successive frames is very high. The changing blocks account for less than 30% of the blocks, meaning that for the remaining blocks the motion field is highly correlated between successive frames. The pattern of distribution is also very similar regardless of the video sequence. The temporal correlation between successive frames is therefore very high: if a block in the previous frame belongs to a background region or an active region, the block located at the same position in the current frame can be classified as a background block or an active moving block, respectively, with high probability.
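The notion of background, active and changing blocks can be illustrated with a short sketch. It assumes that a block is labeled active when its summed absolute difference against the co-located block of the previous frame exceeds a threshold; this labeling rule, the threshold, and the names `classify_blocks` and `changing_fraction` are assumptions for illustration, not the classification procedure used in this paper. Frame dimensions are assumed to be multiples of the block size.

```python
def classify_blocks(prev, cur, block=8, threshold=0):
    """Label each block of `cur` as 'active' if it differs from the
    co-located block of `prev` by more than `threshold`, else 'background'."""
    labels = []
    for top in range(0, len(cur), block):
        for left in range(0, len(cur[0]), block):
            diff = sum(
                abs(cur[top + r][left + c] - prev[top + r][left + c])
                for r in range(block)
                for c in range(block)
            )
            labels.append("active" if diff > threshold else "background")
    return labels


def changing_fraction(labels_prev, labels_cur):
    """Fraction of blocks whose label changed between two label maps."""
    changed = sum(a != b for a, b in zip(labels_prev, labels_cur))
    return changed / len(labels_cur)
```

Comparing the label maps of successive frame pairs with `changing_fraction` gives the share of changing blocks discussed above.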


Fig. 1. Block matching criteria

4. Proposed Architecture

This paper realizes an MPEG-4 motion estimation algorithm for isolating the moving image in a moving image sequence. The system reads the moving image data and extracts the frame sequence based on a value passed by the user. The resulting frame sequence is processed by the motion estimator to isolate the moving components. Fig. 2 and Fig. 3 show the block diagrams of the implementation. The implementation begins in MATLAB, which is used to obtain the binary data for a given input moving picture; the input must be in *.avi format. This yields a matrix of dimensions 24x24, and the binary data is then applied to the VHDL module.

Fig. 2. Encoder part of the proposed architecture

Fig. 3. Decoder part of the proposed architecture

4.1. Encoder

The frame reader is the first unit and interfaces the selected moving image to the implemented design. The module reads the complete moving image data and creates multiple frames. These frames are then passed to the noise estimator module, which calculates the noise present in the frame sequence. Here the size of each frame is 24x24. The frame reader reads frames of this size and divides them into 8x8 blocks, so 9 blocks of size 8x8 are required for each 24x24 frame. For the given input data, the frame reader generates 2 frames and 18 blocks in total. The next block is the noise estimator.
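The block arithmetic above (9 blocks per 24x24 frame, 18 blocks for 2 frames) can be checked with a small sketch. It assumes frames are lists of rows of pixel values and that blocks are taken in raster order; the function name `split_into_blocks` is hypothetical and not part of the VHDL design.

```python
def split_into_blocks(frame, block=8):
    """Split a frame into non-overlapping block x block tiles, in raster order."""
    blocks = []
    for top in range(0, len(frame), block):
        for left in range(0, len(frame[0]), block):
            blocks.append([row[left:left + block] for row in frame[top:top + block]])
    return blocks


# Two 24x24 frames with arbitrary pixel values.
frames = [[[(r + c + k) % 256 for c in range(24)] for r in range(24)] for k in range(2)]
all_blocks = [b for frame in frames for b in split_into_blocks(frame)]
print(len(all_blocks))
```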

Before motion estimation begins, an accurate estimate of the mean-squared error due to noise in pixel intensities must be obtained. With this information, incorrect non-zero motion vectors can be discarded before they are traced. Rejecting these vectors can have important consequences for the resulting visual quality. The noise estimate is computed in a straightforward fashion from the inter-pixel differences between the blocks of two successive frames. At each step, the current inter-pixel difference is taken as the least value and compared with the next difference value.

If the next difference is smaller than the present one, it becomes the minimum error; otherwise the previous minimum is retained. This operation is iterated over the whole block and over the whole image. The resulting minimum error values are then squared, and the mean square error is calculated from Equation (3).

$$\mathrm{LMSE} = \operatorname{mean}\big((\text{least error})^{2}\big) + 3\,\sigma(\text{least error}) \qquad (3)$$

This module reads the frame sequence from the frame reader and processes successive frames, returning the least mean square error (MSE) for each frame. The obtained value is then used to eliminate the effect of noise in the given frame sequence. Fig. 4 shows the noise estimation module with its input and output interfaces.
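One possible reading of the noise estimation step and Equation (3) is sketched below. It assumes that one least error is kept per 8x8 block (the smallest absolute inter-pixel difference between co-located blocks of frames p and p+1), that the mean and standard deviation are taken over these per-block values, and that the population standard deviation is meant; these choices, and the names `least_errors` and `lmse`, are assumptions, since the text does not fix them.

```python
from statistics import mean, pstdev


def least_errors(frame_p, frame_p1, block=8):
    """Smallest absolute inter-pixel difference for each co-located block pair."""
    errors = []
    for top in range(0, len(frame_p), block):
        for left in range(0, len(frame_p[0]), block):
            least = None
            for r in range(top, top + block):
                for c in range(left, left + block):
                    diff = abs(frame_p1[r][c] - frame_p[r][c])
                    # Keep the smaller of the running minimum and the new difference.
                    if least is None or diff < least:
                        least = diff
            errors.append(least)
    return errors


def lmse(errors):
    """Equation (3): mean of the squared least errors plus three standard deviations."""
    return mean([e * e for e in errors]) + 3 * pstdev(errors)
```

The value returned by `lmse` would then serve as the threshold passed to the motion estimator.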

Fig. 4. Noise estimation module

The next block is the motion estimator, which calculates motion vectors over each pair of frames in the sequence. Since high accuracy in the motion vectors is desired, the estimator performs a full search over the window. The minimum mean-squared error criterion gives the best block match. The choice of block size has a great impact on the results. A smaller block size tends to produce more false motion vectors despite the noise estimation, but results in finer edge definition in the segmented image. Larger block sizes give coarser edges but are less affected by noise.

Furthermore, the number of computations necessary for motion tracing goes down as the block size increases, since there are fewer motion vectors to analyze. For the images analyzed (144 x 176 pixels), a block size of 8x8 pixels is used in the implementation. Fig. 5 shows the motion estimator module, which estimates the motion data in a given frame sequence. The module reads the frame sequence together with the threshold value obtained from the noise estimator unit and applies the block matching algorithm explained in Section 3 to estimate the moving and non-moving components in the given frames.
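The effect of block size on the number of motion vectors can be made concrete with a short calculation. The sketch assumes non-overlapping blocks and one motion vector per block; the function name `motion_vector_count` is hypothetical.

```python
def motion_vector_count(height, width, block):
    """Number of non-overlapping blocks, and hence motion vectors, per frame."""
    return (height // block) * (width // block)


for size in (8, 16):
    print(size, motion_vector_count(144, 176, size))
```

For a 144 x 176 frame this gives 396 vectors with 8x8 blocks and 99 vectors with 16x16 blocks, so doubling the block size cuts the number of vectors to trace by a factor of four.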

Fig. 5. Motion estimator module

The next block, the frame builder, receives raw data from the motion estimator and transforms it into the MPEG-4 format. The frame builder contains four fields: a sync field, 8 bits of CRC data, an 8-bit FCS, and 8 reference bits. Once the appropriate data has been placed in these four fields, the data is in MPEG-4 format.
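As a rough illustration of how an 8-bit CRC field can protect a frame, the sketch below packs a sync byte, a reference byte, a payload and a CRC-8 into one byte string. The field order, the sync value 0x7E, the CRC polynomial x^8 + x^2 + x + 1 (0x07) and the function names are all assumptions; the text does not specify them, and the FCS field is left out because its contents are not described.

```python
def crc8(data, poly=0x07):
    """Bitwise CRC-8 (initial value 0, no reflection, no final XOR)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def build_frame(payload, reference, sync=0x7E):
    """Assumed layout: sync byte, reference byte, payload, CRC-8 of reference+payload."""
    body = bytes([reference]) + bytes(payload)
    return bytes([sync]) + body + bytes([crc8(body)])


def check_frame(frame):
    """With this CRC variant, the CRC over body plus its CRC byte is zero."""
    return crc8(frame[1:]) == 0
```

A receiver can call `check_frame` on the received bytes to detect transmission errors before decoding.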

4.2. Decoder

The decoder block contains a frame reader, a CRC unit and a tree-based decoder. The frame reader concatenates the data from the frame builder. The CRC is then used again to check for errors in the received data. The data is decoded using the tree-based decoder, which detects the unique word corresponding to each code word. The frame interpolator reads the isolated motion vectors from the motion tracer as well as the motion estimator unit and rebuilds the frames from the motion vectors obtained. This module places the isolated pixels back at the same positions as in the original frames. The remaining pixel positions are filled with zeros to suppress the non-moving elements.
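A tree-based decoder of the kind mentioned above can be sketched as follows. The code table in the usage example is made up for illustration, and the names `build_tree` and `decode` are hypothetical; the sketch only assumes that the code is a prefix code (as Huffman codes are) and that bits arrive as a string of '0' and '1' characters.

```python
def build_tree(codebook):
    """Build a binary decoding tree from a {symbol: bit string} prefix code."""
    root = {}
    for symbol, code in codebook.items():
        node = root
        for bit in code[:-1]:
            node = node.setdefault(bit, {})
        node[code[-1]] = symbol  # leaf holds the decoded symbol
    return root


def decode(bits, root):
    """Walk the tree bit by bit, emitting a symbol at each leaf."""
    out, node = [], root
    for bit in bits:
        node = node[bit]
        if not isinstance(node, dict):
            out.append(node)
            node = root
    return out
```

For example, with the illustrative table `{"a": "0", "b": "10", "c": "11"}`, the bit string `"010110"` decodes to the symbols a, b, c, a.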

5. Results

5.1. Output of an AVI video file

Using the Image Processing Toolbox in MATLAB, the given input video file is converted into image format with a frame size of 24x24 and a block size of 8x8. The result for the *.avi file, which consists of 10 frames, is shown in Fig. 6. This image is then converted to individual frames for further analysis, as shown in Fig. 7. Each pel value of the individual images is then extracted in binary form. This binary information is used in the block matching algorithm.
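The extraction of pel values in binary form can be illustrated with a short sketch. It is written in Python rather than MATLAB and assumes 8-bit unsigned pel values; the function name `frame_to_bits` is hypothetical.

```python
def frame_to_bits(frame, width=8):
    """Convert each pel value to a fixed-width binary string."""
    return [[format(pel, "0{}b".format(width)) for pel in row] for row in frame]


print(frame_to_bits([[0, 5], [255, 128]]))
```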

Fig. 6. Frame sequence of given input Video file

Fig. 7. Conversion of continuous frames into individual frames

5.2. Video Codec Output

The video codec architecture is implemented in VHDL and synthesized using XILINX software. Figs. 8-11 show the final simulation results of the complete video codec, the routing, the logic block implementation and the floor plan view of the design, respectively.


Fig. 9. Routed design of MPEG-4 motion estimator

Fig. 10. The Logical block implementation in CLB taken from design on targeting FPGA


6. Conclusions

The MPEG-4 motion estimator has been implemented on a XILINX Spartan-3 device; the target device is xc3s1600e-5fg484. The design uses 125 slices, 181 BELs, 49 registers and 200 LUTs (4-input look-up tables), and the reported maximum FPGA clock rate is 191.571 MHz. A system has been proposed that can efficiently segment a moving foreground object from an image sequence with a still background. The sub-modules of the system were developed in VHDL and verified for functionality. The results show that the block matching algorithm can efficiently reduce redundant data in moving images.

References

[1] S. Agha and V. M. Dwyer, "Algorithms and VLSI Architectures for MPEG-4 Motion Estimation," Electronic Systems and Control Division Research, 2003.

[2] D. Gong, Y. He, and Z. Cao, "New cost-effective VLSI implementation of a 2D discrete cosine transform and its inverse," IEEE Trans. Circuits Syst. Video Technol., vol. 14, no. 4, pp. 405-415, Apr. 2004.

[3] S. Farsiu, D. Robinson, M. Elad, and P. Milanfar, "Fast and robust multi-frame super-resolution," IEEE Trans. Image Process., vol. 13, no. 10, pp. 1327-1344, Oct. 2004.

[4] J. Wright and R. Pless, "Analysis of persistent motion patterns using the 3d structure tensor," presented at the IEEE Workshop Motion and Video Computing, 2005.

[5] G. Farnebäck, "Polynomial Expansion for Orientation and Motion Estimation," Ph.D. dissertation, Linköping Univ., Linköping, Sweden, 2002.

[6] S. Zhu and K. Ma, "A new diamond search algorithm for fast block-matching motion estimation," IEEE Trans. Image Process., vol. 9, no. 2, pp. 287-290, Feb. 2000.

[7] N. A. Woods, N. P. Galatsanos, and A. K. Katsaggelos, "Stochastic methods for joint registration, restoration, and interpolation of multiple undersampled images," IEEE Trans. Image Process., vol. 15, no. 1, pp. 201-213, Jan. 2006.

[8] B. K. Gunturk, Y. Altunbasak, and R. M. Mersereau, "Multiframe resolution enhancement methods for compressed video," IEEE Signal Process. Lett., vol. 9, no. 2, pp. 170-174, Jun. 2002.

[9] M. M. J. Koo and N. Bose, "Constrained total least squares computations for high resolution image reconstruction with multisensors," Int. J. Imag. Syst. Technol., vol. 12, pp. 35-42, 2002.