Friday, January 24, 2014

Compression codes: Lecture_2 ArithCode


Data Compression
Lecture 2
Arithmetic Code
Alexander Kolesnikov

Arithmetic code
Alphabet extension (blocking symbols) can improve coding
efficiency.
What if we could treat the entire sequence as one symbol?
That is not practical with Huffman coding, but
arithmetic coding allows you to do precisely this.
Basic idea: map each data sequence to a sub-interval of [0,1)
whose length equals the probability of the corresponding
sequence.
1) Huffman coder: H ≤ R ≤ H + 1 bit/(symbol, pel)
2) Arithmetic code: H ≤ R ≤ H + 1 bit/message (!)

Arithmetic code: History
Rissanen [1976] : arithmetic code
Pasco [1976] : arithmetic code

Arithmetic code: Algorithm (1)
0) Start by defining the current interval as [0,1).
1) REPEAT for each symbol s in the input stream
a) Divide the current interval [L, H) into subintervals
whose sizes are proportional to the symbols'
probabilities.
b) Select the subinterval for the symbol s
and define it as the new current interval [L, H).
2) When the entire input stream has been processed,
output any number V that uniquely identifies
the final interval [L, H).
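As a sketch of steps a) and b), here is one subdivision step in Python for a hypothetical two-symbol alphabet (the symbols and probabilities are made up for illustration):

probs = {'a': 0.6, 'b': 0.4}          # toy model, not from the lecture

def subdivide(low, high, probs):
    # Step a): split [low, high) proportionally to the symbol probabilities.
    width = high - low
    intervals, cum = {}, 0.0
    for sym, p in probs.items():
        intervals[sym] = (low + cum * width, low + (cum + p) * width)
        cum += p
    return intervals

# Step b): the subinterval of the next input symbol becomes the new interval.
low, high = subdivide(0.0, 1.0, probs)['b']    # -> (0.6, 1.0)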

Arithmetic code: Algorithm (2)
[Figure: graphical trace of the interval narrowing, step by step]

Arithmetic code: Algorithm (3)
Probabilities: p_1, p_2, …, p_N.
Cumulants: C_1 = 0; C_2 = C_1 + p_1 = p_1;
C_3 = C_2 + p_2 = p_1 + p_2; etc.
C_N = p_1 + p_2 + … + p_(N-1); C_(N+1) = 1.
0) Current interval [L, H) = [0.0, 1.0).
1) REPEAT for each symbol s_i in the input stream
(both updates use the range H − L of the current interval):
H ← L + (H − L)·C(s_i+1),
L ← L + (H − L)·C(s_i);
2) UNTIL the entire input stream has been processed.
The output code V is any number that uniquely identifies
the final interval [L, H).
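A minimal Python sketch of this loop, assuming the model is given as a mapping from each symbol to its pair (C(s_i), C(s_i+1)); plain floats stand in for the scaled integer arithmetic a practical coder would use:

def arith_encode(message, cum):
    # cum[s] = (C(s_i), C(s_i+1)) for each symbol s of the alphabet.
    low, high = 0.0, 1.0
    for s in message:
        c_lo, c_hi = cum[s]
        rng = high - low              # the old range, used by both updates
        high = low + rng * c_hi
        low = low + rng * c_lo
    return low, high                  # any V in [low, high) encodes the message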

Example 1: Statistics
Message: 'SWISS_MISS'
Char Freq Prob [C(s_i), C(s_i+1))
S 5 5/10=0.5 [0.5, 1.0)
W 1 1/10=0.1 [0.4, 0.5)
I 2 2/10=0.2 [0.2, 0.4)
M 1 1/10=0.1 [0.1, 0.2)
_ 1 1/10=0.1 [0.0, 0.1)
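The table can be built mechanically from the message. A small sketch (build_model is a hypothetical helper; the assignment of symbols to subintervals is arbitrary as long as encoder and decoder agree, and the ordering in the table above is one valid choice):

from collections import Counter

def build_model(message):
    # Frequencies -> probabilities -> cumulative intervals [C(s_i), C(s_i+1)).
    freq = Counter(message)
    n = len(message)
    cum, c = {}, 0.0
    for sym, f in freq.items():
        cum[sym] = (c, c + f / n)
        c += f / n
    return cum

build_model('SWISS_MISS')   # e.g. {'S': (0.0, 0.5), 'W': (0.5, 0.6), ...}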

Example 1: Encoding
S 0.5 [0.5, 1.0)
W 0.1 [0.4, 0.5)
I 0.2 [0.2, 0.4)
M 0.1 [0.1, 0.2)
_ 0.1 [0.0, 0.1)
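The narrowing of the current interval can be reproduced step by step with the table above; a short Python trace:

CUM = {'S': (0.5, 1.0), 'W': (0.4, 0.5), 'I': (0.2, 0.4),
       'M': (0.1, 0.2), '_': (0.0, 0.1)}    # table from the statistics slide

low, high = 0.0, 1.0
for s in 'SWISS_MISS':
    rng = high - low
    low, high = low + rng * CUM[s][0], low + rng * CUM[s][1]
    print(f"{s}: [{low:.8f}, {high:.8f})")
# Last line printed: S: [0.71753375, 0.71753500)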

Example 1: Decoding
S 0.5 [0.5, 1.0)
W 0.1 [0.4, 0.5)
I 0.2 [0.2, 0.4)
M 0.1 [0.1, 0.2)
_ 0.1 [0.0, 0.1)
V ∈ [0.71753375, 0.71753500)
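Decoding inverts the process: find the subinterval that contains V, emit its symbol, then rescale V into [0, 1) and repeat. A sketch, assuming the decoder knows the message length:

CUM = {'S': (0.5, 1.0), 'W': (0.4, 0.5), 'I': (0.2, 0.4),
       'M': (0.1, 0.2), '_': (0.0, 0.1)}

def arith_decode(v, n, cum):
    out = []
    for _ in range(n):
        for sym, (c_lo, c_hi) in cum.items():
            if c_lo <= v < c_hi:                 # which subinterval holds v?
                out.append(sym)
                v = (v - c_lo) / (c_hi - c_lo)   # rescale v into [0, 1)
                break
    return ''.join(out)

arith_decode(0.71753407, 10, CUM)   # -> 'SWISS_MISS'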

Example 1: Compression?
V ∈ [0.71753375, 0.71753500)

How many bits do we need to encode a number V
in the final interval [L, H)?
m = 4 bits: 16 = 2^4 intervals of size ∆ = 1/16.

The number of bits m needed to represent a value in an interval
of size ∆: m = −log2(∆) bits.
[Figure: binary tree of dyadic intervals: 0, 1; 00, 01, 10, 11; 000 … 111; 0000 … 1111]
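In code this is a one-liner, rounded up because m must be a whole number of bits (a practical coder may need one extra bit to guarantee a codeword inside the interval):

import math

def bits_needed(delta):
    # Smallest m with 2**-m <= delta, i.e. m = ceil(-log2(delta)).
    return math.ceil(-math.log2(delta))

bits_needed(1 / 16)        # 4
bits_needed(0.00000125)    # 20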

Example 1: Compression (1)
V ∈ [L, H) = [0.71753375, 0.71753500)

Interval size (range) r:
r = 0.5·0.1·0.2·0.5·0.5·0.1·0.1·0.2·0.5·0.5 = 0.00000125

The number of bits needed to represent a value in the
interval [L, H) = [L, L+r) of size r:
m = −log2(r) = −log2(0.00000125) = 19.6 → 20 bits


In general, the range r is the product of the probabilities of the
message symbols, and the code length m is the sum of their
self-information:

r = ∏ p_i (product over i = 1, …, n)

m = −log2(r) = −∑ log2(p_i) = n · Entropy
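Checking the example against this formula:

import math

p = [0.5, 0.1, 0.2, 0.5, 0.5, 0.1, 0.1, 0.2, 0.5, 0.5]  # p(s_i) for 'SWISS_MISS'
r = math.prod(p)          # 1.25e-06, the size of the final interval
m = -math.log2(r)         # 19.61, rounded up to 20 bits
print(r, m, m / len(p))   # the last value is 1.96 bits/char, the entropy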

Example 1: Compression (2)

Entropy = 1.96 bits/char

Arithmetic coder:
a) Codeword V: L ≤ V < H
V = (0.71753407…)_10 = (0.10110111101100000101)_2 (20 bits)
0.71753375 < 0.71753407… < 0.71753500
b) Code length m:
m = −log2(r) = −log2(0.00000125) = 19.6 → 20 bits
c) Bitrate: R = 20 bits / 10 chars = 2.0 bits/char

Huffman coder: (1+3+2+1+1+4+4+2+1+1)/10 = 2.0 bits/char
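The codeword can be verified by expanding V in binary; a quick sketch:

def binary_fraction(v, m):
    # First m bits of the binary expansion of v in [0, 1).
    bits = []
    for _ in range(m):
        v *= 2
        bits.append('1' if v >= 1.0 else '0')
        v -= int(v)
    return ''.join(bits)

v = 0.71753407
binary_fraction(v, 20)           # '10110111101100000101'
0.71753375 <= v < 0.71753500     # True: V identifies the final interval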

Properties of arithmetic code
In practice, for images, arithmetic coding gives 15-30%
improvement in compression ratios over a simple Huffman
coder. The complexity of arithmetic coding, however, is
50-300% higher.
