Friday, January 24, 2014
Compression codes: Lecture_2 ArithCode
Data Compression
Lecture 2
Arithmetic Code
Alexander Kolesnikov
Arithmetic code
Alphabet extension (blocking symbols) can improve coding
efficiency.
How about treating the entire sequence as one symbol!
Not practical with Huffman coding.
Arithmetic coding allows you to do precisely this.
Basic idea - map each data sequence to a sub-interval of [0,1)
whose length equals the probability of the corresponding
sequence.
1) Huffman coder: H ≤ R ≤ H + 1 bit/(symbol, pel)
2) Arithmetic code: H ≤ R ≤ H + 1 bit/message (!)
Arithmetic code: History
Rissanen [1976] : arithmetic code
Pasco [1976] : arithmetic code
Arithmetic code: Algorithm (1)
0) Start by defining the current interval as [0,1).
1) REPEAT for each symbol s in the input stream
a) Divide the current interval [L, H) into subintervals
whose sizes are proportional to the symbols' probabilities.
b) Select the subinterval that corresponds to the symbol s
and define it as the new current interval.
2) When the entire input stream has been processed,
the output is any number V that uniquely
identifies the final interval [L, H).
Arithmetic code: Algorithm (2)
[Figure: graphical example of the interval subdivision]
Arithmetic code: Algorithm (3)
Probabilities: p_1, p_2, …, p_N.
Cumulants: C_1 = 0; C_2 = C_1 + p_1 = p_1;
C_3 = C_2 + p_2 = p_1 + p_2; etc.
C_N = p_1 + p_2 + … + p_{N-1}; C_{N+1} = 1.
0) Current interval [L, H) = [0.0, 1.0).
1) REPEAT for each symbol s_i in the input stream:
H ← L + (H − L)*C(s_{i+1}),
L ← L + (H − L)*C(s_i);
(both updates use the width H − L of the interval before the step)
2) UNTIL the entire input stream has been processed.
The output code V is any number that uniquely identifies
the final interval [L, H).
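A minimal Python sketch of this loop (an illustration, not the lecture's implementation: it keeps the interval in floating point, which loses precision on long messages, whereas practical coders use integer arithmetic with renormalization; the function name and symbol-table layout are assumptions):

def arith_encode(message, prob):
    """Narrow [low, high) once per symbol; prob maps each symbol to its probability."""
    # Build the cumulants C(s): the iteration order fixes how the per-symbol
    # intervals stack inside [0, 1) and must be shared with the decoder.
    cum, c = {}, 0.0
    for s in prob:
        cum[s] = c
        c += prob[s]

    low, high = 0.0, 1.0
    for s in message:
        width = high - low                        # old interval width H - L
        high = low + width * (cum[s] + prob[s])   # H <- L + (H - L) * C(s_{i+1})
        low = low + width * cum[s]                # L <- L + (H - L) * C(s_i)
    return low, high   # any V in [low, high) identifies the whole message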
Example 1: Statistics
Message: 'SWISS_MISS'
Char  Freq  Prob        [C(s_i), C(s_{i+1}))
S 5 5/10=0.5 [0.5, 1.0)
W 1 1/10=0.1 [0.4, 0.5)
I 2 2/10=0.2 [0.2, 0.4)
M 1 1/10=0.1 [0.1, 0.2)
_ 1 1/10=0.1 [0.0, 0.1)
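The table can be rebuilt from the message itself; a quick sketch (the bottom-up stacking order of the intervals is a convention shared by coder and decoder, here hard-coded to match the slide's table):

from collections import Counter

message = 'SWISS_MISS'
freq = Counter(message)
n = len(message)

c = 0.0
for s in ['_', 'M', 'I', 'W', 'S']:   # stack intervals bottom-up, as in the table
    p = freq[s] / n
    print(f"{s}  {freq[s]}  {p:.1f}  [{c:.1f}, {c + p:.1f})")
    c += p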
Example 1: Encoding
S 0.5 [0.5, 1.0)
W 0.1 [0.4, 0.5)
I 0.2 [0.2, 0.4)
M 0.1 [0.1, 0.2)
_ 0.1 [0.0, 0.1)
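Worked by hand with the update formulas of Algorithm (3), each row below is the interval after encoding one more symbol of 'SWISS_MISS'; the last row matches the interval quoted on the decoding slide:

S → [0.5, 1.0)
W → [0.70, 0.75)
I → [0.71, 0.72)
S → [0.715, 0.720)
S → [0.7175, 0.7200)
_ → [0.71750, 0.71775)
M → [0.717525, 0.717550)
I → [0.7175300, 0.7175350)
S → [0.7175325, 0.7175350)
S → [0.71753375, 0.71753500)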
Example 1: Decoding
S 0.5 [0.5, 1.0)
W 0.1 [0.4, 0.5)
I 0.2 [0.2, 0.4)
M 0.1 [0.1, 0.2)
_ 0.1 [0.0, 0.1)
V ∈ [0.71753375, 0.71753500)
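Decoding inverts the construction: find which symbol's interval contains V, emit that symbol, then rescale V back to [0, 1) and repeat. A minimal sketch under the same floating-point caveat as the encoder above (the message length must be known, e.g. transmitted separately):

def arith_decode(v, prob, n):
    """Recover n symbols from a value v in the final interval."""
    intervals, c = {}, 0.0
    for s, p in prob.items():          # same model, same stacking order as the encoder
        intervals[s] = (c, c + p)
        c += p

    out = []
    for _ in range(n):
        for s, (lo, hi) in intervals.items():
            if lo <= v < hi:
                out.append(s)
                v = (v - lo) / (hi - lo)   # rescale v to [0, 1) and continue
                break
    return ''.join(out)

prob = {'_': 0.1, 'M': 0.1, 'I': 0.2, 'W': 0.1, 'S': 0.5}
print(arith_decode(0.71753407, prob, 10))   # -> SWISS_MISS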
Example 1: Compression?
V ∈ [0.71753375, 0.71753500)
• How many bits do we need to encode a number V
in the final interval [L, H)?
• m = 4 bits: 16 = 2^4 intervals of size ∆ = 1/16.
• The number of bits m needed to represent a value in an interval
of size ∆: m = −log_2(∆) bits.
[Figure: binary code tree splitting [0,1) into dyadic intervals:
0, 1; 00, 01, 10, 11; 000 … 111; 0000 … 1111]
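The subdivision in the figure is easy to regenerate: m bits give 2^m codewords, each naming a dyadic interval of size ∆ = 2^(−m) (a trivial sketch):

for m in range(1, 5):
    # each m-bit string k names the interval [k/2^m, (k+1)/2^m) of size 1/2^m
    print(' '.join(format(k, f'0{m}b') for k in range(2**m)))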
Example 1: Compression (1)
V ∈ [L, H) = [0.71753375, 0.71753500)
• Interval size (range) r:
r = 0.5*0.1*0.2*0.5*0.5*0.1*0.1*0.2*0.5*0.5 = 0.00000125
• The number of bits needed to represent a value in the
interval [L, H) = [L, L+r) of size r:
m = −log_2(r) = −log_2(0.00000125) = 19.6 → 20 bits
In general:
r = ∏_{i=1..n} p(s_i),
m = −log_2 r = −∑_{i=1..n} log_2 p(s_i) = n · Entropy,
where p(s_i) is the probability of the i-th symbol of the message.
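Both identities are easy to check numerically for 'SWISS_MISS' (a quick sketch; floating-point rounding may disturb the last digits):

import math

prob = {'S': 0.5, 'W': 0.1, 'I': 0.2, 'M': 0.1, '_': 0.1}
message = 'SWISS_MISS'

r = math.prod(prob[s] for s in message)   # 1.25e-06
m = -math.log2(r)                         # 19.609... -> 20 bits
print(r, m, math.ceil(m))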
Example 1: Compression (2)
• Entropy = 1.96 bits/char
• Arithmetic coder:
a) Codeword V: L ≤ V < H
V = (0.71753407…)_10 = (0.10110111101100000101)_2 → 20 bits
0.71753375 < 0.71753407… < 0.71753500
b) Codelength m:
m = −log_2(r) = −log_2(0.00000125) = 19.6 → 20 bits
c) Bitrate: R=20 bits/10 chars = 2.0 bits/char
• Huffman coder: (1+3+2+1+1+4+4+2+1+1)/10 = 2.0 bits/char
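For part (a), the codeword can be found mechanically: take the smallest 20-bit binary fraction that is at least L and verify it is still below H (a sketch; this simple search works here because ⌈−log_2 r⌉ bits happen to suffice for this interval):

import math

L, H = 0.71753375, 0.71753500
m = math.ceil(-math.log2(H - L))   # 20 bits
k = math.ceil(L * 2**m)            # smallest numerator with k / 2^m >= L
V = k / 2**m
assert L <= V < H
print(format(k, f'0{m}b'), V)      # 10110111101100000101  0.7175340652...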
Properties of arithmetic code
In practice, for images, arithmetic coding gives a 15-30%
improvement in compression ratio over a simple Huffman
coder. The complexity of arithmetic coding is, however,
50-300% higher.