PROFESSOR: We are still in chapter 12, on the sum-product algorithm. I hope to get through that today, fairly expeditiously, and to start on chapter 13. The handouts for today are chapter 13; problem set 9, as usual (problem 8.3 has been moved up into problem set 9); and problem set 8 solutions, for 8.1 and 8.2.

So I'm going to start off on a different tack of presenting the sum-product algorithm. Where is everybody? 1, 2, 3, 4, 5, 6, 7, 8, 9. Is there any competitive event that I should know about? I know the traffic was bad, and it's a rainy, late morning. I had a problem getting here myself. Well, this is improving.

The write-up of the sum-product algorithm in the notes is in terms of equations, and I've never been terribly satisfied with this write-up. It's concise, but we have to invent some new notation, which I don't consider to be completely transparent. I've never found it particularly easy to present in class, and it's been a little frustrating to me. So I'd like to try a different approach. I'd like in class just to go through a very explicit example, our favorite (8, 4, 4) code: what we're trying to achieve with the sum-product algorithm, and how the sum-product algorithm helps us to achieve it in an efficient way. So it's proof by example. It won't be very satisfying to those of you who like to see real proofs, but that you can get by going back and re-reading the notes. So I'm going to try doing it this way, and you will let me know by your looks of gladness or of frustration whether you like it or not.

So let's start at the beginning again. What are we trying to do? We're trying to compute the a posteriori probability of every possible value of every symbol, internal and external, that appears in a graphical realization of a code. And we will consider these to be APP vectors. In other words, if a symbol is eight-valued, there will be eight APP values in the vector.
And we're going to assume that we have a cycle-free graph realization, so that we can proceed by the cut-set idea to solve independent parts of the graph. At the very end, we'll relax that and see whether we can still do OK.

All right, one concept that seems very simple in theory, but that some people don't get till they really try to do it in practice, is that an APP vector is really a likelihood weight vector, normalized so that the sum of all the probabilities is equal to 1. For instance, suppose we send y equal to plus 1 or minus 1 through an additive white Gaussian noise channel with sigma squared equal to 1 or whatever, and we receive r. Then the probability of r given y is 1 over the square root of 2 pi sigma squared, times e to the minus (r minus y) squared over 2 sigma squared. And so we can compute that for y equals plus 1 and y equals minus 1, and we'll get a vector of two things. The APP vector would then consist of this evaluated for the two values of y. Let me just take this part of it: e to the minus (r minus 1) squared over 2 sigma squared, and e to the minus (r plus 1) squared over 2 sigma squared. The APP vector is something proportional to these two terms, normalized; that's why I continue to use this "proportional to" symbol, which you may not have seen before. Or maybe you have. So the APP vector in this case would be proportional to -- well, actually, 1 over the square root of 2 pi sigma squared times each of these two things. But this would basically tell you how much weight to give to plus 1 and minus 1 as possible transmitted symbols.

And that's all that matters; there's one extra degree of freedom in here. You can scale the whole vector by any scale factor you like, and it gives you the same information, because you can always normalize the terms to sum to 1. So if this, for instance, was 0.12 and this was 0.03, that's equivalent to 0.8 and 0.2 -- two probabilities.
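In code, that normalization point is a one-liner. Here's a minimal sketch (the function and variable names are mine, not from the notes) of the intrinsic APP vector for a plus-or-minus-1 symbol received as r over an AWGN channel:

```python
import math

def intrinsic_app(r, sigma=1.0):
    """Likelihood weights for y = +1 and y = -1 given received value r,
    normalized to sum to 1.  The common factor 1/sqrt(2*pi*sigma^2)
    cancels in the normalization, so it is simply omitted."""
    w_plus = math.exp(-(r - 1.0) ** 2 / (2 * sigma ** 2))
    w_minus = math.exp(-(r + 1.0) ** 2 / (2 * sigma ** 2))
    total = w_plus + w_minus
    return (w_plus / total, w_minus / total)

# The 0.12 / 0.03 example: normalizing gives 0.8 and 0.2.
print(0.12 / 0.15, 0.03 / 0.15)
```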
To get actual a posteriori probabilities that sum to 1, all you've got to do is keep the ratio right. This is four times as large as that, so what that really means is 0.8 and 0.2. And in implementations of the sum-product algorithm, we never really worry about constant scale factors. As long as they appear in every term, we can just forget about scaling. I mean, we don't want things to get too large or too small, so we can apply an overall scaling factor. But as long as we've got the relative weights, then we have what we need to know.

Let's go to the (8, 4, 4) code, since we know that very well, and ask ourselves what it is we're trying to compute. We're given APP vectors for each of the eight received bits, let's say. We transmit y, we receive some r, perhaps over an additive white Gaussian noise channel. So we get a little two-term APP vector for each of these y's, telling us the relative probability that a 0 was sent or a 1 was sent. That's called the intrinsic information. Let me just label that: for each one we get p0 and -- let me make it p0 and q0, the two terms for y0; p1 and q1 for y1; and so forth. So we have that information for each of the received symbols.

What's the likelihood of each of the code words, then, given these individual bitwise likelihoods? The likelihood of the all-zero code word, let's say, is proportional to p0 p1 p2 p3 p4 p5 p6 p7, whereas the likelihood of 1, 1, 1, 1, 0, 0, 0, 0 is q0 q1 q2 q3 p4 p5 p6 p7, and so forth. Each one is a product of the eight corresponding terms. Yes?

AUDIENCE: You're assuming that the bits are independent.

PROFESSOR: Yes, it's a memoryless channel, so the noise in every symbol transmission is independent. And that's important, too.

AUDIENCE: So basically, once again, the second bit is independent of the first bit in transmission?
PROFESSOR: I'm sorry, I was writing and I missed what you said.

AUDIENCE: Well, I mean, maybe it's a code where, if the first bit is 0, the second bit has to be 0.

PROFESSOR: Yeah, there are certainly dependencies among the code words. That's what we're trying to take into account; this is how to take it into account. So I'm assuming a memoryless channel, but I'm, of course, assuming dependencies within the code. How do I do that?

One way is maximum likelihood decoding. What do we do when we do maximum likelihood decoding, in principle? We exhaustively compute each of these 16 products, each of these 16 likelihoods, for each of the 16 code words. We simply pick the one that's greatest, and that's the code word that is most likely to have been sent. So that's what exhaustive maximum likelihood decoding consists of: finding the maximum likelihood word. And that clearly takes into account the dependencies; we're only considering valid code sequences when we do that.

In APP decoding, we're doing something else, but it can be characterized quite simply. What's the a posteriori probability that, say, the first bit is equal to a 0 or is equal to a 1? We can get the APP by summing up the likelihoods of all the code words that are associated with the value of the bit that we want to see. So we'd take the eight code words that have a 0 in this place, and we'd sum up these likelihoods for those eight code words, and that would be the likelihood weight of y0 being 0. And the likelihood weight of y0 being 1 would be the sum of the other eight. And again, you see that scale factors aren't going to matter here; all we want is the relative weights of these things.

So that's what we're trying to do in APP decoding, but we're trying to do it now for every single variable. At this point, I've only shown you the symbol variables, the external variables.
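As a concrete sketch of that brute-force APP computation (the generator matrix below is one standard choice for the (8, 4, 4) code, used here for illustration; the lecture's coordinate ordering may differ, and all the names are mine):

```python
from itertools import product

# One generator matrix for the (8, 4, 4) code (Reed-Muller RM(1,3)).
G = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
]

# All 16 code words, one per information four-tuple.
codewords = [
    tuple(sum(u[i] * G[i][j] for i in range(4)) % 2 for j in range(8))
    for u in product([0, 1], repeat=4)
]

def bitwise_app(p, q):
    """p[j], q[j]: intrinsic likelihood weights for y_j = 0 and y_j = 1.
    Returns eight normalized APP vectors, one per symbol."""
    apps = []
    for j in range(8):
        bucket = [0.0, 0.0]                    # weights for y_j = 0 and 1
        for cw in codewords:
            like = 1.0                         # product of 8 per-bit terms
            for i, bit in enumerate(cw):
                like *= p[i] if bit == 0 else q[i]
            bucket[cw[j]] += like              # sum into the bin for y_j's value
        total = bucket[0] + bucket[1]
        apps.append((bucket[0] / total, bucket[1] / total))
    return apps
```

This is exactly the 16-products-then-sums computation just described; the rest of the lecture is about avoiding it.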
So a brute-force way of doing that would be to fill out this table, all 16 likelihoods, do these sums, and then we'd have the APP vector for each of these.

Now, we're looking for a more efficient way of doing this. The efficient way is going to be based on the fact that we have a cycle-free graph realization, which in this case is a trellis realization. I'm going to use this two-section trellis, where each section is a four-tuple. We've drawn it two different ways. This is a very explicit way, showing the two parallel transitions that go from the initial node to the central state, and then two more down here. The set of code words corresponds to the set of all possible paths through this trellis. So, for instance, it includes the all-zero word, (0, 0, 0, 0, 1, 1, 1, 1), (1, 1, 1, 1, 0, 0, 0, 0), (1, 1, 1, 1, 1, 1, 1, 1), and so forth. And these I happen to have listed as the first four here.

Or we now have this more abstract way of writing the same thing. This says there are four external variables here and a two-bit state variable here: a vector space of dimension 4 and a vector space of dimension 2. And the dimension of this constraint is 3. That means there are eight possible values for these combinations of the four symbol bits and the two state bits. They're shown explicitly up here, and they form a linear vector space.

Either way, this is our trellis realization. Now, we know that to do maximum likelihood decoding it's useful to have this trellis realization because, for instance, we can do Viterbi decoding, and that's a more efficient way of finding what the maximum likelihood sequence is. Basically, because we go through, and at this point we can make a decision between 0, 0, 0, 0 and 1, 1, 1, 1. We only have to do that based on this four-tuple, and then we can live with that decision regardless of what else we see in the other half. And likewise in the other direction.
So now we'd like to use this same structure to simplify the APP decoding calculation, which looks like a more complicated calculation. So how do we do that? We introduce two state bits here. All of these have state bits 0, 0. Sorry -- at this point, we've called this state 0, 1. Just to keep it visually straight: 0, 1; 0, 1; 0, 1. And then there are eight more code words corresponding to the other two possible values of the state bits over here.

This is a quaternary variable. We want to treat it as a single four-valued variable; to make the realization cycle-free, I don't want to consider it to be two independent bits. So there are four possible values for this state variable. So maybe I just ought to call this my state space, which is a four-valued state space.

Now, what's the a posteriori probability of the state being, say, 0, 0? I compute that in the same way. I compute it simply by summing up the four likelihoods, over the external variables, of the code words that are associated with state value 0, 0 -- the first possible value of this quaternary variable. So it'd be the sum of these four things. Yes?

AUDIENCE: [UNINTELLIGIBLE PHRASE]

PROFESSOR: I've only listed half the code words. This is only 8 of the 16 code words. You want me to write the other 8?

And, by the way, there's at least an implicit assumption here that the code word determines the state sequence. As we found when we did minimal trellis realizations, the code sequence always does determine the state sequence. So if I know the entire code word, that also tells me what the values of all the state variables are. There's a one-to-one correspondence between code words and trajectories in a minimal trellis. And I believe I'm sometimes implicitly using that assumption. But anyway, it holds whenever we have a cycle-free graph.
In a cycle-free graph, we have a well-defined minimal realization, and the minimal realization does have this one-to-one property. So the state variables are determined by the symbol variables.

Suppose I want to compute the APP of this state value here. Let me write this out. This will be p0 p1 p2 p3 q4 q5 q6 q7 -- and then there's one that's all q's: q0 q1 q2 q3 q4 q5 q6 q7. So basically, what I want to do is take the sum of these four things. That's just a sum of products, and I could just do it. But the Cartesian product lemma gives us a simpler way of doing it.

We notice that I can have either of these two four-tuples as the first half, and either of these two four-tuples as the second half. In fact, I see over here, explicitly, that these four code words are the Cartesian product of two four-tuple codes, the repetition code of length 4. So I can use the Cartesian product lemma to write this as -- again, let me write it very explicitly -- p0 p1 p2 p3, which you recognize as the likelihood of this four-tuple, plus q0 q1 q2 q3, the likelihood of this four-tuple, all times p4 p5 p6 p7 plus q4 q5 q6 q7.

Now, do you agree that the product of those two terms is equal to this, the sum of these four terms? Yes: there are four terms in this sum, and they're what we want. And it's because we have this Cartesian product structure, which we always have in the state space. This is the basic Markov property of a state variable.

So that's a simpler calculation. This requires four multiplications, each one a seven-fold multiplication. This only requires -- well, it requires one, two, three, four four-fold multiplications, plus one overall multiplication. And it's certainly a simpler way to compute that. And I can similarly do that for each of these down here.
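A quick numerical check of that factorization, with arbitrary positive weights standing in for the p's and q's (a sketch, not anything from the notes):

```python
import random

p = [random.random() for _ in range(8)]
q = [random.random() for _ in range(8)]

def prod(ws):
    out = 1.0
    for w in ws:
        out *= w
    return out

# Sum of the four code word likelihoods with state (0, 0) ...
brute = (prod(p[0:4]) * prod(p[4:8]) + prod(p[0:4]) * prod(q[4:8]) +
         prod(q[0:4]) * prod(p[4:8]) + prod(q[0:4]) * prod(q[4:8]))

# ... equals the Cartesian product factorization: (past sum) * (future sum).
factored = (prod(p[0:4]) + prod(q[0:4])) * (prod(p[4:8]) + prod(q[4:8]))

assert abs(brute - factored) < 1e-12
```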
What this leads me to is what I called last time the past-future decomposition. We have one APP vector over here that corresponds to -- let's call it the APP vector of the state given the past received symbols. In other words, given what was received for y0, y1, y2, y3, what's the partial a posteriori probability of each of the four possible state values, given this past?

And similarly, we have another vector, which I will consider to be a message coming in this direction -- a message consisting of a vector of four values -- which is: what's the probability of the state given these four received symbols, computed in the same way? That's this here.

And the past-future rule is just that the overall probability for any of these internal state variables, the overall APP vector, is obtained by a component-wise multiplication. This would be the APP of the state being equal to 0, 0 given the past; this would be the APP of the state being 0, 0 given the future; and we'd have the same thing for 0, 1; 1, 0; 1, 1. To get the overall APP vector given all of r, the rule is just: multiply them component-wise. We take the likelihood weight for 0, 0 given the past and multiply it by the likelihood weight of 0, 0 given the future, and that gives us the total likelihood weight, up to a scale factor.

The notes derive this in equation terms. You see, it basically comes from this Cartesian product decomposition, which we always get for any internal state variable. The possible code words consistent with that state consist of a certain set of possible past code words consistent with that state, Cartesian product with a set of possible future code words consistent with that state. And as a result of that Cartesian product decomposition, we always get this APP vector decomposition.
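In code, the combination rule is just an elementwise product followed by a renormalization; a minimal sketch, with my own names:

```python
def combine(past_msg, future_msg):
    """Past-future combination rule: multiply the two state APP vectors
    component-wise, then renormalize to get actual probabilities."""
    raw = [a * b for a, b in zip(past_msg, future_msg)]
    total = sum(raw)
    return [w / total for w in raw]

# e.g. four state weights given the past, four given the future:
print(combine([0.4, 0.3, 0.2, 0.1], [0.1, 0.1, 0.4, 0.4]))
```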
That's what the proof says in the notes. Do you like this way of arguing? A couple of people are nodding, at least. It's an experiment.

All right, so that tells us what we're going to want to do, at least for the internal variables. And actually, the same is true up here. The messages going in are easy: that's just p0, p1, p2, p3, or whatever. The ones coming out are -- this is called the intrinsic variable -- the intrinsic APP vector, excuse me, is the one that we get from direct observation. The extrinsic APP vector is the one that we get from all the rest of the inputs and symbols, the other parts of the graph. And we combine these component-wise to get the overall APP vector.

All right, but we now have to do this for every part of the graph. Let me go through another calculation. How do we, for instance, find the extrinsic APP vector for y7, given that we have a trellis like this? In general, for y7, we're going to be adding up all the words with a 0 there -- these guys, and four more down here -- and we'll make that the extrinsic APP weight for 0. So we want to add up this and this and this and this and so forth, eight of those. And we want to do that in an efficient way.

This defines a code with eight code words on six variables. Let me move this up now. So let's talk about this (6, 3) code. What does it look like? With state value 0, 0, I can have y4, y5, y6, y7 be 0, 0, 0, 0, or I can have it be 1, 1, 1, 1. With state value 0, 1, I can have 0, 1, 0, 1 or 1, 0, 1, 0. With state value 1, 0, I can have 0, 0, 1, 1 or 1, 1, 0, 0. And with 1, 1, I can have 0, 1, 1, 0 or 1, 0, 0, 1. All right, so that's precisely the (6, 3) code I'm talking about.

Now suppose I have the APP vector for each of these states.
The APP vector for each of these: for 0, 0, we've already calculated it. It'd be p0 p1 p2 p3 plus q0 q1 q2 q3. That's the likelihood weight for state 0, 0 coming from the past. So that's one component of this message here. For 0, 1, I get p0 q1 p2 q3 plus q0 p1 q2 p3, reflecting the two possible ways I can get to this state value.

So I'm going to get four terms that I've computed for the past message: four likelihood weights, corresponding to each of the four values of this state variable. So I've got that vector. I already computed it as part of the computation of the APP for here; it's the past part of it.

So now I want to compute the probability that y7 is a 0. The rule here is to take every combination in this code and -- well, let me write this part of it. This is going to be p4 p5 p6 p7; that's the weight of this four-tuple. This is q4 q5 q6 q7. This is p4 q5 p6 q7; this is q4 p5 q6 p7; and so forth. Those are the weights I want to assign to each of these four-tuples.

So now I get a weight for each of these six-tuples. I multiply this by this to get the likelihood weight for this code word, this by this to get the likelihood weight for this code word, and so forth. And I add these to buckets: I simply go through this, and I'll add this one to the bucket for y7 equals 0, and this to the bucket for y7 equals 1, and this to the bucket for y7 equals 1, and this to the bucket for y7 equals 0, and so forth.

It's clear that in this bucket I'm always going to get the same seventh factor: I'm always going to get a p7 for 0 and a q7 for 1. So just looking at y7, I don't have to compute the seventh value here. If I treat y7 separately as its own little variable, then the incoming message, the intrinsic information, would just be this (p7, q7) likelihood weight vector.
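Here's the bucket computation just described, as a sketch in code (my names; the constraint code is the (6, 3) code listed above, with the state encoded as an index 0 through 3, and y7's own intrinsic pair (p7, q7) deliberately left out of the product):

```python
# (state index, second-half four-tuple) pairs of the (6, 3) constraint code.
code63 = [
    (0, (0, 0, 0, 0)), (0, (1, 1, 1, 1)),
    (1, (0, 1, 0, 1)), (1, (1, 0, 1, 0)),
    (2, (0, 0, 1, 1)), (2, (1, 1, 0, 0)),
    (3, (0, 1, 1, 0)), (3, (1, 0, 0, 1)),
]

def extrinsic_y7(past_msg, p, q):
    """past_msg: four likelihood weights for the state, from the past half.
    p[i], q[i]: intrinsic weights for y_i = 0 and 1, used for i = 4, 5, 6.
    Returns the two-entry extrinsic likelihood weight vector for y7."""
    bucket = [0.0, 0.0]
    for state, (y4, y5, y6, y7) in code63:
        w = past_msg[state]                       # summary of the whole past
        for i, bit in zip((4, 5, 6), (y4, y5, y6)):
            w *= p[i] if bit == 0 else q[i]       # intrinsic terms, y7 excluded
        bucket[y7] += w                           # bin by the value of y7
    return bucket
```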
And what I really want to compute is the outgoing part, which has to do with y4 through y6. So let me draw it that way. This, I don't think, is very good exposition; bear with me. So now I'm going to put this times this in my bucket for 0, this times this in my bucket for 1, and so forth.

The question here is: does this method give me the same answer as my exhaustive method up here? And again, because of the Cartesian product character of the states, you can convince yourself that it does. In other words, I can use this as a summary of everything in the past that got to the state value 0, 0. Any time I got to 0, 0, I got there through one of these two things. And so for any possible continuation over here, I could have gotten to it in one of these two ways. And I'm going to find terms -- e.g., these two terms. Let's see, what do I want? I want two ways of getting to 0, 0. I'm sorry -- these two. I claim that this times these is equal to the sum of these two. And it is. I can get to 0, 0, 0, 0 either through this or this, and I get to 1, 1, 1, 1, which is this one, either through this or this. And again, it's a fundamental property of the state that whenever I get one of these, I'm going to get both of them in equal combination. So I can summarize things in this way, which leads to an efficiency of computation.

So what I'm basically explaining to you now is the sum-product update rule, which is how to propagate messages -- messages which are APP vectors.

In general, my situation is going to be like this. I'm going to have some constraint code, an (n, k) code. I'm going to have some incoming messages over here, which tell me about everything that's happened in this past, P prime, and in this other past, P double prime. They give me a summary of all the weights corresponding to that.
I'm going to have some output symbol over here, and I want to compute what the APP vector is for the possible values of this output symbol, which is going to be some aggregate of this down here. What I do is go through all 2 to the k code words in this constraint code -- sorry, I'm using k again; that's 2 to the k code words -- and compute the product of the input APP values according to the possible combinations of the input variables and the output variables that are allowed by this constraint.

That's explicitly what I'm doing here. Here are all the possible combinations of state variables and symbol variables that are allowed by this (6, 3) constraint code. There are eight of them. For each one, I'm going to compute the relevant product of APP values, and I'm going to add it to a bin -- bins which correspond to, I'll draw it like that, output variable values.

OK, and that'll give me the summary APP vector of the output for the past, which consists of everything that could have happened on this side of the graph. Again, I'm using the cycle-free assumption in several ways. I'm saying that everything that happens up here is independent of everything that happens here. Everything that happens completely in this past is just the sum of what happens here; what happens here is subject to this constraint and has nothing to do with what happens over here in the future.

Eventually, I'm going to do the same thing in the future. I'm going to compute a future component that's based on all this stuff. I'm going to get messages going in each direction, and I'm going to combine them.

OK, now I'm seeing a fair amount of puzzlement on people's faces. Do you get, abstractly, what I'm trying to do here? Does anyone want to ask a clarifying question?
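Abstractly, that update rule is only a few lines. A minimal sketch with my own names: a constraint code is a list of tuples, one entry per edge of the node; given incoming APP vectors on every edge but one, the outgoing message is a sum of products, binned by the output edge's value:

```python
def sum_product_update(constraint_code, in_msgs, out_edge, out_size):
    """constraint_code: list of tuples, one symbol per edge of the node.
    in_msgs: dict mapping each edge except out_edge to its incoming APP
    vector.  Returns the outgoing APP vector on out_edge."""
    out = [0.0] * out_size
    for cw in constraint_code:
        w = 1.0
        for edge, msg in in_msgs.items():
            w *= msg[cw[edge]]          # product of incoming weights
        out[cw[out_edge]] += w          # add to the bin for this output value
    return out
```

Applied to the (6, 3) code above, written as five-entry tuples with edge 0 the four-valued state and edges 1 through 4 the bits y4 through y7, feeding in the past state message and the intrinsic pairs for y4, y5, y6 and asking for edge 4 reproduces the extrinsic vector for y7.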
AUDIENCE: [UNINTELLIGIBLE PHRASE] the bottom right -- output variable?

PROFESSOR: Output variable values, I'm sorry.

I think there's a little bit of confusion here, because sometimes I've considered this output to be a 16-valued output. Here, in fact, it has eight actually valid values. And if I were considering it that way, then I would just have one bin for each of these likelihoods. But now, if I consider these as four binary variables, then I would aggregate these things according to the values of each of the binary variables, and I'd get four vectors of length 2 for the individual APPs. Which is ultimately what I want here; so I'm sorry I've gone back and forth between subtly different realizations.

I do find this very hard to explain; perhaps there's someone else who can do a better job. You have a homework problem which asks you to do this, and I think, having done that, you'll then get the idea. But it's a matter of going back and forth between the gross structure and the actual equations and just working it out.

All right, so, the key things in the sum-product algorithm -- I've already explained them at some level. We have the sum-product update rule. That basically explains, given messages coming into a node, how do we compute the message going out of the node? We take contributions, products corresponding to what the code allows, and we dump them into the appropriate bins in the output vector. That's the sum-product update rule.

We have the past-future decomposition, or combination rule. That says, at the end of the day, you're finally going to get messages going in both directions, and you just combine them component-wise, as we started out with.

And finally, we need to talk about the overall schedule of the algorithm. So let me draw an arbitrary cycle-free graph. OK, suppose I have an arbitrary cycle-free graph.
It consists of external variables out here, internal variables, and constraint nodes. How am I going to schedule these computations in order to do APP decoding of this graph?

Well, what do I have at the beginning? At the beginning, I measure the received variables corresponding to each symbol, and that gives me the intrinsic information, which you can consider an incoming message from every symbol variable. Basically, what did I see on the channel? That's what that likelihood weight vector corresponds to.

All right. What do I need in order to do the sum-product update rule? I need the inputs. Think of this now as a directed graph. I need the incoming messages on all of the edges incident on a node except for one; if I have all but one, I can compute the last one as an outgoing message. So that's the structure of that update.

So at this point, can I compute anything here? No -- I would need the inputs. But I can compute this outgoing message, I can compute this outgoing message, and I can compute this outgoing message. I haven't drawn these with very high degree; normally we'd have degree higher than two. If I had another input here, however, I could still get that output.

All right, so that's time one. Think of there being a clock: at time one, I can propagate messages into the graph to depth one, if you like. There's going to be a systematic way that we can assign depths. Now, at time two, what can I compute? I can compute this guy, because I have this input. Now I have both inputs coming in here, so I can compute this guy. Anything else? So maybe 0, 1, 2. I should put times on each of these so I don't get confused: 0, 0, 1, 0. So this is 2 at the output.

Now I'm at time three. What can I compute? At time three, I now have two inputs coming in here.
I can compute this output. Actually, from these two inputs, I can compute this output at time three. And from these two, I can compute this outgoing message at time three. Are there other things I can compute? Can I compute this? No, I don't have this yet. That's probably it. Does anyone see anything else I can compute at time three, based on the time zero, one, and two messages? No.

All right, but I'm making progress, and it should be clear that I'm always going to be able to make some progress. At time four, I can compute this message in that direction. Let's see -- I now have everything I need to compute this message in that direction, and I'm almost done. And finally, I can compute this message in this direction at time four, the extrinsic information for those variables. I guess I can do this at time four; I had that at time three. And now at time five, I can get everything else that I need. At time five, I get that out; I get this out, based on this input and this input; and I get this out, based on this input and this input.

So in a finite amount of time -- in this case, five computational cycles -- I can compute all of the messages in both directions for every variable in the graph. Now, as a final cleanup step, I simply component-wise multiply the vectors in the two directions, and that gives me the APPs for all the variables. This is a kind of parallel computation of all the APPs for all variables in the graph.

And I claim two things: it's finite and it's exact. The exactness is proved from the equations and the cycle-free assumption; we can always decompose things as Cartesian products, independent components. The finiteness -- well, I assumed this is a finite graph.
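That whole schedule can be sketched as a loop; this is my own construction, for a cycle-free graph given as an adjacency structure. Each tick, every directed edge whose node already has messages on all its other edges fires; leaves fire first, and everything terminates in diameter-many ticks:

```python
def run_schedule(neighbors, compute):
    """neighbors: dict mapping each node to its set of adjacent nodes.
    compute(u, v, incoming): the message u -> v, formed from the messages
    into u on its other edges (e.g., by a sum-product update at node u)."""
    msgs = {}                                   # (u, v) -> message along u -> v
    pending = {(u, v) for u in neighbors for v in neighbors[u]}
    ticks = 0
    while pending:                              # never empties if there is a cycle
        ticks += 1
        ready = [(u, v) for (u, v) in pending
                 if all((w, u) in msgs for w in neighbors[u] if w != v)]
        for u, v in ready:
            incoming = {w: msgs[(w, u)] for w in neighbors[u] if w != v}
            msgs[(u, v)] = compute(u, v, incoming)
        pending -= set(ready)
    return msgs, ticks                          # ticks: the propagation time
```

After it returns, the final cleanup step is the past-future combination: for each variable, multiply the two oppositely directed messages on its edge component-wise.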
So what is this number 5? It's really the maximum length of time it takes me to get from any of these external symbol variables to any other external one -- from leaf to leaf, in graph theory terms. And that's called the diameter of the graph. And again, some people count the last little symbol node and some people don't. But it's clear that a finite graph is going to have a finite diameter, and that the propagation time is basically going to be equal to the diameter. That's how long it takes to get across the graph.

So each of these messages in each direction has a certain depth; that's the distance in the graph to the most distant leaf node. And the maximum sum of these two depths is always going to be 5 in this case. Notice that it is. A tiny little graph theory theorem.

AUDIENCE: [UNINTELLIGIBLE PHRASE] you see why, if you sum these messages flowing through the past and future of a single edge, you would get the overall APP, because you're summing the contributions from both [UNINTELLIGIBLE PHRASE]. How would you prove that that should be the overall APP of the symbols?

PROFESSOR: Recall the state space theorem. Let's take any edge in here, and suppose we think about cutting it -- a cut-set through that edge. Suppose we consider agglomerating everything back here and calling that the past, and agglomerating everything out here and calling that the future. This is past and future. The idea of the state space theorem was that this is exactly the same situation as a two-section trellis. Because everything is linear, you get the state space theorem, basically, which says that you get a Cartesian product of a certain past and a certain future. You get a minimal state space, which is kind of the total code modulo the part of the code that lives purely on the past and the part that lives purely on the future.
That quotient space is really what the state space is; it corresponds to the state space at this cut. Because every edge in a cycle-free graph is by itself a cut-set, you can do this for every edge in any cycle-free graph; you can make this same argument. So you get a well-defined minimal state space. You get a Cartesian product decomposition, which has a number of terms equal to the size of the state space -- two to the dimension of the state space over the binary field. And so everything goes through just as it did in the trellis case. That was the argument.

So, mushing all this together, you get the sum-product algorithm. This is the fundamental reason why I took the time to go through minimal trellis realizations of block codes. Not because that's actually the way we decode block codes all by themselves, or even use block codes -- we use convolutional codes. It's so that we could prove these very fundamental theorems about graph decoding. And we aren't even going to decode cycle-free graphs. But that's the intellectual line here.

Thank you for that question; it was very well timed.

AUDIENCE: [UNINTELLIGIBLE PHRASE]

PROFESSOR: Let's take a look at the structure of this. Where do the computations occur? At every node there's a computation. What's the complexity of that computation? It's approximately equal to the size of the constraint code. We do one thing for every legitimate combination of variables, which is what we mean by the size of the constraint code. So if this is a (6, 3) code, we need to do eight things here -- basically, eight multiplications.

So, in more general cycle-free graphs, this is what corresponds to branch complexity in trellis graphs. We found that the branch space corresponded to a constraint code in trellises.
816 00:49:51,290 --> 00:49:55,470 So the computation is basically determined by the 817 00:49:55,470 --> 00:49:58,970 constraint code complexity, and really, overall what we'd like 818 00:49:58,970 --> 00:50:03,410 to do is minimize the maximum 819 00:50:03,410 --> 00:50:05,740 size of any constraint code, because that's going to 820 00:50:05,740 --> 00:50:09,190 dominate computation. 821 00:50:09,190 --> 00:50:12,200 So we try to keep the k in these 822 00:50:12,200 --> 00:50:15,690 things as small as possible. 823 00:50:15,690 --> 00:50:16,070 Yes? 824 00:50:16,070 --> 00:50:17,320 AUDIENCE: [UNINTELLIGIBLE PHRASE] 825 00:50:17,320 --> 00:50:21,650 826 00:50:21,650 --> 00:50:26,901 and you get a larger dimension constraint for 827 00:50:26,901 --> 00:50:28,151 [UNINTELLIGIBLE PHRASE]. 828 00:50:28,151 --> 00:50:31,691 829 00:50:31,691 --> 00:50:33,890 PROFESSOR: Yeah, we're actually not too concerned 830 00:50:33,890 --> 00:50:35,140 about diameter. 831 00:50:35,140 --> 00:50:37,150 832 00:50:37,150 --> 00:50:39,750 Because things go up exponentially with constraint 833 00:50:39,750 --> 00:50:43,310 code dimension, just that single parameter tends to be 834 00:50:43,310 --> 00:50:46,230 the thing we want to focus on. 835 00:50:46,230 --> 00:50:50,160 But we remember that's hard to minimize in a trellis. 836 00:50:50,160 --> 00:50:55,240 We showed that basically our only trick there after 837 00:50:55,240 --> 00:50:58,990 ordering the coordinates was sectionalization, and that we 838 00:50:58,990 --> 00:51:02,070 could not reduce the constraint code complexity, 839 00:51:02,070 --> 00:51:06,150 the branch complexity, by sectionalization. 840 00:51:06,150 --> 00:51:08,800 And then again, we built on that and went through this 841 00:51:08,800 --> 00:51:13,460 argument that said, gee, this corresponds to some trellis. 842 00:51:13,460 --> 00:51:17,130 We could draw a trellis where this literally was the past 843 00:51:17,130 --> 00:51:18,650 and this was the future. 844 00:51:18,650 --> 00:51:21,990 And the minimal size state space in this cycle-free graph 845 00:51:21,990 --> 00:51:24,165 would be the same as in that trellis. 846 00:51:24,165 --> 00:51:26,710 847 00:51:26,710 --> 00:51:31,500 What I guess I didn't show is that you can't significantly 848 00:51:31,500 --> 00:51:37,320 reduce the constraint code complexity -- 849 00:51:37,320 --> 00:51:39,730 actually, I'm not sure there's a crisp theorem. 850 00:51:39,730 --> 00:51:44,620 But the fact that the constraint code 851 00:51:44,620 --> 00:51:48,260 is certainly going to be at least as large as the 852 00:51:48,260 --> 00:51:49,610 state spaces it connects 853 00:51:49,610 --> 00:51:52,580 tends to indicate that you're not going to be able to do a 854 00:51:52,580 --> 00:51:56,520 lot about constraint code complexity by going to more 855 00:51:56,520 --> 00:51:59,820 elaborate cycle-free graphs than just the chain trellis 856 00:51:59,820 --> 00:52:01,070 graph either. 857 00:52:01,070 --> 00:52:03,610 858 00:52:03,610 --> 00:52:05,902 That's, I think, a missing theorem that 859 00:52:05,902 --> 00:52:08,530 would be nice to have. 860 00:52:08,530 --> 00:52:12,380 I had a paper about a year or two ago; I'm not sure it ever 861 00:52:12,380 --> 00:52:13,630 quite gets to that theorem. 862 00:52:13,630 --> 00:52:15,450 But that's the theorem you'd like to see. 863 00:52:15,450 --> 00:52:19,830 You really can't improve this.
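As a brute-force check on the cut-set picture above (again my own sketch, not the notes'), the following fragment takes the (4, 3) single-parity-check code, cuts it after two symbols so that the state is just the parity of the past, and verifies numerically that the componentwise product of the forward (past) and backward (future) messages equals the exact APP vector of the state. The per-symbol weights are arbitrary.

```python
from itertools import product

# (4, 3) single-parity-check code, with a cut after two symbols.
code = [c for c in product([0, 1], repeat=4) if sum(c) % 2 == 0]

# Arbitrary per-symbol likelihood weights, as if from a noisy channel.
w = [{0: 0.8, 1: 0.2}, {0: 0.3, 1: 0.7},
     {0: 0.6, 1: 0.4}, {0: 0.1, 1: 0.9}]

def state(c):                 # coset label at the cut: parity of the past
    return (c[0] + c[1]) % 2

def weight(c):
    p = 1.0
    for i, b in enumerate(c):
        p *= w[i][b]
    return p

# Exact (unnormalized) APP of each state value, by full enumeration.
exact = {s: sum(weight(c) for c in code if state(c) == s) for s in (0, 1)}

# Forward message: sum over past halves consistent with each state value.
alpha = {s: sum(w[0][a] * w[1][b] for a, b in product([0, 1], repeat=2)
                if (a + b) % 2 == s) for s in (0, 1)}
# Backward message: likewise over future halves (total parity must be even).
beta = {s: sum(w[2][a] * w[3][b] for a, b in product([0, 1], repeat=2)
               if (a + b) % 2 == s) for s in (0, 1)}

for s in (0, 1):              # the past x future factorization is exact
    assert abs(alpha[s] * beta[s] - exact[s]) < 1e-12
print(exact)
```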
864 00:52:19,830 --> 00:52:23,690 So ultimately, we're bound by what you can do in a trellis. 865 00:52:23,690 --> 00:52:27,520 And in a trellis you can't do too much about this branch 866 00:52:27,520 --> 00:52:30,690 complexity parameter once you've ordered the 867 00:52:30,690 --> 00:52:31,940 coordinates. 868 00:52:31,940 --> 00:52:35,320 869 00:52:35,320 --> 00:52:37,810 A bit of hand-waving here, but we're going to need to go away 870 00:52:37,810 --> 00:52:39,060 from cycle-free graphs. 871 00:52:39,060 --> 00:52:41,430 872 00:52:41,430 --> 00:52:44,720 One reason I like normal graphs is that you get this 873 00:52:44,720 --> 00:52:47,870 very nice separation of function. 874 00:52:47,870 --> 00:52:50,220 You get the idea that nodes 875 00:52:50,220 --> 00:52:52,380 correspond to little computers. 876 00:52:52,380 --> 00:52:55,650 You could actually realize this with little computers. 877 00:52:55,650 --> 00:52:57,410 Put a computer in here for each node. 878 00:52:57,410 --> 00:53:02,410 Its job is to do the sum-product update equation. 879 00:53:02,410 --> 00:53:05,880 And then, well, what are the edges? 880 00:53:05,880 --> 00:53:08,610 They're for communication. 881 00:53:08,610 --> 00:53:13,450 These are the wires that run around on your chip. 882 00:53:13,450 --> 00:53:16,280 And so when we talk about state space complexity, we're 883 00:53:16,280 --> 00:53:18,310 really talking about something like bandwidth. 884 00:53:18,310 --> 00:53:21,170 How wide do you have to make these wires? 885 00:53:21,170 --> 00:53:24,410 Do you have 6 bits, or 9 bits, or whatever? 886 00:53:24,410 --> 00:53:26,990 So state space really has to do with communication 887 00:53:26,990 --> 00:53:28,440 complexity. 888 00:53:28,440 --> 00:53:31,840 Constraint or branch complexity has to do with the computational 889 00:53:31,840 --> 00:53:36,910 complexity, and sometimes one's more 890 00:53:36,910 --> 00:53:37,690 important than the other. 891 00:53:37,690 --> 00:53:40,100 Computational tends to be the thing we focus on. 892 00:53:40,100 --> 00:53:43,150 But we know more and more, as we get to more complicated 893 00:53:43,150 --> 00:53:45,890 chips, communication complexity 894 00:53:45,890 --> 00:53:50,410 tends to become dominant. 895 00:53:50,410 --> 00:53:55,780 And this, of course, is the I/O. These are your paths 896 00:53:55,780 --> 00:53:59,230 which go to the outside world, your I/O paths. 897 00:53:59,230 --> 00:54:01,650 So it's really a good way to think about this. 898 00:54:01,650 --> 00:54:04,190 And in fact, some people like Andy [UNINTELLIGIBLE] 899 00:54:04,190 --> 00:54:08,290 and his students have experimented with building 900 00:54:08,290 --> 00:54:10,095 analog sum-product decoders. 901 00:54:10,095 --> 00:54:13,650 902 00:54:13,650 --> 00:54:17,420 The general idea here is that you can compute an output as 903 00:54:17,420 --> 00:54:21,510 soon as you have the two corresponding inputs. 904 00:54:21,510 --> 00:54:25,910 Well, imagine another schedule where this thing just computes 905 00:54:25,910 --> 00:54:28,725 all the time based on whatever inputs it has. 906 00:54:28,725 --> 00:54:31,680 907 00:54:31,680 --> 00:54:36,500 In a cycle-free graph, say you're in the interior. 908 00:54:36,500 --> 00:54:40,420 Initially, what you have on your inputs is garbage. 909 00:54:40,420 --> 00:54:42,930 But eventually, you're going to get the right things on all 910 00:54:42,930 --> 00:54:43,580 your inputs.
911 00:54:43,580 --> 00:54:45,430 And therefore, you're going to put the right things on all 912 00:54:45,430 --> 00:54:46,810 your outputs. 913 00:54:46,810 --> 00:54:51,050 So again, up to some propagation time, just 914 00:54:51,050 --> 00:54:55,260 putting in non-clocked little computers here, whose function 915 00:54:55,260 --> 00:54:58,720 is to generate the right outputs given the inputs in 916 00:54:58,720 --> 00:55:02,480 each direction, is eventually going to be right in a 917 00:55:02,480 --> 00:55:05,330 cycle-free graph. 918 00:55:05,330 --> 00:55:08,180 And in fact, there are very nice little analog 919 00:55:08,180 --> 00:55:12,300 implementations of this, where it turns out the sum-product 920 00:55:12,300 --> 00:55:15,700 update rule, a nonlinear rule, is extremely amenable to 921 00:55:15,700 --> 00:55:20,540 transistor implementation. 922 00:55:20,540 --> 00:55:26,590 So you just put these in, little computing nodes, and 923 00:55:26,590 --> 00:55:30,090 you put whatever you received on here as the received 924 00:55:30,090 --> 00:55:31,900 symbols, and you let it go. 925 00:55:31,900 --> 00:55:36,860 And it computes incredibly quickly for 926 00:55:36,860 --> 00:55:38,260 a cycle-free graph. 927 00:55:38,260 --> 00:55:41,451 928 00:55:41,451 --> 00:55:47,160 Where we're going to go is we're going to say, well, gee, 929 00:55:47,160 --> 00:55:49,790 we've got this nice little local rule that we can implement 930 00:55:49,790 --> 00:55:53,160 with local computers for the sum-product algorithm. 931 00:55:53,160 --> 00:55:57,160 Maybe it'll work if we do it on a graph with cycles. 932 00:55:57,160 --> 00:56:00,240 And on a graph with cycles this kind of parallel or 933 00:56:00,240 --> 00:56:03,590 flooding schedule, where every little computer is computing 934 00:56:03,590 --> 00:56:07,520 something at every instant of time, is probably the most 935 00:56:07,520 --> 00:56:09,140 popular schedule. 936 00:56:09,140 --> 00:56:12,366 And again, you build an analog implementation of that and it 937 00:56:12,366 --> 00:56:16,270 goes extremely fast, and it converges to something. 938 00:56:16,270 --> 00:56:20,740 It just relaxes to something: there's a local equilibrium in 939 00:56:20,740 --> 00:56:23,080 all these constraint nodes, 940 00:56:23,080 --> 00:56:27,880 and therefore some global APP vector that satisfies the 941 00:56:27,880 --> 00:56:29,130 whole thing. 942 00:56:29,130 --> 00:56:34,640 943 00:56:34,640 --> 00:56:37,820 I think this is a nice attribute of this kind of 944 00:56:37,820 --> 00:56:40,670 graphical realization. 945 00:56:40,670 --> 00:56:45,720 All right, here's a particular example of how 946 00:56:45,720 --> 00:56:49,010 this schedule works. 947 00:56:49,010 --> 00:56:51,570 The cycle-free realization that we care most 948 00:56:51,570 --> 00:56:55,050 about is the trellis. 949 00:56:55,050 --> 00:56:59,920 The BCJR algorithm is simply the implementation of the 950 00:56:59,920 --> 00:57:03,616 sum-product algorithm on a trellis. 951 00:57:03,616 --> 00:57:07,310 SP on a trellis. 952 00:57:07,310 --> 00:57:13,010 So generically, what does a trellis look like? 953 00:57:13,010 --> 00:57:18,610 I guess I should allow for some nontrivial code here. 954 00:57:18,610 --> 00:57:21,250 So all trellises have this same boring graph. 955 00:57:21,250 --> 00:57:33,330 956 00:57:33,330 --> 00:57:35,750 What is this schedule?
957 00:57:35,750 --> 00:57:39,350 It's a cycle-free graph, so we could operate according to 958 00:57:39,350 --> 00:57:44,670 this nice controlled schedule, compute everything just once. 959 00:57:44,670 --> 00:57:46,040 And how does that operate? 960 00:57:46,040 --> 00:57:51,320 We get intrinsic information that's measured at each of 961 00:57:51,320 --> 00:57:53,290 these symbols here. 962 00:57:53,290 --> 00:57:56,400 What did we receive on the channel? 963 00:57:56,400 --> 00:58:02,280 We put that through a constraint code and we get 964 00:58:02,280 --> 00:58:04,450 here, let's give this a name. 965 00:58:04,450 --> 00:58:06,870 This is called intrinsic 0. 966 00:58:06,870 --> 00:58:13,440 Here is alpha 1, which is just a function of the intrinsic 967 00:58:13,440 --> 00:58:15,350 information at time zero. 968 00:58:15,350 --> 00:58:17,960 And we combine that with the intrinsic information at time 969 00:58:17,960 --> 00:58:22,460 one and we get alpha 2, and so forth. 970 00:58:22,460 --> 00:58:26,810 So we get a forward-going path down the trellis. 971 00:58:26,810 --> 00:58:31,170 If any of you know what a Kalman filter or a smoother 972 00:58:31,170 --> 00:58:34,310 does, this is really exactly the same thing. 973 00:58:34,310 --> 00:58:39,690 This is finding the conditional APPs 974 00:58:39,690 --> 00:58:42,600 of this state vector given everything in the past. 975 00:58:42,600 --> 00:58:45,335 976 00:58:45,335 --> 00:58:47,760 In Kalman filtering or smoothing, it's done for 977 00:58:47,760 --> 00:58:48,830 Gaussian vectors. 978 00:58:48,830 --> 00:58:51,480 Here we're talking about discrete vectors. 979 00:58:51,480 --> 00:58:54,630 But in principle, it's doing exactly the same thing. 980 00:58:54,630 --> 00:58:57,880 It's computing conditional probabilities given everything 981 00:58:57,880 --> 00:58:59,810 observed in the past. 982 00:58:59,810 --> 00:59:02,830 And so the diameter here is just the length 983 00:59:02,830 --> 00:59:04,080 of the trellis, basically. 984 00:59:04,080 --> 00:59:06,580 985 00:59:06,580 --> 00:59:10,740 So this is i n minus 1, i n. 986 00:59:10,740 --> 00:59:14,040 And here we have alpha n. 987 00:59:14,040 --> 00:59:18,210 988 00:59:18,210 --> 00:59:21,490 When we have alpha n finally in here, we can compute the 989 00:59:21,490 --> 00:59:27,330 extrinsic information based on alpha n as just extrinsic n. 990 00:59:27,330 --> 00:59:33,310 We combine these two and we're done for that guy. 991 00:59:33,310 --> 00:59:38,600 Meanwhile, we have to compute the backward-going information 992 00:59:38,600 --> 00:59:40,270 on each of these branches. 993 00:59:40,270 --> 00:59:44,370 So we compute first beta n based on i n. 994 00:59:44,370 --> 00:59:47,910 Then we get beta n minus 1 based on i n minus 1 995 00:59:47,910 --> 00:59:50,910 and beta n, and so forth. 996 00:59:50,910 --> 00:59:56,650 We get a backward-going propagation of conditional 997 00:59:56,650 --> 00:59:57,330 information. 998 00:59:57,330 --> 01:00:01,300 Each of these betas represents the a-posteriori probability 999 01:00:01,300 --> 01:00:04,640 vector given the future from it. 1000 01:00:04,640 --> 01:00:07,480 And finally, when we get down to here we can compute 1001 01:00:07,480 --> 01:00:12,850 extrinsic 0 from beta 1. 1002 01:00:12,850 --> 01:00:16,800 And by this time, when we have both alpha 1 and beta 2 coming 1003 01:00:16,800 --> 01:00:21,100 into here, we can, of course, get extrinsic 1.
1004 01:00:21,100 --> 01:00:23,850 When we get both the alpha and beta coming in here, we can 1005 01:00:23,850 --> 01:00:26,380 get extrinsic 2, and so forth. 1006 01:00:26,380 --> 01:00:32,800 1007 01:00:32,800 --> 01:00:36,292 Of course, you can write this down as equations. 1008 01:00:36,292 --> 01:00:39,060 If you want to read the equations, you look in the 1009 01:00:39,060 --> 01:00:47,190 original Bahl, Cocke, Jelinek, and Raviv article from 1974. 1010 01:00:47,190 --> 01:00:54,300 This was, again, a kind of lost paper, because just for 1011 01:00:54,300 --> 01:00:57,180 decoding a trellis, the Viterbi algorithm 1012 01:00:57,180 --> 01:01:00,640 is much less complex. 1013 01:01:00,640 --> 01:01:05,240 And gives you approximately the same thing. 1014 01:01:05,240 --> 01:01:09,590 I mean, I talked last time about how probably what you really want 1015 01:01:09,590 --> 01:01:12,680 to do in decoding a block code is maximum likelihood. 1016 01:01:12,680 --> 01:01:16,090 That's minimum probability of error on a block-wise basis. 1017 01:01:16,090 --> 01:01:18,910 1018 01:01:18,910 --> 01:01:22,650 What APP decoding gives you is minimum probability of symbol error. 1019 01:01:22,650 --> 01:01:24,760 If you then make a decision based on each of these 1020 01:01:24,760 --> 01:01:29,620 extrinsic information vectors, or on the combination of these two 1021 01:01:29,620 --> 01:01:33,000 to get the overall APP vector, that's called a MAP algorithm, 1022 01:01:33,000 --> 01:01:35,600 maximum a-posteriori probability. 1023 01:01:35,600 --> 01:01:37,460 If you make a decision based on that, that will give you 1024 01:01:37,460 --> 01:01:39,650 the minimal bit error probability, but it may not 1025 01:01:39,650 --> 01:01:42,840 necessarily give you a code word, and it will not minimize 1026 01:01:42,840 --> 01:01:45,250 the probability of making any error, which is generally what 1027 01:01:45,250 --> 01:01:47,690 you want to minimize. 1028 01:01:47,690 --> 01:01:51,180 So for trellis decoding of block codes 1029 01:01:51,180 --> 01:01:53,410 this was not favored. 1030 01:01:53,410 --> 01:01:57,640 However, with a block code as a component of a larger code, 1031 01:01:57,640 --> 01:02:02,630 you want the APP vector to feed off to something else. 1032 01:02:02,630 --> 01:02:10,350 And so with the advent of much larger codes, of which block 1033 01:02:10,350 --> 01:02:14,040 codes or convolutional codes are components, BCJR is 1034 01:02:14,040 --> 01:02:15,800 the way you want to go. 1035 01:02:15,800 --> 01:02:18,660 And some people say it's about three times as complicated as 1036 01:02:18,660 --> 01:02:19,910 the Viterbi algorithm. 1037 01:02:19,910 --> 01:02:22,450 1038 01:02:22,450 --> 01:02:25,730 All right, so you'll have an exercise doing that in 1039 01:02:25,730 --> 01:02:26,980 the homework. 1040 01:02:26,980 --> 01:02:29,280 1041 01:02:29,280 --> 01:02:33,880 There is a closely related algorithm called the min-sum 1042 01:02:33,880 --> 01:02:38,420 algorithm, or the max-product algorithm. 1043 01:02:38,420 --> 01:02:45,880 And the idea of that is that you can really carry through 1044 01:02:45,880 --> 01:02:52,010 the same logical computations, except when you get to one of 1045 01:02:52,010 --> 01:02:54,720 these combining operations, rather than doing a sum of 1046 01:02:54,720 --> 01:03:00,750 products, you can do a max of products. 1047 01:03:00,750 --> 01:03:04,540 And the Cartesian product law still holds; the same kind 1048 01:03:04,540 --> 01:03:07,950 of decompositions that we have still hold.
1049 01:03:07,950 --> 01:03:21,330 For instance, in this example, to get the state 0, 0-- 1050 01:03:21,330 --> 01:03:24,990 before, we said we're going to get an APP vector 1051 01:03:24,990 --> 01:03:36,490 where this is the past APP for the 0, 0 value of the middle 1052 01:03:36,490 --> 01:03:37,270 state vector. 1053 01:03:37,270 --> 01:03:42,950 We simply combine the likelihood weights of the two 1054 01:03:42,950 --> 01:03:46,660 possible ways of getting to that state and we sum them. 1055 01:03:46,660 --> 01:03:49,270 1056 01:03:49,270 --> 01:03:52,575 Instead of doing the sum, let's just take the max. 1057 01:03:52,575 --> 01:03:55,470 1058 01:03:55,470 --> 01:03:59,044 The maximum probability of getting-- 1059 01:03:59,044 --> 01:04:02,340 in other words, what's the best way of getting here? 1060 01:04:02,340 --> 01:04:04,180 Same question as we ask in the Viterbi algorithm. 1061 01:04:04,180 --> 01:04:08,520 1062 01:04:08,520 --> 01:04:13,610 We'll let that be the likelihood weight for the 1063 01:04:13,610 --> 01:04:14,420 state vector. 1064 01:04:14,420 --> 01:04:16,870 Otherwise, the logic is exactly the same. 1065 01:04:16,870 --> 01:04:22,040 Similarly, coming this way, what's the max of these two? 1066 01:04:22,040 --> 01:04:24,930 OK, what's the best way of getting to the state vector in 1067 01:04:24,930 --> 01:04:27,970 terms of likelihood from the future? 1068 01:04:27,970 --> 01:04:31,880 And what do you know, if we combine these things at this 1069 01:04:31,880 --> 01:04:36,130 point, we now have a past vector and a future vector. 1070 01:04:36,130 --> 01:04:42,890 Let's take the max of this times the max of that. 1071 01:04:42,890 --> 01:04:45,490 That gives us the maximum likelihood way of going 1072 01:04:45,490 --> 01:04:48,175 through the 0, 0 value of the state. 1073 01:04:48,175 --> 01:04:54,720 1074 01:04:54,720 --> 01:04:58,370 If we do exactly the same logic for decoding this 1075 01:04:58,370 --> 01:05:01,220 trellis, we get a two-way algorithm. 1076 01:05:01,220 --> 01:05:02,060 It's like this. 1077 01:05:02,060 --> 01:05:07,350 It's like the BCJR algorithm, where at every point we're 1078 01:05:07,350 --> 01:05:11,060 computing the maximum likelihood. 1079 01:05:11,060 --> 01:05:15,020 It's not hard to show that what this gives you, at 1080 01:05:15,020 --> 01:05:17,640 each of these symbols out here, is the symbol 1081 01:05:17,640 --> 01:05:20,660 value that belongs to the maximum likelihood code word. 1082 01:05:20,660 --> 01:05:24,990 1083 01:05:24,990 --> 01:05:27,075 How does it differ from the Viterbi algorithm? 1084 01:05:27,075 --> 01:05:28,800 It gives you the same answer. 1085 01:05:28,800 --> 01:05:32,550 It gives you the symbols, one by one, of the maximum 1086 01:05:32,550 --> 01:05:35,640 likelihood code word. 1087 01:05:35,640 --> 01:05:37,760 It doesn't have to remember anything. 1088 01:05:37,760 --> 01:05:41,210 It doesn't store survivors. 1089 01:05:41,210 --> 01:05:42,520 That's its advantage. 1090 01:05:42,520 --> 01:05:45,220 Its disadvantage is it's a two-way algorithm. 1091 01:05:45,220 --> 01:05:48,320 You have to start from both ends and propagate your 1092 01:05:48,320 --> 01:05:51,310 information in the same way. 1093 01:05:51,310 --> 01:05:53,640 And for block codes, maybe that's OK. 1094 01:05:53,640 --> 01:05:55,870 For convolutional codes that's certainly not something you 1095 01:05:55,870 --> 01:05:59,770 want to do, because the code word may go on indefinitely.
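Here is a compact sketch of both variants on a toy two-state parity trellis, with made-up weights rather than the lecture's example. Passing combine=sum gives the BCJR forward-backward recursion with its alphas and betas; passing combine=max gives the max-product rule just described, and the assertion checks that its bitwise decisions pick out exactly the bits of the maximum likelihood code word.

```python
import math
from itertools import product

# Toy two-state trellis for the (4, 3) single-parity-check code:
# the state is the running parity; we start and end in state 0.
n = 4
w = [{0: 0.8, 1: 0.2}, {0: 0.3, 1: 0.7},   # intrinsic likelihood weights
     {0: 0.6, 1: 0.4}, {0: 0.1, 1: 0.9}]

def two_way(combine):
    """Forward-backward pass on the parity trellis.  combine=sum gives
    BCJR (exact APPs); combine=max gives the max-product variant."""
    alpha = [{0: 1.0, 1: 0.0}]                    # forward state metrics
    for k in range(n):
        alpha.append({s: combine(alpha[k][s ^ b] * w[k][b] for b in (0, 1))
                      for s in (0, 1)})
    beta = [None] * n + [{0: 1.0, 1: 0.0}]        # backward state metrics
    for k in reversed(range(n)):
        beta[k] = {s: combine(beta[k + 1][s ^ b] * w[k][b] for b in (0, 1))
                   for s in (0, 1)}
    # Per-symbol outputs: combine over the transitions that carry bit b.
    return [{b: combine(alpha[k][s] * w[k][b] * beta[k + 1][s ^ b]
                        for s in (0, 1)) for b in (0, 1)} for k in range(n)]

app = two_way(sum)    # a posteriori probabilities, up to overall scale
mp  = two_way(max)    # max-product: best path through each bit value

# Max-product decisions match the ML code word, bit by bit.
ml = max((c for c in product([0, 1], repeat=n) if sum(c) % 2 == 0),
         key=lambda c: math.prod(w[k][c[k]] for k in range(n)))
assert tuple(0 if m[0] >= m[1] else 1 for m in mp) == ml
print([round(v[1] / (v[0] + v[1]), 3) for v in app])   # P(bit = 1)
```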
1096 01:05:59,770 --> 01:06:05,350 So the Viterbi algorithm gets rid of the backward step by 1097 01:06:05,350 --> 01:06:07,785 always remembering the survivor. 1098 01:06:07,785 --> 01:06:12,310 It remembers the history of how it got to where it is. 1099 01:06:12,310 --> 01:06:18,100 So that once you get the max, you don't have to go backward-- 1100 01:06:18,100 --> 01:06:20,870 you simply read out the survivor. 1101 01:06:20,870 --> 01:06:27,270 This backward part amounts to a traceback operation. 1102 01:06:27,270 --> 01:06:30,980 That's probably much too quick to be absorbed. 1103 01:06:30,980 --> 01:06:36,460 But I just wanted to mention there is a max-product 1104 01:06:36,460 --> 01:06:41,230 version, which if you take log likelihoods becomes a max-sum. 1105 01:06:41,230 --> 01:06:42,480 Or if you take negative log 1106 01:06:42,480 --> 01:06:44,220 likelihoods it becomes min-sum. 1107 01:06:44,220 --> 01:06:46,760 So this is called the min-sum algorithm. 1108 01:06:46,760 --> 01:06:51,030 And it does maximum likelihood decoding 1109 01:06:51,030 --> 01:06:52,280 rather than APP decoding. 1110 01:06:52,280 --> 01:06:54,690 1111 01:06:54,690 --> 01:06:59,000 And sometimes it's used as an approximation, or there are 1112 01:06:59,000 --> 01:07:04,378 various intermediate approximations between these two. 1113 01:07:04,378 --> 01:07:05,628 AUDIENCE: [UNINTELLIGIBLE PHRASE] 1114 01:07:05,628 --> 01:07:11,100 1115 01:07:11,100 --> 01:07:16,830 PROFESSOR: No, if you use the min-sum, basically everybody 1116 01:07:16,830 --> 01:07:20,980 will converge on the same maximum likelihood code word. 1117 01:07:20,980 --> 01:07:24,790 So all these constraints will be consistent with-- 1118 01:07:24,790 --> 01:07:27,110 you'd be taking a single max at each point. 1119 01:07:27,110 --> 01:07:30,130 But at each point you will solve for the maximum 1120 01:07:30,130 --> 01:07:31,150 likelihood code word. 1121 01:07:31,150 --> 01:07:34,530 And what you'll see up here is the symbol that belongs to the 1122 01:07:34,530 --> 01:07:36,330 maximum likelihood code word. 1123 01:07:36,330 --> 01:07:39,710 This is maybe a bit of a miracle when you first see it, 1124 01:07:39,710 --> 01:07:44,060 but you'll independently get each of the bits in the 1125 01:07:44,060 --> 01:07:46,770 maximum likelihood code word. 1126 01:07:46,770 --> 01:07:47,560 Check it out at home. 1127 01:07:47,560 --> 01:07:51,430 AUDIENCE: Why not always do this? 1128 01:07:51,430 --> 01:07:54,080 PROFESSOR: Why not always do that? 1129 01:07:54,080 --> 01:08:00,450 It turns out that we want softer decisions when this is 1130 01:08:00,450 --> 01:08:03,230 part of a big graph. 1131 01:08:03,230 --> 01:08:07,820 We actually want the APPs, the relative probabilities of each 1132 01:08:07,820 --> 01:08:10,010 of the values. 1133 01:08:10,010 --> 01:08:13,710 The max, in effect, is making a hard decision. 1134 01:08:13,710 --> 01:08:16,560 AUDIENCE: [UNINTELLIGIBLE PHRASE] 1135 01:08:16,560 --> 01:08:17,149 PROFESSOR: Yeah, all right. 1136 01:08:17,149 --> 01:08:18,819 It does give reliability information. 1137 01:08:18,819 --> 01:08:23,340 1138 01:08:23,340 --> 01:08:25,620 I'm not sure I can give you a conclusive answer to that 1139 01:08:25,620 --> 01:08:28,390 question, except that APP works better. 1140 01:08:28,390 --> 01:08:31,569 That's a more empirical answer. 1141 01:08:31,569 --> 01:08:32,934 Yes?
1142 01:08:32,934 --> 01:08:34,184 AUDIENCE: [UNINTELLIGIBLE PHRASE] 1143 01:08:34,184 --> 01:08:42,089 1144 01:08:42,089 --> 01:08:46,520 Where does that flexibility of the input symbols being 1145 01:08:46,520 --> 01:08:47,770 [UNINTELLIGIBLE PHRASE]. 1146 01:08:47,770 --> 01:08:52,259 1147 01:08:52,259 --> 01:08:55,745 We should be able to take that into account 1148 01:08:55,745 --> 01:08:56,995 [UNINTELLIGIBLE PHRASE]. 1149 01:08:56,995 --> 01:09:04,979 1150 01:09:04,979 --> 01:09:06,374 PROFESSOR: Yeah. 1151 01:09:06,374 --> 01:09:18,090 OK, well, if the input symbols are independently not 1152 01:09:18,090 --> 01:09:26,819 equiprobable, you can feed that in as-- 1153 01:09:26,819 --> 01:09:31,569 you know, once they came out of the channel, once we see 1154 01:09:31,569 --> 01:09:35,399 the channel information, the two symbols are not 1155 01:09:35,399 --> 01:09:37,170 equiprobable anymore. 1156 01:09:37,170 --> 01:09:43,220 So if somehow they a priori independently were not equally 1157 01:09:43,220 --> 01:09:45,689 probable, you could just feed that in as part of this 1158 01:09:45,689 --> 01:09:48,880 intrinsic information vector. 1159 01:09:48,880 --> 01:09:54,720 That's almost never the case in coding. 1160 01:09:54,720 --> 01:09:58,970 The intrinsic information is sometimes regarded as a-priori 1161 01:09:58,970 --> 01:10:04,360 and this as a-posteriori, or the combination of the two of 1162 01:10:04,360 --> 01:10:08,690 them is then a-posteriori when this goes off somewhere else. 1163 01:10:08,690 --> 01:10:11,510 1164 01:10:11,510 --> 01:10:17,540 But it doesn't really mean that the symbol itself was-- 1165 01:10:17,540 --> 01:10:22,170 it means that the evidence is biased one way or the other. 1166 01:10:22,170 --> 01:10:25,980 And when this appears somewhere else in the graph, 1167 01:10:25,980 --> 01:10:29,860 this will now appear somewhere down in some other graph. 1168 01:10:29,860 --> 01:10:32,660 It's called the a-priori information, but what it is is 1169 01:10:32,660 --> 01:10:35,920 the a-posteriori information given all of the received 1170 01:10:35,920 --> 01:10:39,018 symbols in this part of the graph. 1171 01:10:39,018 --> 01:10:40,268 AUDIENCE: [UNINTELLIGIBLE PHRASE] 1172 01:10:40,268 --> 01:10:43,730 1173 01:10:43,730 --> 01:10:49,320 Does the i0 of that other graph contain this extrinsic 1174 01:10:49,320 --> 01:10:53,140 information added to it? 1175 01:10:53,140 --> 01:10:56,720 PROFESSOR: Yeah, the i0 for another graph is just the e0 1176 01:10:56,720 --> 01:10:57,970 of this graph. 1177 01:10:57,970 --> 01:11:01,710 1178 01:11:01,710 --> 01:11:03,970 I mean, if you consider it all as one big graph, 1179 01:11:03,970 --> 01:11:04,740 that's what you want. 1180 01:11:04,740 --> 01:11:07,560 You want the message that goes in this direction, which 1181 01:11:07,560 --> 01:11:11,210 basically summarizes all of the inputs from this graph. 1182 01:11:11,210 --> 01:11:14,970 1183 01:11:14,970 --> 01:11:18,460 Generally, you'd have a little equals node here. 1184 01:11:18,460 --> 01:11:23,160 You'd have the actual intrinsic information coming 1185 01:11:23,160 --> 01:11:25,100 from the outside world there. 1186 01:11:25,100 --> 01:11:28,520 You would compute the e0 from this graph. 1187 01:11:28,520 --> 01:11:32,975 And at this point, you would combine e0 and i0, and that 1188 01:11:32,975 --> 01:11:34,410 would go around here.
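The combining at that little equals node is just a componentwise product of the messages, renormalized. A short sketch, in the same hypothetical notation as the earlier fragments, with made-up numbers for i0 and e0:

```python
def equals_node(msgs):
    """Repetition (equals) constraint: every edge carries the same
    variable, so the message sent onward is the componentwise product
    of the incoming messages, renormalized to sum to 1."""
    out = {}
    for a in msgs[0]:
        p = 1.0
        for m in msgs:
            p *= m[a]
        out[a] = p
    total = sum(out.values())
    return {a: v / total for a, v in out.items()}

i0 = {0: 0.8, 1: 0.2}    # intrinsic information from the channel
e0 = {0: 0.3, 1: 0.7}    # extrinsic information from this subgraph
print(equals_node([i0, e0]))   # becomes the i0 of the next graph over
```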
1189 01:11:34,410 --> 01:11:40,400 1190 01:11:40,400 --> 01:11:45,460 So the very last page of chapter 12 just says, OK, 1191 01:11:45,460 --> 01:11:48,420 we've got a nice, clean, finite, exact algorithm that 1192 01:11:48,420 --> 01:11:52,120 will compute all these APPs on a cycle-free graph. 1193 01:11:52,120 --> 01:11:54,720 Now, suppose we're faced with a graph that has cycles. 1194 01:11:54,720 --> 01:11:57,650 1195 01:11:57,650 --> 01:11:59,490 And here's the situation. 1196 01:11:59,490 --> 01:12:01,405 Suppose the graph has cycles in it. 1197 01:12:01,405 --> 01:12:04,220 1198 01:12:04,220 --> 01:12:10,290 Then the first thing that might occur to you, at least after 1199 01:12:10,290 --> 01:12:14,390 this development, is, well, we now have a local rule for 1200 01:12:14,390 --> 01:12:18,220 updating at each of these computational nodes, each of 1201 01:12:18,220 --> 01:12:21,280 these constraint nodes, in this representation. 1202 01:12:21,280 --> 01:12:24,150 1203 01:12:24,150 --> 01:12:26,960 Why don't we just let it rip? 1204 01:12:26,960 --> 01:12:28,210 See what happens. 1205 01:12:28,210 --> 01:12:30,640 1206 01:12:30,640 --> 01:12:34,630 We're going to put in some intrinsic information at the 1207 01:12:34,630 --> 01:12:37,360 outputs as before. 1208 01:12:37,360 --> 01:12:42,490 We can compute some output message here based on-- well, 1209 01:12:42,490 --> 01:12:47,420 eventually we're going to get some input message from here. 1210 01:12:47,420 --> 01:12:50,520 And we can compute some output message here based on all of 1211 01:12:50,520 --> 01:12:55,940 these and we'll pass that over. 1212 01:12:55,940 --> 01:12:59,760 Let's assume what's called a parallel schedule or a 1213 01:12:59,760 --> 01:13:03,770 flooding schedule, where in each cycle every one of these 1214 01:13:03,770 --> 01:13:05,210 little guys does its thing. 1215 01:13:05,210 --> 01:13:09,740 It takes all the inputs that it has available at that time 1216 01:13:09,740 --> 01:13:10,990 and it generates outputs. 1217 01:13:10,990 --> 01:13:14,460 1218 01:13:14,460 --> 01:13:19,160 Well, hope for the best. 1219 01:13:19,160 --> 01:13:21,540 It will compute something. 1220 01:13:21,540 --> 01:13:26,340 You can sort of intuitively see that it'll settle. 1221 01:13:26,340 --> 01:13:31,060 It'll converge, perhaps, into some kind of equilibrium. 1222 01:13:31,060 --> 01:13:33,685 Or hopefully it will converge into some kind of equilibrium. 1223 01:13:33,685 --> 01:13:36,900 1224 01:13:36,900 --> 01:13:41,800 There is a basic fallacy here, which is that the information 1225 01:13:41,800 --> 01:13:46,960 that comes in here is used to compute part of this message, 1226 01:13:46,960 --> 01:13:49,990 which is used to compute part of this message, 1227 01:13:49,990 --> 01:13:53,540 which ultimately comes back and is used to compute part of 1228 01:13:53,540 --> 01:13:54,190 this message, 1229 01:13:54,190 --> 01:14:00,400 and then part of this message again, so that information 1230 01:14:00,400 --> 01:14:06,590 goes around in cycles and it tends to be used again and 1231 01:14:06,590 --> 01:14:07,450 again and again. 1232 01:14:07,450 --> 01:14:10,320 Of course, this tends to reinforce 1233 01:14:10,320 --> 01:14:12,190 previously computed messages. 1234 01:14:12,190 --> 01:14:13,790 It tends to make you overconfident.
1235 01:14:13,790 --> 01:14:16,330 1236 01:14:16,330 --> 01:14:23,810 You heard some rumor, and then the rumor goes all around the 1237 01:14:23,810 --> 01:14:26,260 class and it comes back and you hear it again. 1238 01:14:26,260 --> 01:14:30,010 And that tends to confirm the rumor. 1239 01:14:30,010 --> 01:14:34,500 So it's that kind of telephone situation, and it's not really 1240 01:14:34,500 --> 01:14:37,190 independent information. 1241 01:14:37,190 --> 01:14:40,360 Intuitively, it makes sense that if all of the 1242 01:14:40,360 --> 01:14:42,870 cycles are very large-- 1243 01:14:42,870 --> 01:14:46,230 1244 01:14:46,230 --> 01:14:51,970 in other words, the rumor has to go to China and back before 1245 01:14:51,970 --> 01:14:52,970 you get it-- 1246 01:14:52,970 --> 01:14:56,610 then it's probably highly attenuated by 1247 01:14:56,610 --> 01:14:57,490 the time you get it. 1248 01:14:57,490 --> 01:15:01,120 It's mixed up with all kinds of other information, so maybe 1249 01:15:01,120 --> 01:15:05,590 it doesn't hurt you too much if the cycles are very large. 1250 01:15:05,590 --> 01:15:12,100 And in fact, that's the way these capacity-approaching 1251 01:15:12,100 --> 01:15:14,040 codes are designed. 1252 01:15:14,040 --> 01:15:17,630 Whereas, as I mentioned last time, if you're in a physical 1253 01:15:17,630 --> 01:15:23,010 situation where basically your graph looks something like 1254 01:15:23,010 --> 01:15:26,800 this, and you don't have the freedom to make your cycles 1255 01:15:26,800 --> 01:15:30,520 large, then this is probably a bad idea. 1256 01:15:30,520 --> 01:15:33,660 And this is why in fields like vision, for instance, 1257 01:15:33,660 --> 01:15:35,980 where people try to use this-- 1258 01:15:35,980 --> 01:15:38,750 the sum-product algorithm is the belief propagation 1259 01:15:38,750 --> 01:15:40,810 algorithm, if you've ever heard of that, 1260 01:15:40,810 --> 01:15:44,070 widely used in inference on Bayesian networks-- 1261 01:15:44,070 --> 01:15:49,430 the religion in belief propagation used to be that 1262 01:15:49,430 --> 01:15:52,260 you simply can't use belief propagation on graphs that 1263 01:15:52,260 --> 01:15:53,750 have cycles. 1264 01:15:53,750 --> 01:15:54,390 Why? 1265 01:15:54,390 --> 01:15:58,680 Because most of the graphs they dealt 1266 01:15:58,680 --> 01:15:59,910 with look like this. 1267 01:15:59,910 --> 01:16:01,880 And in fact, it works terribly on that. 1268 01:16:01,880 --> 01:16:04,850 1269 01:16:04,850 --> 01:16:08,090 It was a great shock to that field when the coding people 1270 01:16:08,090 --> 01:16:11,640 came along and said, well, we use belief propagation on 1271 01:16:11,640 --> 01:16:14,710 graphs with cycles and it works great. 1272 01:16:14,710 --> 01:16:19,580 And this, as I understand it, really had an enormous impact 1273 01:16:19,580 --> 01:16:23,020 on the belief propagation community. 1274 01:16:23,020 --> 01:16:26,150 But it was realized that, gee, coding people have it nice. 1275 01:16:26,150 --> 01:16:27,400 You make your own graphs. 1276 01:16:27,400 --> 01:16:29,706 1277 01:16:29,706 --> 01:16:34,230 And you make them so that this is going to be a very low-order, 1278 01:16:34,230 --> 01:16:36,540 low-impact event. 1279 01:16:36,540 --> 01:16:38,480 And that's right. 1280 01:16:38,480 --> 01:16:41,020 But in coding we do have this freedom to 1281 01:16:41,020 --> 01:16:42,270 design our own graphs.
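To make the flooding schedule concrete before moving on, here is a sketch of loopy sum-product decoding on a tiny Tanner graph with cycles; the parity-check matrix and intrinsic weights are hypothetical, not a code from the course. On every iteration, all the check nodes fire at once from the previous iteration's variable-to-check messages, then all the variable nodes fire; after the messages relax, the final combination gives the approximate APPs the professor describes.

```python
from itertools import product

# A tiny parity-check code whose Tanner graph has cycles (hypothetical H).
H = [(0, 1, 2), (1, 2, 3), (0, 2, 3)]     # each check constrains these variables
n, iters = 4, 20
intrinsic = [{0: 0.8, 1: 0.2}, {0: 0.4, 1: 0.6},
             {0: 0.7, 1: 0.3}, {0: 0.45, 1: 0.55}]

def normalize(m):
    t = m[0] + m[1]
    return {0: m[0] / t, 1: m[1] / t}

# One message per (variable, check) edge, in each direction.
v2c = {(v, c): dict(intrinsic[v]) for c, chk in enumerate(H) for v in chk}
c2v = {(v, c): {0: 0.5, 1: 0.5} for c, chk in enumerate(H) for v in chk}

for _ in range(iters):                    # flooding: everybody fires at once
    new_c2v = {}
    for c, chk in enumerate(H):           # check update: sum of products over
        for v in chk:                     # even-parity local codewords
            out = {0: 0.0, 1: 0.0}
            others = [u for u in chk if u != v]
            for bits in product((0, 1), repeat=len(others)):
                w = 1.0
                for u, b in zip(others, bits):
                    w *= v2c[(u, c)][b]
                out[sum(bits) % 2] += w   # v's bit must restore even parity
            new_c2v[(v, c)] = normalize(out)
    c2v = new_c2v
    for c, chk in enumerate(H):           # variable update: product of the
        for v in chk:                     # intrinsic and the *other* checks
            m = dict(intrinsic[v])
            for c2, chk2 in enumerate(H):
                if c2 != c and v in chk2:
                    m = {b: m[b] * c2v[(v, c2)][b] for b in (0, 1)}
            v2c[(v, c)] = normalize(m)

for v in range(n):                        # final approximate APPs per symbol
    m = dict(intrinsic[v])
    for c, chk in enumerate(H):
        if v in chk:
            m = {b: m[b] * c2v[(v, c)][b] for b in (0, 1)}
    print(v, normalize(m))
```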
1282 01:16:42,270 --> 01:16:44,860 1283 01:16:44,860 --> 01:16:48,490 We realize from the basic arguments that I put up before 1284 01:16:48,490 --> 01:16:52,080 that we're really going to need to have cycles. 1285 01:16:52,080 --> 01:16:54,720 As long as we have cycle-free graphs, we're going to get 1286 01:16:54,720 --> 01:16:57,810 this kind of exponentially increasing complexity, which 1287 01:16:57,810 --> 01:17:00,250 is characteristic of trellis graphs. 1288 01:17:00,250 --> 01:17:04,480 So we're going to need cycles, but as long as the cycles are 1289 01:17:04,480 --> 01:17:07,610 very long and attenuated, as long as the girth of the graph 1290 01:17:07,610 --> 01:17:09,860 is large, maybe we can get away with it. 1291 01:17:09,860 --> 01:17:14,030 And that, in fact, is the way the story has developed. 1292 01:17:14,030 --> 01:17:18,960 And Gallager basically saw this back in 1961, but it took 1293 01:17:18,960 --> 01:17:22,400 a long time for it to catch on. 1294 01:17:22,400 --> 01:17:26,690 So I didn't even get into chapter 13. 1295 01:17:26,690 --> 01:17:31,220 In chapter 13 we're going to first just introduce several 1296 01:17:31,220 --> 01:17:34,450 classes, the most popular classes, of capacity-approaching 1297 01:17:34,450 --> 01:17:36,140 codes. 1298 01:17:36,140 --> 01:17:42,370 And then I'm going to give you some real performance 1299 01:17:42,370 --> 01:17:46,420 analysis: how you would design and 1300 01:17:46,420 --> 01:17:50,060 simulate these codes. 1301 01:17:50,060 --> 01:17:52,690 And it's what people actually use. 1302 01:17:52,690 --> 01:17:54,440 OK, see you next time. 1303 01:17:54,440 --> 01:18:03,135