PROFESSOR: We are still in chapter 12, on the sum-product algorithm. I hope to get through that today, fairly expeditiously, and to start on chapter 13. The handouts for today are chapter 13; problem set 9, as usual (problem 8.3 has been moved up into problem set 9); and problem set 8 solutions, for 8.1 and 8.2.

So I'm going to start off on a different tack of presenting the sum-product algorithm. Where is everybody? 1, 2, 3, 4, 5, 6, 7, 8, 9. Is there any competitive event that I should know about? I know the traffic was bad, and it's a rainy, late morning. I had a problem getting here myself. Well, this is improving.

The write-up of the sum-product algorithm in the notes is in terms of equations, and I've never been terribly satisfied with this write-up. It's concise, but we have to invent some new notation, which I don't consider to be completely transparent. I've never found it particularly easy to present in class, and it's been a little frustrating to me. So I'd like to try a different approach. I'd like in class just to go through a very explicit example, our favorite (8, 4, 4) code: what we're trying to achieve with the sum-product algorithm, and how the sum-product algorithm helps us to achieve it in an efficient way. So it's proof by example. It won't be very satisfying to those of you who like to see real proofs, but that you can get by going back and re-reading the notes. So I'm going to try doing it this way, and you will let me know by your looks of gladness or of frustration whether you like it or not.

So let's start at the beginning again. What are we trying to do? We're trying to compute the a posteriori probability of every possible value of every symbol, internal and external, that appears in a graphical realization of a code. And we will consider these to be APP vectors. In other words, if a symbol is eight-valued, there will be eight APP values in the vector.
And we're going to assume that we have a cycle-free graph realization, so that we can proceed by the cut-set idea to solve independent parts of the graph. At the very end, we'll relax that and see whether we can still do OK.

All right, one concept that seems very simple in theory, but that some people don't get till they really try to do it in practice, is that an APP vector is really a likelihood weight vector, normalized so that the sum of all the probabilities is equal to 1. For instance, suppose we send y equal to plus 1 or minus 1 through an additive white Gaussian noise channel with sigma squared equal to 1 or whatever, and we receive r. Then the probability of r given y is 1 over the square root of 2 pi sigma squared, times e to the minus (r minus y) squared over 2 sigma squared. And so we can compute that for y equals plus 1 and y equals minus 1, and we'll get a vector of two things. The APP vector would then consist of this evaluated for the two values of y. Let me just take this part of it: e to the minus (r minus 1) squared over 2 sigma squared, and e to the minus (r plus 1) squared over 2 sigma squared. The APP vector is something proportional to these two terms, normalized; that's why I continue to use this "proportional to" symbol, which you may not have seen before. Or maybe you have. So the APP vector in this case would be proportional to -- well, actually, 1 over the square root of 2 pi sigma squared times each of these two things. But this would basically tell you how much weight to give to plus 1 and minus 1 as possible transmitted symbols.

And that's all that matters; there's one extra degree of freedom in here. You can scale the whole vector by any scale factor you like, and it gives you the same information, because you can always normalize the terms to sum to 1. So if this, for instance, was 0.12 and this was 0.03, that's equivalent to 0.8 and 0.2 -- two probabilities.
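In code, that normalization point is a one-liner. Here's a minimal sketch (the function and variable names are mine, not from the notes) of the intrinsic APP vector for a plus-or-minus-1 symbol received as r over an AWGN channel:

```python
import math

def intrinsic_app(r, sigma=1.0):
    """Likelihood weights for y = +1 and y = -1 given received value r,
    normalized to sum to 1.  The common factor 1/sqrt(2*pi*sigma^2)
    cancels in the normalization, so it is simply omitted."""
    w_plus = math.exp(-(r - 1.0) ** 2 / (2 * sigma ** 2))
    w_minus = math.exp(-(r + 1.0) ** 2 / (2 * sigma ** 2))
    total = w_plus + w_minus
    return (w_plus / total, w_minus / total)

# The 0.12 / 0.03 example: normalizing gives 0.8 and 0.2.
print(0.12 / 0.15, 0.03 / 0.15)
```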
To get actual a posteriori probabilities that sum to 1, all you've got to do is keep the ratio right. This is four times as large as that, so what that really means is 0.8 and 0.2. And in implementations of the sum-product algorithm, we never really worry about constant scale factors. As long as they appear in every term, we can just forget about scaling. I mean, we don't want things to get too large or too small, so we can apply an overall scaling factor. But as long as we've got the relative weights, then we have what we need to know.

Let's go to the (8, 4, 4) code, since we know that very well, and ask ourselves what it is we're trying to compute. We're given APP vectors for each of the eight received bits, let's say. We transmit y, we receive some r, perhaps over an additive white Gaussian noise channel. So we get a little two-term APP vector for each of these y's, telling us the relative probability that a 0 was sent or a 1 was sent. That's called the intrinsic information. Let me just label that: for each one we get p0 and -- let me make it p0 and q0, the two terms for y0; p1 and q1 for y1; and so forth. So we have that information for each of the received symbols.

What's the likelihood of each of the code words, then, given these individual bitwise likelihoods? The likelihood of the all-zero code word, let's say, is proportional to p0 p1 p2 p3 p4 p5 p6 p7, whereas the likelihood of 1, 1, 1, 1, 0, 0, 0, 0 is q0 q1 q2 q3 p4 p5 p6 p7, and so forth. Each one is a product of the eight corresponding terms. Yes?

AUDIENCE: You're assuming that the bits are independent.

PROFESSOR: Yes, it's a memoryless channel, so the noise in every symbol transmission is independent. And that's important, too.

AUDIENCE: So basically, once again, the second bit is independent of the first bit in transmission?
PROFESSOR: I'm sorry, I was writing and I missed what you said.

AUDIENCE: Well, I mean, maybe it's a code where, if the first bit is 0, the second bit has to be 0.

PROFESSOR: Yeah, there are certainly dependencies among the code words. That's what we're trying to take into account; this is how to take it into account. So I'm assuming a memoryless channel, but I'm, of course, assuming dependencies within the code. How do I do that?

One way is maximum likelihood decoding. What do we do when we do maximum likelihood decoding, in principle? We exhaustively compute each of these 16 products, each of these 16 likelihoods, for each of the 16 code words. We simply pick the one that's greatest, and that's the code word that is most likely to have been sent. So that's what exhaustive maximum likelihood decoding consists of: finding the maximum likelihood word. And that clearly takes into account the dependencies; we're only considering valid code sequences when we do that.

In APP decoding, we're doing something else, but it can be characterized quite simply. What's the a posteriori probability that, say, the first bit is equal to a 0 or is equal to a 1? We can get the APP by summing up the likelihoods of all the code words that are associated with the value of the bit that we want to see. So we'd take the eight code words that have a 0 in this place, and we'd sum up these likelihoods for those eight code words, and that would be the likelihood weight of y0 being 0. And the likelihood weight of y0 being 1 would be the sum of the other eight. And again, you see that scale factors aren't going to matter here; all we want is the relative weights of these things.

So that's what we're trying to do in APP decoding, but we're trying to do it now for every single variable. At this point, I've only shown you the symbol variables, the external variables.
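As a concrete sketch of that brute-force APP computation (the generator matrix below is one standard choice for the (8, 4, 4) code, used here for illustration; the lecture's coordinate ordering may differ, and all the names are mine):

```python
from itertools import product

# One generator matrix for the (8, 4, 4) code (Reed-Muller RM(1,3)).
G = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
]

# All 16 code words, one per information four-tuple.
codewords = [
    tuple(sum(u[i] * G[i][j] for i in range(4)) % 2 for j in range(8))
    for u in product([0, 1], repeat=4)
]

def bitwise_app(p, q):
    """p[j], q[j]: intrinsic likelihood weights for y_j = 0 and y_j = 1.
    Returns eight normalized APP vectors, one per symbol."""
    apps = []
    for j in range(8):
        bucket = [0.0, 0.0]                    # weights for y_j = 0 and 1
        for cw in codewords:
            like = 1.0                         # product of 8 per-bit terms
            for i, bit in enumerate(cw):
                like *= p[i] if bit == 0 else q[i]
            bucket[cw[j]] += like              # sum into the bin for y_j's value
        total = bucket[0] + bucket[1]
        apps.append((bucket[0] / total, bucket[1] / total))
    return apps
```

This is exactly the 16-products-then-sums computation just described; the rest of the lecture is about avoiding it.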
So a brute-force way of doing that would be to fill out this table, all 16 likelihoods, do these sums, and then we'd have the APP vector for each of these.

Now, we're looking for a more efficient way of doing this. The efficient way is going to be based on the fact that we have a cycle-free graph realization, which in this case is a trellis realization. I'm going to use this two-section trellis, where each section is a four-tuple. We've drawn it two different ways. This is a very explicit way, showing the two parallel transitions that go from the initial node to the central state, and then two more down here. The set of code words corresponds to the set of all possible paths through this trellis. So, for instance, it includes the all-zero word, (0, 0, 0, 0, 1, 1, 1, 1), (1, 1, 1, 1, 0, 0, 0, 0), (1, 1, 1, 1, 1, 1, 1, 1), and so forth. And these I happen to have listed as the first four here.

Or we now have this more abstract way of writing the same thing. This says there are four external variables here and a two-bit state variable here: a vector space of dimension 4 and a vector space of dimension 2. And the dimension of this constraint is 3. That means there are eight possible values for these combinations of the four symbol bits and the two state bits. They're shown explicitly up here, and they form a linear vector space.

Either way, this is our trellis realization. Now, we know that to do maximum likelihood decoding it's useful to have this trellis realization because, for instance, we can do Viterbi decoding, and that's a more efficient way of finding what the maximum likelihood sequence is. Basically, because we go through, and at this point we can make a decision between 0, 0, 0, 0 and 1, 1, 1, 1. We only have to do that based on this four-tuple, and then we can live with that decision regardless of what else we see in the other half. And likewise in the other direction.
So now we'd like to use this same structure to simplify the APP decoding calculation, which looks like a more complicated calculation. So how do we do that? We introduce two state bits here. All of these have state bits 0, 0. Sorry -- at this point, we've called this state 0, 1. Just to keep it visually straight: 0, 1; 0, 1; 0, 1. And then there are eight more code words corresponding to the other two possible values of the state bits over here.

This is a quaternary variable. We want to treat it as a single four-valued variable; to make the realization cycle-free, I don't want to consider it to be two independent bits. So there are four possible values for this state variable. So maybe I just ought to call this my state space, which is a four-valued state space.

Now, what's the a posteriori probability of the state being, say, 0, 0? I compute that in the same way. I compute it simply by summing up the four likelihoods, over the external variables, of the code words that are associated with state value 0, 0 -- the first possible value of this quaternary variable. So it'd be the sum of these four things. Yes?

AUDIENCE: [UNINTELLIGIBLE PHRASE]

PROFESSOR: I've only listed half the code words. This is only 8 of the 16 code words. You want me to write the other 8?

And, by the way, there's at least an implicit assumption here that the code word determines the state sequence. As we found when we did minimal trellis realizations, the code sequence always does determine the state sequence. So if I know the entire code word, that also tells me what the values of all the state variables are. There's a one-to-one correspondence between code words and trajectories in a minimal trellis. And I believe I'm sometimes implicitly using that assumption. But anyway, it holds whenever we have a cycle-free graph.
In a cycle-free graph, we have a well-defined minimal realization, and the minimal realization does have this one-to-one property. So the state variables are determined by the symbol variables.

Suppose I want to compute the APP of this state value here. Let me write this out. This will be p0 p1 p2 p3 q4 q5 q6 q7 -- and then there's one that's all q's: q0 q1 q2 q3 q4 q5 q6 q7. So basically, what I want to do is take the sum of these four things. That's just a sum of products, and I could just do it. But the Cartesian product lemma gives us a simpler way of doing it.

We notice that I can have either of these two four-tuples as the first half, and either of these two four-tuples as the second half. In fact, I see over here, explicitly, that these four code words are the Cartesian product of two four-tuple codes, the repetition code of length 4. So I can use the Cartesian product lemma to write this as -- again, let me write it very explicitly -- p0 p1 p2 p3, which you recognize as the likelihood of this four-tuple, plus q0 q1 q2 q3, the likelihood of this four-tuple, all times p4 p5 p6 p7 plus q4 q5 q6 q7.

Now, do you agree that the product of those two terms is equal to this, the sum of these four terms? Yes: there are four terms in this sum, and they're what we want. And it's because we have this Cartesian product structure, which we always have in the state space. This is the basic Markov property of a state variable.

So that's a simpler calculation. This requires four multiplications, each one a seven-fold multiplication. This only requires -- well, it requires one, two, three, four four-fold multiplications, plus one overall multiplication. And it's certainly a simpler way to compute that. And I can similarly do that for each of these down here.
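A quick numerical check of that factorization, with arbitrary positive weights standing in for the p's and q's (a sketch, not anything from the notes):

```python
import random

p = [random.random() for _ in range(8)]
q = [random.random() for _ in range(8)]

def prod(ws):
    out = 1.0
    for w in ws:
        out *= w
    return out

# Sum of the four code word likelihoods with state (0, 0) ...
brute = (prod(p[0:4]) * prod(p[4:8]) + prod(p[0:4]) * prod(q[4:8]) +
         prod(q[0:4]) * prod(p[4:8]) + prod(q[0:4]) * prod(q[4:8]))

# ... equals the Cartesian product factorization: (past sum) * (future sum).
factored = (prod(p[0:4]) + prod(q[0:4])) * (prod(p[4:8]) + prod(q[4:8]))

assert abs(brute - factored) < 1e-12
```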
What this leads me to is what I called last time the past-future decomposition. We have one APP vector over here that corresponds to -- let's call it the APP vector of the state given the past received symbols. In other words, given what was received for y0, y1, y2, y3, what's the partial a posteriori probability of each of the four possible state values, given this past?

And similarly, we have another vector, which I will consider to be a message coming in this direction -- a message consisting of a vector of four values -- which is: what's the probability of the state given these four received symbols, computed in the same way? That's this here.

And the past-future rule is just that the overall probability for any of these internal state variables, the overall APP vector, is obtained by a component-wise multiplication. This would be the APP of the state being equal to 0, 0 given the past; this would be the APP of the state being 0, 0 given the future; and we'd have the same thing for 0, 1; 1, 0; 1, 1. To get the overall APP vector given all of r, the rule is just: multiply them component-wise. We take the likelihood weight for 0, 0 given the past and multiply it by the likelihood weight of 0, 0 given the future, and that gives us the total likelihood weight, up to a scale factor.

The notes derive this in equation terms. You see, it basically comes from this Cartesian product decomposition, which we always get for any internal state variable. The possible code words consistent with that state consist of a certain set of possible past code words consistent with that state, Cartesian product with a set of possible future code words consistent with that state. And as a result of that Cartesian product decomposition, we always get this APP vector decomposition.
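In code, the combination rule is just an elementwise product followed by a renormalization; a minimal sketch, with my own names:

```python
def combine(past_msg, future_msg):
    """Past-future combination rule: multiply the two state APP vectors
    component-wise, then renormalize to get actual probabilities."""
    raw = [a * b for a, b in zip(past_msg, future_msg)]
    total = sum(raw)
    return [w / total for w in raw]

# e.g. four state weights given the past, four given the future:
print(combine([0.4, 0.3, 0.2, 0.1], [0.1, 0.1, 0.4, 0.4]))
```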
That's what the proof says in the notes. Do you like this way of arguing? A couple of people are nodding, at least. It's an experiment.

All right, so that tells us what we're going to want to do, at least for the internal variables. And actually, the same is true up here. The messages going in are easy: that's just p0, p1, p2, p3, or whatever. The ones coming out are -- this is called the intrinsic variable -- the intrinsic APP vector, excuse me, is the one that we get from direct observation. The extrinsic APP vector is the one that we get from all the rest of the inputs and symbols, the other parts of the graph. And we combine these component-wise to get the overall APP vector.

All right, but we now have to do this for every part of the graph. Let me go through another calculation. How do we, for instance, find the extrinsic APP vector for y7, given that we have a trellis like this? In general, for y7, we're going to be adding up all the words with a 0 there -- these guys, and four more down here -- and we'll make that the extrinsic APP weight for 0. So we want to add up this and this and this and this and so forth, eight of those. And we want to do that in an efficient way.

This defines a code with eight code words on six variables. Let me move this up now. So let's talk about this (6, 3) code. What does it look like? With state value 0, 0, I can have y4, y5, y6, y7 be 0, 0, 0, 0, or I can have it be 1, 1, 1, 1. With state value 0, 1, I can have 0, 1, 0, 1 or 1, 0, 1, 0. With state value 1, 0, I can have 0, 0, 1, 1 or 1, 1, 0, 0. And with 1, 1, I can have 0, 1, 1, 0 or 1, 0, 0, 1. All right, so that's precisely the (6, 3) code I'm talking about.

Now suppose I have the APP vector for each of these states.
The APP vector for each of these: for 0, 0, we've already calculated it. It'd be p0 p1 p2 p3 plus q0 q1 q2 q3. That's the likelihood weight for state 0, 0 coming from the past. So that's one component of this message here. For 0, 1, I get p0 q1 p2 q3 plus q0 p1 q2 p3, reflecting the two possible ways I can get to this state value.

So I'm going to get four terms that I've computed for the past message: four likelihood weights, corresponding to each of the four values of this state variable. So I've got that vector. I already computed it as part of the computation of the APP for here; it's the past part of it.

So now I want to compute the probability that y7 is a 0. The rule here is to take every combination in this code and -- well, let me write this part of it. This is going to be p4 p5 p6 p7; that's the weight of this four-tuple. This is q4 q5 q6 q7. This is p4 q5 p6 q7; this is q4 p5 q6 p7; and so forth. Those are the weights I want to assign to each of these four-tuples.

So now I get a weight for each of these six-tuples. I multiply this by this to get the likelihood weight for this code word, this by this to get the likelihood weight for this code word, and so forth. And I add these to buckets: I simply go through this, and I'll add this one to the bucket for y7 equals 0, and this to the bucket for y7 equals 1, and this to the bucket for y7 equals 1, and this to the bucket for y7 equals 0, and so forth.

It's clear that in this bucket I'm always going to get the same seventh factor: I'm always going to get a p7 for 0 and a q7 for 1. So just looking at y7, I don't have to compute the seventh value here. If I treat y7 separately as its own little variable, then the incoming message, the intrinsic information, would just be this (p7, q7) likelihood weight vector.
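Here's the bucket computation just described, as a sketch in code (my names; the constraint code is the (6, 3) code listed above, with the state encoded as an index 0 through 3, and y7's own intrinsic pair (p7, q7) deliberately left out of the product):

```python
# (state index, second-half four-tuple) pairs of the (6, 3) constraint code.
code63 = [
    (0, (0, 0, 0, 0)), (0, (1, 1, 1, 1)),
    (1, (0, 1, 0, 1)), (1, (1, 0, 1, 0)),
    (2, (0, 0, 1, 1)), (2, (1, 1, 0, 0)),
    (3, (0, 1, 1, 0)), (3, (1, 0, 0, 1)),
]

def extrinsic_y7(past_msg, p, q):
    """past_msg: four likelihood weights for the state, from the past half.
    p[i], q[i]: intrinsic weights for y_i = 0 and 1, used for i = 4, 5, 6.
    Returns the two-entry extrinsic likelihood weight vector for y7."""
    bucket = [0.0, 0.0]
    for state, (y4, y5, y6, y7) in code63:
        w = past_msg[state]                       # summary of the whole past
        for i, bit in zip((4, 5, 6), (y4, y5, y6)):
            w *= p[i] if bit == 0 else q[i]       # intrinsic terms, y7 excluded
        bucket[y7] += w                           # bin by the value of y7
    return bucket
```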
And what I really want to compute is the outgoing part, which has to do with y4 through y6. So let me draw it that way. This, I don't think, is very good exposition; bear with me. So now I'm going to put this times this in my bucket for 0, this times this in my bucket for 1, and so forth.

The question here is: does this method give me the same answer as my exhaustive method up here? And again, because of the Cartesian product character of the states, you can convince yourself that it does. In other words, I can use this as a summary of everything in the past that got to the state value 0, 0. Any time I got to 0, 0, I got there through one of these two things. And so for any possible continuation over here, I could have gotten to it in one of these two ways. And I'm going to find terms -- e.g., these two terms. Let's see, what do I want? I want two ways of getting to 0, 0. I'm sorry -- these two. I claim that this times these is equal to the sum of these two. And it is. I can get to 0, 0, 0, 0 either through this or this, and I get to 1, 1, 1, 1, which is this one, either through this or this. And again, it's a fundamental property of the state that whenever I get one of these, I'm going to get both of them in equal combination. So I can summarize things in this way, which leads to an efficiency of computation.

So what I'm basically explaining to you now is the sum-product update rule, which is how to propagate messages -- messages which are APP vectors.

In general, my situation is going to be like this. I'm going to have some constraint code, an (n, k) code. I'm going to have some incoming messages over here, which tell me about everything that's happened in this past, P prime, and in this other past, P double prime. They give me a summary of all the weights corresponding to that.
I'm going to have some output symbol over here, and I want to compute what the APP vector is for the possible values of this output symbol, which is going to be some aggregate of this down here. What I do is go through all 2 to the k code words in this constraint code -- sorry, I'm using k again; that's 2 to the k code words -- and compute the product of the input APP values according to the possible combinations of the input variables and the output variables that are allowed by this constraint.

That's explicitly what I'm doing here. Here are all the possible combinations of state variables and symbol variables that are allowed by this (6, 3) constraint code. There are eight of them. For each one, I'm going to compute the relevant product of APP values, and I'm going to add it to a bin -- bins which correspond to, I'll draw it like that, output variable values.

OK, and that'll give me the summary APP vector of the output for the past, which consists of everything that could have happened on this side of the graph. Again, I'm using the cycle-free assumption in several ways. I'm saying that everything that happens up here is independent of everything that happens here. Everything that happens completely in this past is just the sum of what happens here; what happens here is subject to this constraint and has nothing to do with what happens over here in the future.

Eventually, I'm going to do the same thing in the future. I'm going to compute a future component that's based on all this stuff. I'm going to get messages going in each direction, and I'm going to combine them.

OK, now I'm seeing a fair amount of puzzlement on people's faces. Do you get, abstractly, what I'm trying to do here? Does anyone want to ask a clarifying question?
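Abstractly, that update rule is only a few lines. A minimal sketch with my own names: a constraint code is a list of tuples, one entry per edge of the node; given incoming APP vectors on every edge but one, the outgoing message is a sum of products, binned by the output edge's value:

```python
def sum_product_update(constraint_code, in_msgs, out_edge, out_size):
    """constraint_code: list of tuples, one symbol per edge of the node.
    in_msgs: dict mapping each edge except out_edge to its incoming APP
    vector.  Returns the outgoing APP vector on out_edge."""
    out = [0.0] * out_size
    for cw in constraint_code:
        w = 1.0
        for edge, msg in in_msgs.items():
            w *= msg[cw[edge]]          # product of incoming weights
        out[cw[out_edge]] += w          # add to the bin for this output value
    return out
```

Applied to the (6, 3) code above, written as five-entry tuples with edge 0 the four-valued state and edges 1 through 4 the bits y4 through y7, feeding in the past state message and the intrinsic pairs for y4, y5, y6 and asking for edge 4 reproduces the extrinsic vector for y7.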
AUDIENCE: [UNINTELLIGIBLE PHRASE] the bottom right -- output variable?

PROFESSOR: Output variable values, I'm sorry.

I think there's a little bit of confusion here, because sometimes I've considered this output to be a 16-valued output. Here, in fact, it has eight actually valid values. And if I were considering it that way, then I would just have one bin for each of these likelihoods. But now, if I consider these as four binary variables, then I would aggregate these things according to the values of each of the binary variables, and I'd get four vectors of length 2 for the individual APPs. Which is ultimately what I want here; so I'm sorry I've gone back and forth between subtly different realizations.

I do find this very hard to explain; perhaps there's someone else who can do a better job. You have a homework problem which asks you to do this, and I think, having done that, you'll then get the idea. But it's a matter of going back and forth between the gross structure and the actual equations and just working it out.

All right, so, the key things in the sum-product algorithm -- I've already explained them at some level. We have the sum-product update rule. That basically explains, given messages coming into a node, how do we compute the message going out of the node? We take contributions, products corresponding to what the code allows, and we dump them into the appropriate bins in the output vector. That's the sum-product update rule.

We have the past-future decomposition, or combination rule. That says, at the end of the day, you're finally going to get messages going in both directions, and you just combine them component-wise, as we started out with.

And finally, we need to talk about the overall schedule of the algorithm. So let me draw an arbitrary cycle-free graph. OK, suppose I have an arbitrary cycle-free graph.
It consists of external variables out here, internal variables, and constraint nodes. How am I going to schedule these computations in order to do APP decoding of this graph?

Well, what do I have at the beginning? At the beginning, I measure the received variables corresponding to each symbol, and that gives me the intrinsic information, which you can consider an incoming message from every symbol variable. Basically, what did I see on the channel? That's what that likelihood weight vector corresponds to.

All right. What do I need in order to do the sum-product update rule? I need the inputs. Think of this now as a directed graph. I need the incoming messages on all of the edges incident on a node except for one; if I have all but one, I can compute the last one as an outgoing message. So that's the structure of that update.

So at this point, can I compute anything here? No -- I would need the inputs. But I can compute this outgoing message, I can compute this outgoing message, and I can compute this outgoing message. I haven't drawn these with very high degree; normally we'd have degree higher than two. If I had another input here, however, I could still get that output.

All right, so that's time one. Think of there being a clock: at time one, I can propagate messages into the graph to depth one, if you like. There's going to be a systematic way that we can assign depths. Now, at time two, what can I compute? I can compute this guy, because I have this input. Now I have both inputs coming in here, so I can compute this guy. Anything else? So maybe 0, 1, 2. I should put times on each of these so I don't get confused: 0, 0, 1, 0. So this is 2 at the output.

Now I'm at time three. What can I compute? At time three, I now have two inputs coming in here.
I can compute this output. Actually, from these two inputs, I can compute this output at time three. And from these two, I can compute this outgoing message at time three. Are there other things I can compute? Can I compute this? No, I don't have this yet. That's probably it. Does anyone see anything else I can compute at time three, based on the time zero, one, and two messages? No.

All right, but I'm making progress, and it should be clear that I'm always going to be able to make some progress. At time four, I can compute this message in that direction. Let's see -- I now have everything I need to compute this message in that direction, and I'm almost done. And finally, I can compute this message in this direction at time four, the extrinsic information for those variables. I guess I can do this at time four; I had that at time three. And now at time five, I can get everything else that I need. At time five, I get that out; I get this out, based on this input and this input; and I get this out, based on this input and this input.

So in a finite amount of time -- in this case, five computational cycles -- I can compute all of the messages in both directions for every variable in the graph. Now, as a final cleanup step, I simply component-wise multiply the vectors in the two directions, and that gives me the APPs for all the variables. This is a kind of parallel computation of all the APPs for all variables in the graph.

And I claim two things: it's finite and it's exact. The exactness is proved from the equations and the cycle-free assumption; we can always decompose things as Cartesian products, independent components. The finiteness -- well, I assumed this is a finite graph.
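That whole schedule can be sketched as a loop; this is my own construction, for a cycle-free graph given as an adjacency structure. Each tick, every directed edge whose node already has messages on all its other edges fires; leaves fire first, and everything terminates in diameter-many ticks:

```python
def run_schedule(neighbors, compute):
    """neighbors: dict mapping each node to its set of adjacent nodes.
    compute(u, v, incoming): the message u -> v, formed from the messages
    into u on its other edges (e.g., by a sum-product update at node u)."""
    msgs = {}                                   # (u, v) -> message along u -> v
    pending = {(u, v) for u in neighbors for v in neighbors[u]}
    ticks = 0
    while pending:                              # never empties if there is a cycle
        ticks += 1
        ready = [(u, v) for (u, v) in pending
                 if all((w, u) in msgs for w in neighbors[u] if w != v)]
        for u, v in ready:
            incoming = {w: msgs[(w, u)] for w in neighbors[u] if w != v}
            msgs[(u, v)] = compute(u, v, incoming)
        pending -= set(ready)
    return msgs, ticks                          # ticks: the propagation time
```

After it returns, the final cleanup step is the past-future combination: for each variable, multiply the two oppositely directed messages on its edge component-wise.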
So what is this number 5? It's really the maximum length of time it takes me to get from any of these external symbol variables to any other external one -- from leaf to leaf, in graph theory terms. And that's called the diameter of the graph. And again, some people count the last little symbol node and some people don't. But it's clear that a finite graph is going to have a finite diameter, and that the propagation time is basically going to be equal to the diameter. That's how long it takes to get across the graph.

So each of these messages in each direction has a certain depth; that's the distance in the graph to the most distant leaf node. And the maximum sum of these two depths is always going to be 5 in this case. Notice that it is. A tiny little graph theory theorem.

AUDIENCE: [UNINTELLIGIBLE PHRASE] you see why, if you sum these messages flowing through the past and future of a single edge, you would get the overall APP, because you're summing the contributions from both [UNINTELLIGIBLE PHRASE]. How would you prove that that should be the overall APP of the symbols?

PROFESSOR: Recall the state space theorem. Let's take any edge in here, and suppose we think about cutting it -- a cut-set through that edge. Suppose we consider agglomerating everything back here and calling that the past, and agglomerating everything out here and calling that the future. This is past and future. The idea of the state space theorem was that this is exactly the same situation as a two-section trellis. Because everything is linear, you get the state space theorem, basically, which says that you get a Cartesian product of a certain past and a certain future. You get a minimal state space, which is kind of the total code modulo the part of the code that lives purely on the past and the part that lives purely on the future.
That quotient space is really what the state space is; it corresponds to the state space at this cut. Because every edge in a cycle-free graph is by itself a cut-set, you can do this for every edge in any cycle-free graph; you can make this same argument. So you get a well-defined minimal state space. You get a Cartesian product decomposition, which has a number of terms equal to the size of the state space -- two to the dimension of the state space over the binary field. And so everything goes through just as it did in the trellis case. That was the argument.

So, mushing all this together, you get the sum-product algorithm. This is the fundamental reason why I took the time to go through minimal trellis realizations of block codes. Not because that's actually the way we decode block codes all by themselves, or even use block codes -- we use convolutional codes. It's so that we could prove these very fundamental theorems about graph decoding. And we aren't even going to decode cycle-free graphs. But that's the intellectual line here.

Thank you for that question; it was very well timed.

AUDIENCE: [UNINTELLIGIBLE PHRASE]

PROFESSOR: Let's take a look at the structure of this. Where do the computations occur? At every node there's a computation. What's the complexity of that computation? It's approximately equal to the size of the constraint code. We do one thing for every legitimate combination of variables, which is what we mean by the size of the constraint code. So if this is a (6, 3) code, we need to do eight things here -- basically, eight multiplications.

So, in more general cycle-free graphs, this is what corresponds to branch complexity in trellis graphs. We found that the branch space corresponded to a constraint code in trellises.
816 00:49:51,290 --> 00:49:55,470 So the computation is basically determined by the 817 00:49:55,470 --> 00:49:58,970 constraint code complexity, and really, overall what we'd like 818 00:49:58,970 --> 00:50:03,410 to do is minimize the maximum 819 00:50:03,410 --> 00:50:05,740 size of any constraint code, because that's going to 820 00:50:05,740 --> 00:50:09,190 dominate computation. 821 00:50:09,190 --> 00:50:12,200 So we try to keep the k in these 822 00:50:12,200 --> 00:50:15,690 things as small as possible. 823 00:50:15,690 --> 00:50:16,070 Yes? 824 00:50:16,070 --> 00:50:17,320 AUDIENCE: [UNINTELLIGIBLE PHRASE] 825 00:50:17,320 --> 00:50:21,650 826 00:50:21,650 --> 00:50:26,901 and you get a larger dimension constraint for 827 00:50:26,901 --> 00:50:28,151 [UNINTELLIGIBLE PHRASE]. 828 00:50:28,151 --> 00:50:31,691 829 00:50:31,691 --> 00:50:33,890 PROFESSOR: Yeah, we're actually not too concerned 830 00:50:33,890 --> 00:50:35,140 about diameter. 831 00:50:35,140 --> 00:50:37,150 832 00:50:37,150 --> 00:50:39,750 Because things go up exponentially with constraint 833 00:50:39,750 --> 00:50:43,310 code dimension, just that single parameter tends to be 834 00:50:43,310 --> 00:50:46,230 the thing we want to focus on. 835 00:50:46,230 --> 00:50:50,160 But we remember that's hard to minimize in a trellis. 836 00:50:50,160 --> 00:50:55,240 We showed that basically our only trick there after 837 00:50:55,240 --> 00:50:58,990 ordering the coordinates was sectionalization, and that we 838 00:50:58,990 --> 00:51:02,070 could not reduce the constraint code complexity, 839 00:51:02,070 --> 00:51:06,150 the branch complexity, by sectionalization. 840 00:51:06,150 --> 00:51:08,800 And then again, we built on that and went through this 841 00:51:08,800 --> 00:51:13,460 argument that said, gee, this corresponds to some trellis. 842 00:51:13,460 --> 00:51:17,130 We could draw a trellis where this literally was the past 843 00:51:17,130 --> 00:51:18,650 and this was the future. 844 00:51:18,650 --> 00:51:21,990 And the minimal size state space in this cycle-free graph 845 00:51:21,990 --> 00:51:24,165 would be the same as in that trellis. 846 00:51:24,165 --> 00:51:26,710 847 00:51:26,710 --> 00:51:31,500 What I guess I didn't show is that you can't significantly 848 00:51:31,500 --> 00:51:37,320 reduce the constraint code complexity -- 849 00:51:37,320 --> 00:51:39,730 actually, I'm not sure there's a crisp theorem. 850 00:51:39,730 --> 00:51:44,620 But the fact that the constraint code 851 00:51:44,620 --> 00:51:48,260 is certainly going to be at least as large as the 852 00:51:48,260 --> 00:51:49,610 state spaces it connects 853 00:51:49,610 --> 00:51:52,580 tends to indicate that you're not going to be able to do a 854 00:51:52,580 --> 00:51:56,520 lot about constraint code complexity by going to more 855 00:51:56,520 --> 00:51:59,820 elaborate cycle-free graphs than just the chain trellis 856 00:51:59,820 --> 00:52:01,070 graph either. 857 00:52:01,070 --> 00:52:03,610 858 00:52:03,610 --> 00:52:05,902 That's, I think, a missing theorem that 859 00:52:05,902 --> 00:52:08,530 would be nice to have. 860 00:52:08,530 --> 00:52:12,380 I had a paper about a year or two ago; I'm not sure it ever 861 00:52:12,380 --> 00:52:13,630 quite gets to that theorem. 862 00:52:13,630 --> 00:52:15,450 But that's the theorem you'd like to see. 863 00:52:15,450 --> 00:52:19,830 You really can't improve this.
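As a brute-force check on the cut-set picture above (again my own sketch, not the notes'), the following fragment takes the (4, 3) single-parity-check code, cuts it after two symbols so that the state is just the parity of the past, and verifies numerically that the componentwise product of the forward (past) and backward (future) messages equals the exact APP vector of the state. The per-symbol weights are arbitrary.

```python
from itertools import product

# (4, 3) single-parity-check code, with a cut after two symbols.
code = [c for c in product([0, 1], repeat=4) if sum(c) % 2 == 0]

# Arbitrary per-symbol likelihood weights, as if from a noisy channel.
w = [{0: 0.8, 1: 0.2}, {0: 0.3, 1: 0.7},
     {0: 0.6, 1: 0.4}, {0: 0.1, 1: 0.9}]

def state(c):                 # coset label at the cut: parity of the past
    return (c[0] + c[1]) % 2

def weight(c):
    p = 1.0
    for i, b in enumerate(c):
        p *= w[i][b]
    return p

# Exact (unnormalized) APP of each state value, by full enumeration.
exact = {s: sum(weight(c) for c in code if state(c) == s) for s in (0, 1)}

# Forward message: sum over past halves consistent with each state value.
alpha = {s: sum(w[0][a] * w[1][b] for a, b in product([0, 1], repeat=2)
                if (a + b) % 2 == s) for s in (0, 1)}
# Backward message: likewise over future halves (total parity must be even).
beta = {s: sum(w[2][a] * w[3][b] for a, b in product([0, 1], repeat=2)
               if (a + b) % 2 == s) for s in (0, 1)}

for s in (0, 1):              # the past x future factorization is exact
    assert abs(alpha[s] * beta[s] - exact[s]) < 1e-12
print(exact)
```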
864 00:52:19,830 --> 00:52:23,690 So ultimately, we're bound by what you can do in a trellis. 865 00:52:23,690 --> 00:52:27,520 And in a trellis you can't do too much about this branch 866 00:52:27,520 --> 00:52:30,690 complexity parameter once you've ordered the 867 00:52:30,690 --> 00:52:31,940 coordinates. 868 00:52:31,940 --> 00:52:35,320 869 00:52:35,320 --> 00:52:37,810 A bit of hand-waving here, but we're going to need to go away 870 00:52:37,810 --> 00:52:39,060 from cycle-free graphs. 871 00:52:39,060 --> 00:52:41,430 872 00:52:41,430 --> 00:52:44,720 One reason I like normal graphs is that you get this 873 00:52:44,720 --> 00:52:47,870 very nice separation of function. 874 00:52:47,870 --> 00:52:50,220 You get the idea that nodes 875 00:52:50,220 --> 00:52:52,380 correspond to little computers. 876 00:52:52,380 --> 00:52:55,650 You could actually realize this with little computers. 877 00:52:55,650 --> 00:52:57,410 Put a computer in here for each node. 878 00:52:57,410 --> 00:53:02,410 Its job is to do the sum-product update equation. 879 00:53:02,410 --> 00:53:05,880 And then, well, what are the edges? 880 00:53:05,880 --> 00:53:08,610 They're for communication. 881 00:53:08,610 --> 00:53:13,450 These are the wires that run around on your chip. 882 00:53:13,450 --> 00:53:16,280 And so when we talk about state space complexity, we're 883 00:53:16,280 --> 00:53:18,310 really talking about something like bandwidth. 884 00:53:18,310 --> 00:53:21,170 How wide do you have to make these wires? 885 00:53:21,170 --> 00:53:24,410 Do you have 6 bits, or 9 bits, or whatever? 886 00:53:24,410 --> 00:53:26,990 So state space really has to do with communication 887 00:53:26,990 --> 00:53:28,440 complexity. 888 00:53:28,440 --> 00:53:31,840 Constraint or branch complexity has to do with the computational 889 00:53:31,840 --> 00:53:36,910 complexity, and sometimes one's more 890 00:53:36,910 --> 00:53:37,690 important than the other. 891 00:53:37,690 --> 00:53:40,100 Computational tends to be the thing we focus on. 892 00:53:40,100 --> 00:53:43,150 But we know more and more, as we get to more complicated 893 00:53:43,150 --> 00:53:45,890 chips, communication complexity 894 00:53:45,890 --> 00:53:50,410 tends to become dominant. 895 00:53:50,410 --> 00:53:55,780 And this, of course, is the I/O. These are your paths 896 00:53:55,780 --> 00:53:59,230 which go to the outside world, your I/O paths. 897 00:53:59,230 --> 00:54:01,650 So it's really a good way to think about this. 898 00:54:01,650 --> 00:54:04,190 And in fact, some people like Andy [UNINTELLIGIBLE] 899 00:54:04,190 --> 00:54:08,290 and his students have experimented with building 900 00:54:08,290 --> 00:54:10,095 analog sum-product decoders. 901 00:54:10,095 --> 00:54:13,650 902 00:54:13,650 --> 00:54:17,420 The general idea here is that you can compute an output as 903 00:54:17,420 --> 00:54:21,510 soon as you have the two corresponding inputs. 904 00:54:21,510 --> 00:54:25,910 Well, imagine another schedule where this thing just computes 905 00:54:25,910 --> 00:54:28,725 all the time based on whatever inputs it has. 906 00:54:28,725 --> 00:54:31,680 907 00:54:31,680 --> 00:54:36,500 In a cycle-free graph, say you're in the interior. 908 00:54:36,500 --> 00:54:40,420 Initially, what you have on your inputs is garbage. 909 00:54:40,420 --> 00:54:42,930 But eventually, you're going to get the right things on all 910 00:54:42,930 --> 00:54:43,580 your inputs.
911 00:54:43,580 --> 00:54:45,430 And therefore, you're going to put the right things on all 912 00:54:45,430 --> 00:54:46,810 your outputs. 913 00:54:46,810 --> 00:54:51,050 So again, up to some propagation time, just 914 00:54:51,050 --> 00:54:55,260 putting in non-clocked little computers here, whose function 915 00:54:55,260 --> 00:54:58,720 is to generate the right outputs given the inputs in 916 00:54:58,720 --> 00:55:02,480 each direction, is eventually going to be right in a 917 00:55:02,480 --> 00:55:05,330 cycle-free graph. 918 00:55:05,330 --> 00:55:08,180 And in fact, there are very nice little analog 919 00:55:08,180 --> 00:55:12,300 implementations of this, where it turns out the sum-product 920 00:55:12,300 --> 00:55:15,700 update rule, a nonlinear rule, is extremely amenable to 921 00:55:15,700 --> 00:55:20,540 transistor implementation. 922 00:55:20,540 --> 00:55:26,590 So you just put these in, little computing nodes, and 923 00:55:26,590 --> 00:55:30,090 you put whatever you received on here as the received 924 00:55:30,090 --> 00:55:31,900 symbols, and you let it go. 925 00:55:31,900 --> 00:55:36,860 And it computes incredibly quickly for 926 00:55:36,860 --> 00:55:38,260 a cycle-free graph. 927 00:55:38,260 --> 00:55:41,451 928 00:55:41,451 --> 00:55:47,160 Where we're going to go is we're going to say, well, gee, 929 00:55:47,160 --> 00:55:49,790 we've got this nice little local rule that we can implement 930 00:55:49,790 --> 00:55:53,160 with local computers for the sum-product algorithm. 931 00:55:53,160 --> 00:55:57,160 Maybe it'll work if we do it on a graph with cycles. 932 00:55:57,160 --> 00:56:00,240 And on a graph with cycles this kind of parallel or 933 00:56:00,240 --> 00:56:03,590 flooding schedule, where every little computer is computing 934 00:56:03,590 --> 00:56:07,520 something at every instant of time, is probably the most 935 00:56:07,520 --> 00:56:09,140 popular schedule. 936 00:56:09,140 --> 00:56:12,366 And again, you build an analog implementation of that and it 937 00:56:12,366 --> 00:56:16,270 goes extremely fast, and it converges to something. 938 00:56:16,270 --> 00:56:20,740 It just relaxes to something: there's a local equilibrium in 939 00:56:20,740 --> 00:56:23,080 all these constraint nodes, 940 00:56:23,080 --> 00:56:27,880 and therefore some global APP vector that satisfies the 941 00:56:27,880 --> 00:56:29,130 whole thing. 942 00:56:29,130 --> 00:56:34,640 943 00:56:34,640 --> 00:56:37,820 I think this is a nice attribute of this kind of 944 00:56:37,820 --> 00:56:40,670 graphical realization. 945 00:56:40,670 --> 00:56:45,720 All right, here's a particular example of how 946 00:56:45,720 --> 00:56:49,010 this schedule works. 947 00:56:49,010 --> 00:56:51,570 The cycle-free realization that we care most 948 00:56:51,570 --> 00:56:55,050 about is the trellis. 949 00:56:55,050 --> 00:56:59,920 The BCJR algorithm is simply the implementation of the 950 00:56:59,920 --> 00:57:03,616 sum-product algorithm on a trellis. 951 00:57:03,616 --> 00:57:07,310 SP on a trellis. 952 00:57:07,310 --> 00:57:13,010 So generically, what does a trellis look like? 953 00:57:13,010 --> 00:57:18,610 I guess I should allow for some nontrivial code here. 954 00:57:18,610 --> 00:57:21,250 So all trellises have this same boring graph. 955 00:57:21,250 --> 00:57:33,330 956 00:57:33,330 --> 00:57:35,750 What is this schedule?
957 00:57:35,750 --> 00:57:39,350 It's a cycle-free graph, so we could operate according to 958 00:57:39,350 --> 00:57:44,670 this nice controlled schedule, compute everything just once. 959 00:57:44,670 --> 00:57:46,040 And how does that operate? 960 00:57:46,040 --> 00:57:51,320 We get intrinsic information that's measured at each of 961 00:57:51,320 --> 00:57:53,290 these symbols here. 962 00:57:53,290 --> 00:57:56,400 What did we receive on the channel? 963 00:57:56,400 --> 00:58:02,280 We put that through a constraint code and we get 964 00:58:02,280 --> 00:58:04,450 here, let's give this a name. 965 00:58:04,450 --> 00:58:06,870 This is called intrinsic 0. 966 00:58:06,870 --> 00:58:13,440 Here is alpha 1, which is just a function of the intrinsic 967 00:58:13,440 --> 00:58:15,350 information at time zero. 968 00:58:15,350 --> 00:58:17,960 And we combine that with the intrinsic information at time 969 00:58:17,960 --> 00:58:22,460 one and we get alpha 2, and so forth. 970 00:58:22,460 --> 00:58:26,810 So we get a forward-going path down the trellis. 971 00:58:26,810 --> 00:58:31,170 If any of you know what a Kalman filter or a smoother 972 00:58:31,170 --> 00:58:34,310 does, this is really exactly the same thing. 973 00:58:34,310 --> 00:58:39,690 This is finding the conditional APPs 974 00:58:39,690 --> 00:58:42,600 of this state vector given everything in the past. 975 00:58:42,600 --> 00:58:45,335 976 00:58:45,335 --> 00:58:47,760 In Kalman filtering or smoothing, it's done for 977 00:58:47,760 --> 00:58:48,830 Gaussian vectors. 978 00:58:48,830 --> 00:58:51,480 Here we're talking about discrete vectors. 979 00:58:51,480 --> 00:58:54,630 But in principle, it's doing exactly the same thing. 980 00:58:54,630 --> 00:58:57,880 It's computing conditional probabilities given everything 981 00:58:57,880 --> 00:58:59,810 observed in the past. 982 00:58:59,810 --> 00:59:02,830 And so the diameter here is just the length 983 00:59:02,830 --> 00:59:04,080 of the trellis, basically. 984 00:59:04,080 --> 00:59:06,580 985 00:59:06,580 --> 00:59:10,740 So this is i n minus 1, i n. 986 00:59:10,740 --> 00:59:14,040 And here we have alpha n. 987 00:59:14,040 --> 00:59:18,210 988 00:59:18,210 --> 00:59:21,490 When we have alpha n finally in here, we can compute the 989 00:59:21,490 --> 00:59:27,330 extrinsic information based on alpha n as just extrinsic n. 990 00:59:27,330 --> 00:59:33,310 We combine these two and we're done for that guy. 991 00:59:33,310 --> 00:59:38,600 Meanwhile, we have to compute the backward-going information 992 00:59:38,600 --> 00:59:40,270 on each of these branches. 993 00:59:40,270 --> 00:59:44,370 So we compute first beta n based on i n. 994 00:59:44,370 --> 00:59:47,910 Then we get beta n minus 1 based on i n minus 1 995 00:59:47,910 --> 00:59:50,910 and beta n, and so forth. 996 00:59:50,910 --> 00:59:56,650 We get a backward-going propagation of conditional 997 00:59:56,650 --> 00:59:57,330 information. 998 00:59:57,330 --> 01:00:01,300 Each of these betas represents the a-posteriori probability 999 01:00:01,300 --> 01:00:04,640 vector given the future from it. 1000 01:00:04,640 --> 01:00:07,480 And finally, when we get down to here we can compute 1001 01:00:07,480 --> 01:00:12,850 extrinsic 0 from beta 1. 1002 01:00:12,850 --> 01:00:16,800 And by this time, when we have both alpha 1 and beta 2 coming 1003 01:00:16,800 --> 01:00:21,100 into here, we can, of course, get extrinsic 1.
1004 01:00:21,100 --> 01:00:23,850 When we get both the alpha and beta coming in here, we can 1005 01:00:23,850 --> 01:00:26,380 get extrinsic 2, and so forth. 1006 01:00:26,380 --> 01:00:32,800 1007 01:00:32,800 --> 01:00:36,292 Of course, you can write this down as equations. 1008 01:00:36,292 --> 01:00:39,060 If you want to read the equations, you look in the 1009 01:00:39,060 --> 01:00:47,190 original Bahl, Cocke, Jelinek, and Raviv article from 1974. 1010 01:00:47,190 --> 01:00:54,300 This was, again, a kind of lost paper, because just for 1011 01:00:54,300 --> 01:00:57,180 decoding a trellis, the Viterbi algorithm 1012 01:00:57,180 --> 01:01:00,640 is much less complex. 1013 01:01:00,640 --> 01:01:05,240 And gives you approximately the same thing. 1014 01:01:05,240 --> 01:01:09,590 I mean, I talked last time about how probably what you really want 1015 01:01:09,590 --> 01:01:12,680 to do in decoding a block code is maximum likelihood. 1016 01:01:12,680 --> 01:01:16,090 That's minimum probability of error on a block-wise basis. 1017 01:01:16,090 --> 01:01:18,910 1018 01:01:18,910 --> 01:01:22,650 What APP decoding gives you is minimum probability of symbol error. 1019 01:01:22,650 --> 01:01:24,760 If you then make a decision based on each of these 1020 01:01:24,760 --> 01:01:29,620 extrinsic information vectors, or on the combination of these two 1021 01:01:29,620 --> 01:01:33,000 to get the overall APP vector, that's called a MAP algorithm, 1022 01:01:33,000 --> 01:01:35,600 maximum a-posteriori probability. 1023 01:01:35,600 --> 01:01:37,460 If you make a decision based on that, that will give you 1024 01:01:37,460 --> 01:01:39,650 the minimal bit error probability, but it may not 1025 01:01:39,650 --> 01:01:42,840 necessarily give you a code word, and it will not minimize 1026 01:01:42,840 --> 01:01:45,250 the probability of making any error, which is generally what 1027 01:01:45,250 --> 01:01:47,690 you want to minimize. 1028 01:01:47,690 --> 01:01:51,180 So for trellis decoding of block codes 1029 01:01:51,180 --> 01:01:53,410 this was not favored. 1030 01:01:53,410 --> 01:01:57,640 However, with a block code as a component of a larger code, 1031 01:01:57,640 --> 01:02:02,630 you want the APP vector to feed off to something else. 1032 01:02:02,630 --> 01:02:10,350 And so with the advent of much larger codes, of which block 1033 01:02:10,350 --> 01:02:14,040 codes or convolutional codes are components, BCJR is 1034 01:02:14,040 --> 01:02:15,800 the way you want to go. 1035 01:02:15,800 --> 01:02:18,660 And some people say it's about three times as complicated as 1036 01:02:18,660 --> 01:02:19,910 the Viterbi algorithm. 1037 01:02:19,910 --> 01:02:22,450 1038 01:02:22,450 --> 01:02:25,730 All right, so you'll have an exercise doing that in 1039 01:02:25,730 --> 01:02:26,980 the homework. 1040 01:02:26,980 --> 01:02:29,280 1041 01:02:29,280 --> 01:02:33,880 There is a closely related algorithm called the min-sum 1042 01:02:33,880 --> 01:02:38,420 algorithm, or the max-product algorithm. 1043 01:02:38,420 --> 01:02:45,880 And the idea of that is that you can really carry through 1044 01:02:45,880 --> 01:02:52,010 the same logical computations, except when you get to one of 1045 01:02:52,010 --> 01:02:54,720 these combining operations, rather than doing a sum of 1046 01:02:54,720 --> 01:03:00,750 products, you can do a max of products. 1047 01:03:00,750 --> 01:03:04,540 And the Cartesian product law still holds; the same kind 1048 01:03:04,540 --> 01:03:07,950 of decompositions that we have still hold.
1049 01:03:07,950 --> 01:03:21,330 For instance, in this example, to get the state 0, 0-- 1050 01:03:21,330 --> 01:03:24,990 before, we said we're going to get an APP vector 1051 01:03:24,990 --> 01:03:36,490 where this is the past APP for the 0, 0 value of the middle 1052 01:03:36,490 --> 01:03:37,270 state vector. 1053 01:03:37,270 --> 01:03:42,950 We simply combine the likelihood weights of the two 1054 01:03:42,950 --> 01:03:46,660 possible ways of getting to that state and we sum them. 1055 01:03:46,660 --> 01:03:49,270 1056 01:03:49,270 --> 01:03:52,575 Instead of doing the sum, let's just take the max. 1057 01:03:52,575 --> 01:03:55,470 1058 01:03:55,470 --> 01:03:59,044 The maximum probability of getting-- 1059 01:03:59,044 --> 01:04:02,340 in other words, what's the best way of getting here? 1060 01:04:02,340 --> 01:04:04,180 Same question as we ask in the Viterbi algorithm. 1061 01:04:04,180 --> 01:04:08,520 1062 01:04:08,520 --> 01:04:13,610 We'll let that be the likelihood weight for the 1063 01:04:13,610 --> 01:04:14,420 state vector. 1064 01:04:14,420 --> 01:04:16,870 Otherwise, the logic is exactly the same. 1065 01:04:16,870 --> 01:04:22,040 Similarly, coming this way, what's the max of these two? 1066 01:04:22,040 --> 01:04:24,930 OK, what's the best way of getting to the state vector in 1067 01:04:24,930 --> 01:04:27,970 terms of likelihood from the future? 1068 01:04:27,970 --> 01:04:31,880 And what do you know, if we combine these things at this 1069 01:04:31,880 --> 01:04:36,130 point, we now have a past vector and a future vector. 1070 01:04:36,130 --> 01:04:42,890 Let's take the max of this times the max of that. 1071 01:04:42,890 --> 01:04:45,490 That gives us the maximum likelihood way of going 1072 01:04:45,490 --> 01:04:48,175 through the 0, 0 value of the state. 1073 01:04:48,175 --> 01:04:54,720 1074 01:04:54,720 --> 01:04:58,370 If we do exactly the same logic for decoding this 1075 01:04:58,370 --> 01:05:01,220 trellis, we get a two-way algorithm. 1076 01:05:01,220 --> 01:05:02,060 It's like this. 1077 01:05:02,060 --> 01:05:07,350 It's like the BCJR algorithm, where at every point we're 1078 01:05:07,350 --> 01:05:11,060 computing the maximum likelihood. 1079 01:05:11,060 --> 01:05:15,020 It's not hard to show that what this gives you, at 1080 01:05:15,020 --> 01:05:17,640 each of these symbols out here, is the symbol 1081 01:05:17,640 --> 01:05:20,660 value that belongs to the maximum likelihood code word. 1082 01:05:20,660 --> 01:05:24,990 1083 01:05:24,990 --> 01:05:27,075 How does it differ from the Viterbi algorithm? 1084 01:05:27,075 --> 01:05:28,800 It gives you the same answer. 1085 01:05:28,800 --> 01:05:32,550 It gives you the symbols, one by one, of the maximum 1086 01:05:32,550 --> 01:05:35,640 likelihood code word. 1087 01:05:35,640 --> 01:05:37,760 It doesn't have to remember anything. 1088 01:05:37,760 --> 01:05:41,210 It doesn't store survivors. 1089 01:05:41,210 --> 01:05:42,520 That's its advantage. 1090 01:05:42,520 --> 01:05:45,220 Its disadvantage is it's a two-way algorithm. 1091 01:05:45,220 --> 01:05:48,320 You have to start from both ends and propagate your 1092 01:05:48,320 --> 01:05:51,310 information in the same way. 1093 01:05:51,310 --> 01:05:53,640 And for block codes, maybe that's OK. 1094 01:05:53,640 --> 01:05:55,870 For convolutional codes that's certainly not something you 1095 01:05:55,870 --> 01:05:59,770 want to do, because the code word may go on indefinitely.
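Here is a compact sketch of both variants on a toy two-state parity trellis, with made-up weights rather than the lecture's example. Passing combine=sum gives the BCJR forward-backward recursion with its alphas and betas; passing combine=max gives the max-product rule just described, and the assertion checks that its bitwise decisions pick out exactly the bits of the maximum likelihood code word.

```python
import math
from itertools import product

# Toy two-state trellis for the (4, 3) single-parity-check code:
# the state is the running parity; we start and end in state 0.
n = 4
w = [{0: 0.8, 1: 0.2}, {0: 0.3, 1: 0.7},   # intrinsic likelihood weights
     {0: 0.6, 1: 0.4}, {0: 0.1, 1: 0.9}]

def two_way(combine):
    """Forward-backward pass on the parity trellis.  combine=sum gives
    BCJR (exact APPs); combine=max gives the max-product variant."""
    alpha = [{0: 1.0, 1: 0.0}]                    # forward state metrics
    for k in range(n):
        alpha.append({s: combine(alpha[k][s ^ b] * w[k][b] for b in (0, 1))
                      for s in (0, 1)})
    beta = [None] * n + [{0: 1.0, 1: 0.0}]        # backward state metrics
    for k in reversed(range(n)):
        beta[k] = {s: combine(beta[k + 1][s ^ b] * w[k][b] for b in (0, 1))
                   for s in (0, 1)}
    # Per-symbol outputs: combine over the transitions that carry bit b.
    return [{b: combine(alpha[k][s] * w[k][b] * beta[k + 1][s ^ b]
                        for s in (0, 1)) for b in (0, 1)} for k in range(n)]

app = two_way(sum)    # a posteriori probabilities, up to overall scale
mp  = two_way(max)    # max-product: best path through each bit value

# Max-product decisions match the ML code word, bit by bit.
ml = max((c for c in product([0, 1], repeat=n) if sum(c) % 2 == 0),
         key=lambda c: math.prod(w[k][c[k]] for k in range(n)))
assert tuple(0 if m[0] >= m[1] else 1 for m in mp) == ml
print([round(v[1] / (v[0] + v[1]), 3) for v in app])   # P(bit = 1)
```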
1096 01:05:59,770 --> 01:06:05,350 So the Viterbi algorithm gets rid of the backward step by 1097 01:06:05,350 --> 01:06:07,785 always remembering the survivor. 1098 01:06:07,785 --> 01:06:12,310 It remembers the history of how it got to where it is. 1099 01:06:12,310 --> 01:06:18,100 So that once you get the max, you don't have to go backward-- 1100 01:06:18,100 --> 01:06:20,870 you simply read out the survivor. 1101 01:06:20,870 --> 01:06:27,270 This backward part amounts to a traceback operation. 1102 01:06:27,270 --> 01:06:30,980 That's probably much too quick to be absorbed. 1103 01:06:30,980 --> 01:06:36,460 But I just wanted to mention there is a max-product 1104 01:06:36,460 --> 01:06:41,230 version, which if you take log likelihoods becomes a max-sum. 1105 01:06:41,230 --> 01:06:42,480 Or if you take negative log 1106 01:06:42,480 --> 01:06:44,220 likelihoods it becomes min-sum. 1107 01:06:44,220 --> 01:06:46,760 So this is called the min-sum algorithm. 1108 01:06:46,760 --> 01:06:51,030 And it does maximum likelihood decoding 1109 01:06:51,030 --> 01:06:52,280 rather than APP decoding. 1110 01:06:52,280 --> 01:06:54,690 1111 01:06:54,690 --> 01:06:59,000 And sometimes it's used as an approximation, or there are 1112 01:06:59,000 --> 01:07:04,378 various intermediate approximations between these two. 1113 01:07:04,378 --> 01:07:05,628 AUDIENCE: [UNINTELLIGIBLE PHRASE] 1114 01:07:05,628 --> 01:07:11,100 1115 01:07:11,100 --> 01:07:16,830 PROFESSOR: No, if you use the min-sum, basically everybody 1116 01:07:16,830 --> 01:07:20,980 will converge on the same maximum likelihood code word. 1117 01:07:20,980 --> 01:07:24,790 So all these constraints will be consistent with-- 1118 01:07:24,790 --> 01:07:27,110 you'd be taking a single max at each point. 1119 01:07:27,110 --> 01:07:30,130 But at each point you will solve for the maximum 1120 01:07:30,130 --> 01:07:31,150 likelihood code word. 1121 01:07:31,150 --> 01:07:34,530 And what you'll see up here is the symbol that belongs to the 1122 01:07:34,530 --> 01:07:36,330 maximum likelihood code word. 1123 01:07:36,330 --> 01:07:39,710 This is maybe a bit of a miracle when you first see it, 1124 01:07:39,710 --> 01:07:44,060 but you'll independently get each of the bits in the 1125 01:07:44,060 --> 01:07:46,770 maximum likelihood code word. 1126 01:07:46,770 --> 01:07:47,560 Check it out at home. 1127 01:07:47,560 --> 01:07:51,430 AUDIENCE: Why not always do this? 1128 01:07:51,430 --> 01:07:54,080 PROFESSOR: Why not always do that? 1129 01:07:54,080 --> 01:08:00,450 It turns out that we want softer decisions when this is 1130 01:08:00,450 --> 01:08:03,230 part of a big graph. 1131 01:08:03,230 --> 01:08:07,820 We actually want the APPs, the relative probabilities of each 1132 01:08:07,820 --> 01:08:10,010 of the values. 1133 01:08:10,010 --> 01:08:13,710 The max, in effect, is making a hard decision. 1134 01:08:13,710 --> 01:08:16,560 AUDIENCE: [UNINTELLIGIBLE PHRASE] 1135 01:08:16,560 --> 01:08:17,149 PROFESSOR: Yeah, all right. 1136 01:08:17,149 --> 01:08:18,819 It does give reliability information. 1137 01:08:18,819 --> 01:08:23,340 1138 01:08:23,340 --> 01:08:25,620 I'm not sure I can give you a conclusive answer to that 1139 01:08:25,620 --> 01:08:28,390 question, except that APP works better. 1140 01:08:28,390 --> 01:08:31,569 That's a more empirical answer. 1141 01:08:31,569 --> 01:08:32,934 Yes?
1142 01:08:32,934 --> 01:08:34,184 AUDIENCE: [UNINTELLIGIBLE PHRASE] 1143 01:08:34,184 --> 01:08:42,089 1144 01:08:42,089 --> 01:08:46,520 Where does that flexibility of the input symbols being 1145 01:08:46,520 --> 01:08:47,770 [UNINTELLIGIBLE PHRASE]. 1146 01:08:47,770 --> 01:08:52,259 1147 01:08:52,259 --> 01:08:55,745 We should be able to take that into account 1148 01:08:55,745 --> 01:08:56,995 [UNINTELLIGIBLE PHRASE]. 1149 01:08:56,995 --> 01:09:04,979 1150 01:09:04,979 --> 01:09:06,374 PROFESSOR: Yeah. 1151 01:09:06,374 --> 01:09:18,090 OK, well, if the input symbols are independently not 1152 01:09:18,090 --> 01:09:26,819 equiprobable, you can feed that in as-- 1153 01:09:26,819 --> 01:09:31,569 you know, once they came out of the channel, once we see 1154 01:09:31,569 --> 01:09:35,399 the channel information, the two symbols are not 1155 01:09:35,399 --> 01:09:37,170 equiprobable anymore. 1156 01:09:37,170 --> 01:09:43,220 So if somehow they a priori independently were not equally 1157 01:09:43,220 --> 01:09:45,689 probable, you could just feed that in as part of this 1158 01:09:45,689 --> 01:09:48,880 intrinsic information vector. 1159 01:09:48,880 --> 01:09:54,720 That's almost never the case in coding. 1160 01:09:54,720 --> 01:09:58,970 The intrinsic information is sometimes regarded as a-priori 1161 01:09:58,970 --> 01:10:04,360 and this as a-posteriori, or the combination of the two of 1162 01:10:04,360 --> 01:10:08,690 them is then a-posteriori when this goes off somewhere else. 1163 01:10:08,690 --> 01:10:11,510 1164 01:10:11,510 --> 01:10:17,540 But it doesn't really mean that the symbol itself was-- 1165 01:10:17,540 --> 01:10:22,170 it means that the evidence is biased one way or the other. 1166 01:10:22,170 --> 01:10:25,980 And when this appears somewhere else in the graph, 1167 01:10:25,980 --> 01:10:29,860 this will now appear somewhere down in some other graph. 1168 01:10:29,860 --> 01:10:32,660 It's called the a-priori information, but what it is is 1169 01:10:32,660 --> 01:10:35,920 the a-posteriori information given all of the received 1170 01:10:35,920 --> 01:10:39,018 symbols in this part of the graph. 1171 01:10:39,018 --> 01:10:40,268 AUDIENCE: [UNINTELLIGIBLE PHRASE] 1172 01:10:40,268 --> 01:10:43,730 1173 01:10:43,730 --> 01:10:49,320 Does the i0 of that other graph contain this extrinsic 1174 01:10:49,320 --> 01:10:53,140 information added to it? 1175 01:10:53,140 --> 01:10:56,720 PROFESSOR: Yeah, the i0 for another graph is just the e0 1176 01:10:56,720 --> 01:10:57,970 of this graph. 1177 01:10:57,970 --> 01:11:01,710 1178 01:11:01,710 --> 01:11:03,970 I mean, if you consider it all as one big graph, 1179 01:11:03,970 --> 01:11:04,740 that's what you want. 1180 01:11:04,740 --> 01:11:07,560 You want the message that goes in this direction, which 1181 01:11:07,560 --> 01:11:11,210 basically summarizes all of the inputs from this graph. 1182 01:11:11,210 --> 01:11:14,970 1183 01:11:14,970 --> 01:11:18,460 Generally, you'd have a little equals node here. 1184 01:11:18,460 --> 01:11:23,160 You'd have the actual intrinsic information coming 1185 01:11:23,160 --> 01:11:25,100 from the outside world there. 1186 01:11:25,100 --> 01:11:28,520 You would compute the e0 from this graph. 1187 01:11:28,520 --> 01:11:32,975 And at this point, you would combine e0 and i0, and that 1188 01:11:32,975 --> 01:11:34,410 would go around here.
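The combining at that little equals node is just a componentwise product of the messages, renormalized. A short sketch, in the same hypothetical notation as the earlier fragments, with made-up numbers for i0 and e0:

```python
def equals_node(msgs):
    """Repetition (equals) constraint: every edge carries the same
    variable, so the message sent onward is the componentwise product
    of the incoming messages, renormalized to sum to 1."""
    out = {}
    for a in msgs[0]:
        p = 1.0
        for m in msgs:
            p *= m[a]
        out[a] = p
    total = sum(out.values())
    return {a: v / total for a, v in out.items()}

i0 = {0: 0.8, 1: 0.2}    # intrinsic information from the channel
e0 = {0: 0.3, 1: 0.7}    # extrinsic information from this subgraph
print(equals_node([i0, e0]))   # becomes the i0 of the next graph over
```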
1189 01:11:34,410 --> 01:11:40,400 1190 01:11:40,400 --> 01:11:45,460 So the very last page of chapter 12 just says, OK, 1191 01:11:45,460 --> 01:11:48,420 we've got a nice, clean, finite, exact algorithm that 1192 01:11:48,420 --> 01:11:52,120 will compute all these APPs on a cycle-free graph. 1193 01:11:52,120 --> 01:11:54,720 Now, suppose we're faced with a graph that has cycles. 1194 01:11:54,720 --> 01:11:57,650 1195 01:11:57,650 --> 01:11:59,490 And here's the situation. 1196 01:11:59,490 --> 01:12:01,405 Suppose the graph has cycles in it. 1197 01:12:01,405 --> 01:12:04,220 1198 01:12:04,220 --> 01:12:10,290 Then the first thing that might occur to you, at least after 1199 01:12:10,290 --> 01:12:14,390 this development, is, well, we now have a local rule for 1200 01:12:14,390 --> 01:12:18,220 updating at each of these computational nodes, each of 1201 01:12:18,220 --> 01:12:21,280 these constraint nodes, in this representation. 1202 01:12:21,280 --> 01:12:24,150 1203 01:12:24,150 --> 01:12:26,960 Why don't we just let it rip? 1204 01:12:26,960 --> 01:12:28,210 See what happens. 1205 01:12:28,210 --> 01:12:30,640 1206 01:12:30,640 --> 01:12:34,630 We're going to put in some intrinsic information at the 1207 01:12:34,630 --> 01:12:37,360 outputs as before. 1208 01:12:37,360 --> 01:12:42,490 We can compute some output message here based on-- well, 1209 01:12:42,490 --> 01:12:47,420 eventually we're going to get some input message from here. 1210 01:12:47,420 --> 01:12:50,520 And we can compute some output message here based on all of 1211 01:12:50,520 --> 01:12:55,940 these and we'll pass that over. 1212 01:12:55,940 --> 01:12:59,760 Let's assume what's called a parallel schedule or a 1213 01:12:59,760 --> 01:13:03,770 flooding schedule, where in each cycle every one of these 1214 01:13:03,770 --> 01:13:05,210 little guys does its thing. 1215 01:13:05,210 --> 01:13:09,740 It takes all the inputs that it has available at that time 1216 01:13:09,740 --> 01:13:10,990 and it generates outputs. 1217 01:13:10,990 --> 01:13:14,460 1218 01:13:14,460 --> 01:13:19,160 Well, hope for the best. 1219 01:13:19,160 --> 01:13:21,540 It will compute something. 1220 01:13:21,540 --> 01:13:26,340 You can sort of intuitively see that it'll settle. 1221 01:13:26,340 --> 01:13:31,060 It'll converge, perhaps, into some kind of equilibrium. 1222 01:13:31,060 --> 01:13:33,685 Or hopefully it will converge into some kind of equilibrium. 1223 01:13:33,685 --> 01:13:36,900 1224 01:13:36,900 --> 01:13:41,800 There is a basic fallacy here, which is that the information 1225 01:13:41,800 --> 01:13:46,960 that comes in here is used to compute part of this message, 1226 01:13:46,960 --> 01:13:49,990 which is used to compute part of this message, 1227 01:13:49,990 --> 01:13:53,540 which ultimately comes back and is used to compute part of 1228 01:13:53,540 --> 01:13:54,190 this message, 1229 01:13:54,190 --> 01:14:00,400 and then part of this message again, so that information 1230 01:14:00,400 --> 01:14:06,590 goes around in cycles and it tends to be used again and 1231 01:14:06,590 --> 01:14:07,450 again and again. 1232 01:14:07,450 --> 01:14:10,320 Of course, this tends to reinforce 1233 01:14:10,320 --> 01:14:12,190 previously computed messages. 1234 01:14:12,190 --> 01:14:13,790 It tends to make you overconfident.
1235 01:14:13,790 --> 01:14:16,330 1236 01:14:16,330 --> 01:14:23,810 You heard some rumor, and then the rumor goes all around the 1237 01:14:23,810 --> 01:14:26,260 class and it comes back and you hear it again. 1238 01:14:26,260 --> 01:14:30,010 And that tends to confirm the rumor. 1239 01:14:30,010 --> 01:14:34,500 So it's that kind of telephone situation, and it's not really 1240 01:14:34,500 --> 01:14:37,190 independent information. 1241 01:14:37,190 --> 01:14:40,360 Intuitively, it makes sense that if all of the 1242 01:14:40,360 --> 01:14:42,870 cycles are very large-- 1243 01:14:42,870 --> 01:14:46,230 1244 01:14:46,230 --> 01:14:51,970 in other words, the rumor has to go to China and back before 1245 01:14:51,970 --> 01:14:52,970 you get it-- 1246 01:14:52,970 --> 01:14:56,610 then it's probably highly attenuated by 1247 01:14:56,610 --> 01:14:57,490 the time you get it. 1248 01:14:57,490 --> 01:15:01,120 It's mixed up with all kinds of other information, so maybe 1249 01:15:01,120 --> 01:15:05,590 it doesn't hurt you too much if the cycles are very large. 1250 01:15:05,590 --> 01:15:12,100 And in fact, that's the way these capacity-approaching 1251 01:15:12,100 --> 01:15:14,040 codes are designed. 1252 01:15:14,040 --> 01:15:17,630 Whereas, as I mentioned last time, if you're in a physical 1253 01:15:17,630 --> 01:15:23,010 situation where basically your graph looks something like 1254 01:15:23,010 --> 01:15:26,800 this, and you don't have the freedom to make your cycles 1255 01:15:26,800 --> 01:15:30,520 large, then this is probably a bad idea. 1256 01:15:30,520 --> 01:15:33,660 And this is why in fields like vision, for instance, 1257 01:15:33,660 --> 01:15:35,980 where people try to use this-- 1258 01:15:35,980 --> 01:15:38,750 the sum-product algorithm is the belief propagation 1259 01:15:38,750 --> 01:15:40,810 algorithm, if you've ever heard of that, 1260 01:15:40,810 --> 01:15:44,070 widely used in inference on Bayesian networks-- 1261 01:15:44,070 --> 01:15:49,430 the religion in belief propagation used to be that 1262 01:15:49,430 --> 01:15:52,260 you simply can't use belief propagation on graphs that 1263 01:15:52,260 --> 01:15:53,750 have cycles. 1264 01:15:53,750 --> 01:15:54,390 Why? 1265 01:15:54,390 --> 01:15:58,680 Because most of the graphs they dealt 1266 01:15:58,680 --> 01:15:59,910 with look like this. 1267 01:15:59,910 --> 01:16:01,880 And in fact, it works terribly on that. 1268 01:16:01,880 --> 01:16:04,850 1269 01:16:04,850 --> 01:16:08,090 It was a great shock to that field when the coding people 1270 01:16:08,090 --> 01:16:11,640 came along and said, well, we use belief propagation on 1271 01:16:11,640 --> 01:16:14,710 graphs with cycles and it works great. 1272 01:16:14,710 --> 01:16:19,580 And this, as I understand it, really had an enormous impact 1273 01:16:19,580 --> 01:16:23,020 on the belief propagation community. 1274 01:16:23,020 --> 01:16:26,150 But it was realized that, gee, coding people have it nice. 1275 01:16:26,150 --> 01:16:27,400 You make your own graphs. 1276 01:16:27,400 --> 01:16:29,706 1277 01:16:29,706 --> 01:16:34,230 And you make them so that this is going to be a very low-order, 1278 01:16:34,230 --> 01:16:36,540 low-impact event. 1279 01:16:36,540 --> 01:16:38,480 And that's right. 1280 01:16:38,480 --> 01:16:41,020 But in coding we do have this freedom to 1281 01:16:41,020 --> 01:16:42,270 design our own graphs.
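To make the flooding schedule concrete before moving on, here is a sketch of loopy sum-product decoding on a tiny Tanner graph with cycles; the parity-check matrix and intrinsic weights are hypothetical, not a code from the course. On every iteration, all the check nodes fire at once from the previous iteration's variable-to-check messages, then all the variable nodes fire; after the messages relax, the final combination gives the approximate APPs the professor describes.

```python
from itertools import product

# A tiny parity-check code whose Tanner graph has cycles (hypothetical H).
H = [(0, 1, 2), (1, 2, 3), (0, 2, 3)]     # each check constrains these variables
n, iters = 4, 20
intrinsic = [{0: 0.8, 1: 0.2}, {0: 0.4, 1: 0.6},
             {0: 0.7, 1: 0.3}, {0: 0.45, 1: 0.55}]

def normalize(m):
    t = m[0] + m[1]
    return {0: m[0] / t, 1: m[1] / t}

# One message per (variable, check) edge, in each direction.
v2c = {(v, c): dict(intrinsic[v]) for c, chk in enumerate(H) for v in chk}
c2v = {(v, c): {0: 0.5, 1: 0.5} for c, chk in enumerate(H) for v in chk}

for _ in range(iters):                    # flooding: everybody fires at once
    new_c2v = {}
    for c, chk in enumerate(H):           # check update: sum of products over
        for v in chk:                     # even-parity local codewords
            out = {0: 0.0, 1: 0.0}
            others = [u for u in chk if u != v]
            for bits in product((0, 1), repeat=len(others)):
                w = 1.0
                for u, b in zip(others, bits):
                    w *= v2c[(u, c)][b]
                out[sum(bits) % 2] += w   # v's bit must restore even parity
            new_c2v[(v, c)] = normalize(out)
    c2v = new_c2v
    for c, chk in enumerate(H):           # variable update: product of the
        for v in chk:                     # intrinsic and the *other* checks
            m = dict(intrinsic[v])
            for c2, chk2 in enumerate(H):
                if c2 != c and v in chk2:
                    m = {b: m[b] * c2v[(v, c2)][b] for b in (0, 1)}
            v2c[(v, c)] = normalize(m)

for v in range(n):                        # final approximate APPs per symbol
    m = dict(intrinsic[v])
    for c, chk in enumerate(H):
        if v in chk:
            m = {b: m[b] * c2v[(v, c)][b] for b in (0, 1)}
    print(v, normalize(m))
```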
1282 01:16:42,270 --> 01:16:44,860 1283 01:16:44,860 --> 01:16:48,490 We realize from the basic arguments that I put up before 1284 01:16:48,490 --> 01:16:52,080 that we're really going to need to have cycles. 1285 01:16:52,080 --> 01:16:54,720 As long as we have cycle-free graphs, we're going to get 1286 01:16:54,720 --> 01:16:57,810 this kind of exponentially increasing complexity, which 1287 01:16:57,810 --> 01:17:00,250 is characteristic of trellis graphs. 1288 01:17:00,250 --> 01:17:04,480 So we're going to need cycles, but as long as the cycles are 1289 01:17:04,480 --> 01:17:07,610 very long and attenuated, as long as the girth of the graph 1290 01:17:07,610 --> 01:17:09,860 is large, maybe we can get away with it. 1291 01:17:09,860 --> 01:17:14,030 And that, in fact, is the way the story has developed. 1292 01:17:14,030 --> 01:17:18,960 And Gallager basically saw this back in 1961, but it took 1293 01:17:18,960 --> 01:17:22,400 a long time for it to catch on. 1294 01:17:22,400 --> 01:17:26,690 So I didn't even get into chapter 13. 1295 01:17:26,690 --> 01:17:31,220 In chapter 13 we're going to first just introduce several 1296 01:17:31,220 --> 01:17:34,450 classes, the most popular classes, of capacity-approaching 1297 01:17:34,450 --> 01:17:36,140 codes. 1298 01:17:36,140 --> 01:17:42,370 And then I'm going to give you some real performance 1299 01:17:42,370 --> 01:17:46,420 analysis: how you would design and 1300 01:17:46,420 --> 01:17:50,060 simulate these codes. 1301 01:17:50,060 --> 01:17:52,690 And it's what people actually use. 1302 01:17:52,690 --> 01:17:54,440 OK, see you next time. 1303 01:17:54,440 --> 01:18:03,135