WEBVTT 00:30.000 --> 00:55.720 I'm here to talk to you about semi-automatic unpacking, specifically referring to unpacking malware, 00:55.720 --> 01:02.360 not something to do with machine guns, but to talk about something here 01:02.360 --> 01:03.360 [unclear]. 01:03.360 --> 01:09.080 And to get started, man, there's a lot of packers out there. 01:09.080 --> 01:17.760 The problem that poses for people like me, doing malware reverse engineering, and for anti-virus 01:17.760 --> 01:24.760 companies, is that anybody can take a piece of malware and trivially repack it and evade 01:24.760 --> 01:25.760 detection. 01:25.760 --> 01:34.720 And a lot of times, in my role, I'll be faced with seeing packers that I've never seen 01:34.720 --> 01:35.720 before. 01:35.720 --> 01:39.560 I've got to figure out how they work and get to the malware inside and figure out what 01:39.560 --> 01:40.560 it does. 01:40.560 --> 01:45.520 And I really don't want to spend a whole lot of time playing around with the packer code, 01:45.520 --> 01:47.520 because that's not unique to the malware. 01:47.520 --> 01:51.040 It's something that's just kind of in the way for me. 01:51.040 --> 01:55.720 So I don't want to spend my time messing around with all these packers, but there are 01:55.720 --> 02:01.640 new ones every week, and figuring out how they all work is a lot of work. 02:01.640 --> 02:04.840 So I'd like to be able to do this in a more automatic way. 02:04.840 --> 02:11.280 So I wanted to review some of the methods that people are using right now when it comes 02:11.280 --> 02:16.200 to automated unpacking. 02:16.200 --> 02:22.720 The hard way is you can take this particular packer and you can spend a lot of time looking 02:22.720 --> 02:23.720 at it. 
02:23.720 --> 02:31.560 Reverse engineering the packer code, figuring out how it writes the sections, what kind 02:31.560 --> 02:39.480 of encryption or compression it uses, and how it sets up the import table after 02:39.480 --> 02:40.480 it's done. 02:40.480 --> 02:42.560 It takes a lot of time. 02:42.560 --> 02:47.320 But when you're finished with all that, you can then write your own engine to basically 02:47.320 --> 02:53.960 do everything that the unpacking code does, handling all these algorithms and compression 02:53.960 --> 03:04.520 methods, and then write a one-time-use, per-version unpacker for Packer X. 03:04.520 --> 03:10.040 And a lot of people have done this, and I'm not knocking it, but it takes a lot of time 03:10.040 --> 03:13.680 and energy to maintain something like this. 03:13.680 --> 03:20.520 So it's great that people have done this, but there's just not enough of these unpackers 03:20.520 --> 03:26.280 for me to go and say, well, I've got FSG version 1.33, does the FSG unpacker handle that? 03:26.280 --> 03:30.240 I don't have time to keep up with that either. 03:30.240 --> 03:35.000 AV engines, they've got to have some sort of unpacking methodology. 03:35.000 --> 03:38.000 They have to know something about these packers. 03:38.000 --> 03:42.840 And if they are to go and put code into their AV engine for every single packer that's out 03:42.840 --> 03:47.560 there, this engine is going to be huge after a while, and possibly introduce even more 03:47.560 --> 03:50.560 bugs and overflows and things like that. 03:50.560 --> 03:55.840 So that's the hard way. 03:55.840 --> 04:00.640 Another approach that people take sometimes is, well, instead of learning about the algorithm, 04:00.640 --> 04:07.240 why don't I just emulate the CPU, and we'll just let this unpacking code think that it's 04:07.240 --> 04:09.920 running and writing to its memory. 04:09.920 --> 04:12.920 But we're just going to virtualize all this. 
04:12.920 --> 04:16.320 And that is a pretty good approach. 04:16.320 --> 04:19.320 A lot of people take that approach too. 04:19.320 --> 04:22.920 Then you don't have to really deal with the algorithms that are used in the packer. 04:22.920 --> 04:28.280 You just have to deal with any of the trickery that causes your emulation 04:28.280 --> 04:32.320 engine not to work, and just go and fix those. 04:32.320 --> 04:36.520 You've got to put a lot of time and energy into this emulation engine. 04:36.520 --> 04:39.320 It takes a lot of time even to get one off the ground. 04:39.320 --> 04:45.200 So a lot of times I see people write such things but then not release the code. 04:45.200 --> 04:46.680 It's something that I don't have access to. 04:46.680 --> 04:53.400 Maybe the antivirus companies have this internally, or somebody like Peter Vania has an unpacker 04:53.400 --> 04:56.400 engine, and people just aren't giving it away for free. 04:56.400 --> 04:59.400 It's a lot of work, and that's their right. 04:59.400 --> 05:07.480 Then there's the cheating method, and I've done this myself, you know, when I'm in a hurry. 05:07.480 --> 05:11.840 Just running the code, putting it into memory on a system that I don't care about or that's 05:11.840 --> 05:14.840 a VMware image that I can re-image later. 05:14.840 --> 05:20.960 And then, you know, once it's in memory, just go ahead and dump it right out of memory. 05:20.960 --> 05:25.920 And that's, for all intents and purposes, unpacked. 05:25.920 --> 05:33.160 But it's not something we can really use if we want to do an in-depth analysis because 05:33.160 --> 05:36.600 we don't know what that code looked like at the time that it started. 05:36.600 --> 05:42.000 We don't have a snapshot of what it was like at that moment in time. 05:42.000 --> 05:46.120 So we don't know how the variables that were initialized at the start changed. 
05:46.120 --> 05:54.300 You know, things are in there that wouldn't be in the actual unpacked code. 05:54.300 --> 05:57.760 You don't know where the entry point was necessarily. 05:57.760 --> 06:04.200 You can, in some cases, you know, use some heuristics if you have various compilers and 06:04.200 --> 06:07.200 they all do the entry point kind of the same way. 06:07.200 --> 06:12.160 Sometimes you can find it that way, but other times if it was coded in assembly, you know, 06:12.160 --> 06:14.440 you may not have an entry point that's recognizable. 06:14.440 --> 06:20.240 So you're kind of left guessing where that entry point was and where all the code started. 06:20.240 --> 06:27.280 There are other ways around that, but it's just not a clean way to go about it. 06:27.280 --> 06:35.760 So when I look at all the code that I've unpacked from the various packers, I notice that they 06:35.760 --> 06:42.680 all, well not all, but a great deal of them have a very similar methodology about how 06:42.680 --> 06:48.960 they pack and then unpack the executable that you're dealing with. 06:48.960 --> 06:55.640 They'll first analyze the code segments, say, okay, I've got four segments in this file. 06:55.640 --> 06:59.120 I'm going to compress those in various ways. 06:59.120 --> 07:01.480 I'm going to take the import table and chop it up. 07:01.480 --> 07:08.960 And then I'm going to put a stub section at the end of the file and that's going to be 07:08.960 --> 07:16.560 where the new entry point is so that the unpacking code can run first and then when it's done 07:16.560 --> 07:23.240 its job and everything's processed, the import table is set back up. 07:23.240 --> 07:30.440 It will jump to what we call the OEP, the original entry point, the point that 07:30.440 --> 07:35.680 used to be the entry point before it was packed. 07:35.680 --> 07:39.760 So this is kind of just a graphic example of how it works. 
07:39.760 --> 07:44.960 In the unpacked code, you've got just, you know, a very straightforward layout. 07:44.960 --> 07:49.240 The code section is the first section of the file and then you'll have other sections, 07:49.240 --> 07:51.240 data section, resource sections. 07:51.240 --> 07:56.800 And there's an entry point field in the PE header that points somewhere into that code 07:56.800 --> 08:02.000 section and that's where, as soon as the file is loaded, it'll jump to begin 08:02.000 --> 08:03.000 execution. 08:03.000 --> 08:09.000 In the packed version, we see that the entry point now points down to a new section that's 08:09.000 --> 08:13.600 been appended to the file and then it's going to run. 08:13.600 --> 08:17.280 And somewhere in there, it's going to store the OEP. 08:17.280 --> 08:26.480 And a lot of times this is intentionally obfuscated so you can't just read the memory, you know, 08:26.480 --> 08:32.520 [unclear], or read the file and say, okay, here's where the OEP is stored 08:32.520 --> 08:36.400 so I just need to set a breakpoint there. 08:36.400 --> 08:37.400 Sometimes you can. 08:37.400 --> 08:42.520 I mean, some packers aren't that sophisticated, but in most cases, packers are making a concerted 08:42.520 --> 08:47.040 effort to make this information obscure because it would be pretty darn easy if you could 08:47.040 --> 08:53.160 just say, this is the address that I set a breakpoint on, I'm done. 08:53.160 --> 09:00.840 So if you use OllyDbg like I do and you unpack code, you've probably sat there and 09:00.840 --> 09:06.840 said, gee, you know, if I could just set a breakpoint on that section as a whole and 09:06.840 --> 09:10.520 then stop running when it reached that section because I know that that's, you know, when 09:10.520 --> 09:15.600 the code is going to be unpacked, then that would be great. 09:15.600 --> 09:16.600 And you can do that already. 
09:16.600 --> 09:23.240 You can toggle a breakpoint on the memory section, but this is break on access, which 09:23.240 --> 09:29.200 means that anything that touches that code will cause a break. 09:29.200 --> 09:36.000 The problem with that is that before you actually jump in and execute this code, you're going 09:36.000 --> 09:40.600 to be reading from it, possibly, and most definitely writing to it. 09:40.600 --> 09:45.800 So you're going to do this hundreds of times, thousands of times, depending on how big the 09:45.800 --> 09:47.520 code is you're unpacking. 09:47.520 --> 09:54.480 So you'd have to set this breakpoint and reset it again and again and again and again, 09:54.480 --> 09:56.480 which turns out to be a really slow process. 09:56.480 --> 10:03.920 I mean, even if you automate it, you can use OllyScript or something like that and automate 10:03.920 --> 10:06.920 the check, say, well, am I running yet? 10:06.920 --> 10:09.920 No, I'm just reading, okay, set the breakpoint again. 10:09.920 --> 10:12.920 You can do that, but that's really, really slow. 10:12.920 --> 10:19.920 And if you're going to do that, I mean, why not just set a trace point and say, if my 10:19.920 --> 10:27.920 execution, my instruction pointer, is between this range and this range, then stop executing. 10:27.920 --> 10:34.920 That's a pretty old way of unpacking code that goes way back in the day. 10:34.920 --> 10:43.920 It's pretty common, but not only is it really slow, it can also be detected, 10:43.920 --> 10:50.920 you know, in the packer code itself, it can say, hey, check the CPU instruction count 10:50.920 --> 10:54.920 and see how much time it's taken to run the subroutine. 10:54.920 --> 10:58.920 It's taken way longer than it should have and there's a chance that some debugger is 10:58.920 --> 11:01.920 interacting with me and doing tracing. 
11:01.920 --> 11:08.920 So it would be great, I thought to myself and, you know, hopefully somebody else out there 11:08.920 --> 11:12.920 has thought that to themselves that, you know, if I just had a way to break on execution, 11:12.920 --> 11:17.920 not access, just the execution of a memory section without having to do all this tracing, 11:17.920 --> 11:20.920 I might be able to do something. 11:20.920 --> 11:22.920 So that's what OllyBonE is. 11:22.920 --> 11:28.920 OllyBonE stands for break on execute for OllyDbg. 11:28.920 --> 11:39.920 So how we did this is to look at the architecture and say, well, is there a possibility 11:39.920 --> 11:42.920 that I could set a no-execute bit? 11:42.920 --> 11:46.920 Well, with the x86, you can't. 11:46.920 --> 11:53.920 Now, it's been added to CPUs since back in the day, but as it stands, machines 11:53.920 --> 11:56.920 just generally don't have this bit. 11:56.920 --> 12:04.920 So they did solve this problem in the PaX project because if you remember, 12:04.920 --> 12:11.920 PaX is designed to stop execution if your code happens to be running in the heap 12:11.920 --> 12:15.920 or happens to be running in the stack, then that's a sign that there's been 12:15.920 --> 12:18.920 a bug or overflow and something bad is happening. 12:18.920 --> 12:23.920 So PaX already took care of this some years ago. 12:23.920 --> 12:29.920 So what we do here is we take the same idea and instead of talking about 12:29.920 --> 12:34.920 protecting the stack and the heap, we're just going to look at protecting 12:34.920 --> 12:39.920 just arbitrary pages that we just picked out to target, you know, 12:39.920 --> 12:45.920 this is where we want to stop execution. 12:45.920 --> 12:50.920 So I hope everybody's familiar with PaX. 12:50.920 --> 12:52.920 I have a passing familiarity. 12:52.920 --> 12:56.920 I've never actually run it. 
12:56.920 --> 13:04.920 Any time you try to do some sort of memory access, 13:04.920 --> 13:09.920 your OS has to translate the virtual address 13:09.920 --> 13:14.920 that you're asking for to the actual physical address of the memory 13:14.920 --> 13:17.920 so that we can maintain this gigantic virtual memory space 13:17.920 --> 13:21.920 when we only have, you know, a small actual physical memory space 13:21.920 --> 13:27.920 and the operating system figures that out, it maps that up and translates it for you. 13:27.920 --> 13:32.920 Because this process of doing these virtual memory translations is kind of 13:32.920 --> 13:39.920 time consuming, the Intel architecture uses what they call translation lookaside buffers. 13:39.920 --> 13:46.920 And so the first time you do one of these virtual to physical address translations, 13:46.920 --> 13:50.920 a translation lookaside buffer will cache that resulting address. 13:50.920 --> 13:53.920 So the next time you go and get that page in memory, you already know exactly 13:53.920 --> 13:57.920 where it is and it speeds up things tremendously. 13:57.920 --> 14:07.920 Well, what makes PaX possible is that the x86 uses separate TLBs. 14:07.920 --> 14:12.920 One is for data, they call it the DTLB, and one is for instructions. 14:12.920 --> 14:15.920 That's the ITLB. 14:15.920 --> 14:23.920 And because of that, what we can do is make one read one thing 14:23.920 --> 14:28.920 and then use the other one in a separate way. 14:28.920 --> 14:35.920 Basically, we want to give a certain permission to the stack. 14:35.920 --> 14:38.920 We want the operating system to be able to read from that stack, 14:38.920 --> 14:44.920 so we go ahead and we let the DTLB cache that entry as is. 14:44.920 --> 14:49.920 But if something comes around and tries to do a read to execute the stack, 14:49.920 --> 14:55.920 then the ITLB doesn't get cached, it throws up a fault and kills the process. 
14:55.920 --> 15:02.920 And it marks these pages in memory that it wants to protect 15:02.920 --> 15:08.920 by the user/supervisor bit of the page table entry. 15:08.920 --> 15:13.920 So basically, it installs its own page fault handler, 15:13.920 --> 15:19.920 and any time a page fault is thrown because a particular page of memory 15:19.920 --> 15:22.920 only has supervisor permissions and we're running in user mode, 15:22.920 --> 15:28.920 then the page fault handler figures out whether this is a protected area, 15:28.920 --> 15:36.920 a heap or a stack, or if it's just another user/supervisor bit set by some other process or the kernel. 15:36.920 --> 15:43.920 So using that bit, we can then mark that page and say, 15:43.920 --> 15:50.920 it means more than just user/supervisor, it means execute/no-execute. 15:50.920 --> 15:57.920 So to make this PaX concept work for unpacking, we just have to do a few things differently. 15:57.920 --> 16:00.920 We're not protecting the stack and the heap. 16:00.920 --> 16:09.920 We are looking at a section of memory that's really just defined by the PE loader. 16:09.920 --> 16:12.920 And we just find every page that belongs to that section, 16:12.920 --> 16:17.920 and then we mark those pages, we flip that PTE bit. 16:17.920 --> 16:21.920 And then we install our own page fault handler, 16:21.920 --> 16:27.920 and this time instead of PaX where it's going to try to protect the operating system's integrity 16:27.920 --> 16:32.920 by killing that process, all we're going to do is from the page fault handler 16:32.920 --> 16:38.920 we're going to jump to the INT 1 handler, the single-step break handler. 16:38.920 --> 16:46.920 And what this does is pass control from the kernel back to our program's debugger. 16:46.920 --> 16:49.920 So essentially OllyDbg comes back and says, 16:49.920 --> 16:58.920 I just got a single-step break, and it stops right there. 
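NOTE The step of finding every page that belongs to the target section can be sketched in a few lines of Python. This is only an illustration, not code from the tool: the function name `pages_in_section` is hypothetical, and the 4 KiB page size is an assumption that holds for ordinary (non-large) pages on 32-bit x86.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages (32-bit x86 without large pages)


def pages_in_section(base, size):
    """Return the base address of every page overlapping [base, base + size).

    Hypothetical helper: a debugger plugin would pass each of these page
    addresses to the kernel module so the user/supervisor bit can be flipped.
    """
    if size <= 0:
        return []
    first = base & ~(PAGE_SIZE - 1)               # round start down to a page
    last = (base + size - 1) & ~(PAGE_SIZE - 1)   # page holding the last byte
    return list(range(first, last + PAGE_SIZE, PAGE_SIZE))


if __name__ == "__main__":
    # Example values only: a section at 0x401000 that is 0x2800 bytes long.
    for page in pages_in_section(0x401000, 0x2800):
        print(hex(page))
```

A section that does not start on a page boundary still gets its first partial page included, because any execution inside that page has to be caught as well.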
16:58.920 --> 17:04.920 So the way this is implemented is you've got an OllyDbg plugin 17:04.920 --> 17:10.920 that basically just handles figuring out where the section of memory is, 17:10.920 --> 17:17.920 how many pages it has, and then sending IO controls to a kernel module, 17:17.920 --> 17:26.920 which basically can then just arbitrarily assign these user/supervisor bits to all these pages, 17:26.920 --> 17:32.920 and then also installs its own page fault handler to override the normal page fault handler 17:32.920 --> 17:37.920 and redirect the flow whenever we want to. 17:37.920 --> 17:43.920 You could also, if anybody's interested in using IDA for this, 17:43.920 --> 17:48.920 you could probably do this with the same kernel module, all you need is an IDA plugin, 17:48.920 --> 17:52.920 basically, that can figure out what the memory segment that you want to target is, 17:52.920 --> 17:56.920 and just send IO controls for all of those pages. 17:56.920 --> 18:05.920 So it wouldn't be hard to implement. I just haven't done it because I typically don't use IDA as a debugger. 18:05.920 --> 18:13.920 I'm going to go over a little bit of the process of how it actually flows in the page fault handler here. 18:13.920 --> 18:28.920 So your packer is now trying to unpack itself, and it's attempting to write to this page that we've set to be no-execute. 18:28.920 --> 18:35.920 So at this point, this is the first time we try to access it, so there's no cached virtual address translation. 18:35.920 --> 18:45.920 So basically the page table walk happens, and it comes back with the user/supervisor bit set, so it generates a page fault. 
18:45.920 --> 18:52.920 So the kernel directs that into the page fault handler, where we've taken over the flow, 18:52.920 --> 19:02.920 and we check and see if that page fault belongs to us, if it's something that's a result of what we've done here in OllyBonE, 19:02.920 --> 19:06.920 or whether it's a real page fault that just needs to be passed on down. 19:06.920 --> 19:15.920 So if it does belong to us, what we do is we decide whether this is a data or an execution access. 19:15.920 --> 19:19.920 And it's pretty easy to do. All we have to do is look at the faulting address 19:19.920 --> 19:25.920 that's passed to our page fault handler, and then see if that equals the instruction pointer, 19:25.920 --> 19:31.920 which we can also get from one of the front-end structures. 19:31.920 --> 19:35.920 So if it doesn't equal the instruction pointer, that means this was a data access. 19:35.920 --> 19:38.920 It wasn't trying to execute here, it was just trying to read. 19:38.920 --> 19:46.920 So what we do is we'll toggle that PTE bit, we'll set it back to user access. 19:46.920 --> 19:54.920 Then we'll do a read from the page, and when we do that, that caches this page table entry. 19:54.920 --> 20:02.920 So we don't have to do this again. That is cached in the DTLB, and then the next time it wants to read, 20:02.920 --> 20:07.920 it doesn't have to go to our page fault handler, so we don't slow the system down terribly. 20:07.920 --> 20:15.920 Then after we've gone ahead and cached that, we just toggle that page's bit back to supervisor mode, 20:15.920 --> 20:30.920 so that when our potential execution happens, that page fault will be hit again. 20:30.920 --> 20:37.920 So when that happens, hopefully, this is the target segment we've guessed is where the execution is going to occur. 20:37.920 --> 20:45.920 If it does happen, then what will happen is we have another page fault. 
20:45.920 --> 20:50.920 The translation isn't cached, so it tries to do a page table walk and it comes back and says, 20:50.920 --> 20:57.920 hey, you've only got the supervisor bit set here and you're only a user, so it's going to generate that page fault for us. 20:57.920 --> 21:04.920 Once again, our page fault handler says, does this belong to us? Does this belong to something else in the system? 21:04.920 --> 21:12.920 And then if it belongs to us, it says, does this constitute an execute access, 21:12.920 --> 21:16.920 which is when the faulting address equals the instruction pointer address. 21:16.920 --> 21:22.920 So when that happens, what we have to do is pop one extra argument off the stack 21:22.920 --> 21:27.920 because the page fault handler is called with an argument that the INT 1 handler is not, 21:27.920 --> 21:42.920 and then we simply jump to that INT 1 handler and let OllyDbg take control of the program. 21:42.920 --> 21:51.920 This works great on standard hardware. It works pretty seamlessly on VMware. 21:51.920 --> 21:58.920 For Bochs and QEMU, I looked at their code, tried to make this work on them, 21:58.920 --> 22:03.920 and it just simply doesn't work because they haven't implemented split TLBs. 22:03.920 --> 22:10.920 Whenever somebody was writing this part of the code in those projects, they said, 22:10.920 --> 22:16.920 well, let's save some time. We'll just put all of the translation lookaside buffers into one table. 22:16.920 --> 22:22.920 And so that doesn't give us the ability to do this, unfortunately, on those platforms. 22:22.920 --> 22:29.920 I think I filed a bug a while back with QEMU about that. I don't know if it's been fixed since then. 22:29.920 --> 22:35.920 I probably haven't looked at it since I did that. So it might work at some point. 22:35.920 --> 22:40.920 And I have not tested this on Microsoft Virtual PC. I'd be interested to know if any of you use that. 
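NOTE The handler flow just described (is the fault ours, is it data or execute, cache data accesses in the DTLB, break on execute) can be modeled as a small user-space simulation in Python. This is a toy model under stated assumptions, not the kernel module's code: the class `SplitTlbModel`, its method names, and the return strings are all hypothetical, and a real handler works on page table entries and the CPU's TLBs rather than Python sets.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages


class SplitTlbModel:
    """Toy model of break-on-execute using a split data/instruction TLB."""

    def __init__(self, protected_pages):
        self.protected = set(protected_pages)  # pages marked supervisor-only
        self.dtlb = set()                      # pages cached for data access
        self.faults = 0                        # times our handler was entered

    def access(self, address, eip):
        """Simulate one memory access; return 'run' or 'break'."""
        page = address & ~(PAGE_SIZE - 1)
        if page not in self.protected:
            return "run"                 # not ours: normal handling applies
        is_execute = (address == eip)    # faulting address == instruction pointer
        if not is_execute and page in self.dtlb:
            return "run"                 # DTLB hit: no fault, no slowdown
        self.faults += 1                 # page fault reaches our handler
        if is_execute:
            return "break"               # hand off to the single-step handler
        # Data access: briefly allow user access, touch the page so the
        # DTLB caches it, then restore supervisor-only (modeled as a set add).
        self.dtlb.add(page)
        return "run"


if __name__ == "__main__":
    model = SplitTlbModel([0x401000])
    print(model.access(0x401010, 0x405000))  # unpacker writes into the page
    print(model.access(0x401020, 0x405004))  # later write to the same page
    print(model.access(0x401000, 0x401000))  # jump into the unpacked code
    print("faults:", model.faults)
```

The model also shows why emulators with a single unified TLB do not work for this trick: once a data access caches the translation, an execute access to the same page would hit that same cache entry and never fault.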
22:40.920 --> 22:54.920 I don't use it, so if you do try it out and it works for you or doesn't work, I'd like to hear from you. 22:54.920 --> 23:00.920 Usage, it's pretty straightforward. It's actually very fast. 23:00.920 --> 23:04.920 You're just going to load your executable in OllyDbg. 23:04.920 --> 23:12.920 You're going to view the memory map and look for where your process is loaded and then figure out 23:12.920 --> 23:21.920 what is going to be that final code segment when it's unpacked, which of those sections that are loaded in there 23:21.920 --> 23:24.920 is going to be the one that's running in the unpacked state. 23:24.920 --> 23:29.920 And that's guesswork sometimes. Sometimes you'll know because you're familiar with a particular packer. 23:29.920 --> 23:35.920 You don't have to know all the algorithms, but you'll know at least that it's always going to jump to section one after it's done. 23:35.920 --> 23:41.920 It's pretty predictable. Sometimes you'll get it wrong and it'll never break. It'll run to completion. 23:41.920 --> 23:50.920 So if you're analyzing malware, that could be bad if you're not running this on a protected system that's isolated. 23:50.920 --> 23:54.920 If you're not using this for malware, you probably won't have a problem. 23:54.920 --> 24:02.920 But basically there's a break-on-execute flag that's been added to the menu, the right-click menu in the memory map. 24:02.920 --> 24:08.920 So we just toggle that and we run the program. 24:08.920 --> 24:15.920 And then hopefully when it tries to execute that section after it's unpacked, it'll encounter the single-step break. 24:15.920 --> 24:21.920 So control goes back to OllyDbg and then it'll dump you out basically at the OEP, 24:21.920 --> 24:25.920 exactly where you want to be, in one step. 24:25.920 --> 24:29.920 So theoretically that's how it should work in most cases. 
24:29.920 --> 24:36.920 The differences between packers mean that it doesn't always work quite as cleanly as that. 24:36.920 --> 24:51.920 And I'm going to show you actually a video demo of about five different packers and using OllyBonE on them. 25:06.920 --> 25:35.920 So let's go ahead and start. 25:35.920 --> 25:42.920 So the first packer that we're going to attack here is FSG, which is a pretty easy one to manually unpack. 25:42.920 --> 25:46.920 If you've ever walked through the code, there's not much to it. 25:46.920 --> 25:55.920 And if you look at the two or three different versions that are out there, you can spot pretty quickly the point at which it's going to jump to the OEP. 25:55.920 --> 26:04.920 But it's also a pretty common one that malware authors like to use, so I'll show you how long it takes to do this. 26:04.920 --> 26:12.920 So we loaded it up and we are landing somewhere in memory and we look. 26:12.920 --> 26:19.920 This is the memory map and we can see that the code section starts at 401000. 26:19.920 --> 26:26.920 And we have actually started at 401,000, so we're actually in that stub section right now. 26:26.920 --> 26:34.920 So what we're going to do is we're just going to toggle our break-on-execute flag for that section. 26:34.920 --> 26:37.920 And then we're just going to hit the play button. 26:37.920 --> 26:44.920 And down at the bottom there you can see it's already hit our break on execute. 26:44.920 --> 26:53.920 So we're just going to run the code analysis here and we can now read the strings that are in the file. 26:53.920 --> 27:01.920 I'd like to thank Peter Bagnia for providing these packed executables that he uses to test his U-Pack engine. 27:01.920 --> 27:06.920 It saved me a lot of time not having to go and find a bunch of packers and pack something up with them. 27:06.920 --> 27:13.920 So that's what we should see every time it's unpacked basically. 
27:13.920 --> 27:19.920 So this one's done, we can dump this right now and we know the OEP and we're good to go. 27:19.920 --> 27:24.920 All right, moving on to an older version of U-Pack. 27:24.920 --> 27:28.920 Once again we see we're looking at the memory map. 27:28.920 --> 27:33.920 We are well out of that initial code section there. 27:33.920 --> 27:37.920 We are in the stub section. 27:37.920 --> 27:45.920 So we'll set our break on execute, hit play, and it's already done. 27:45.920 --> 27:50.920 We can analyze the code and we can read it fairly easily. 27:50.920 --> 27:54.920 So that's two fairly straightforward, easy ones. 27:54.920 --> 28:00.920 I would have put UPX in here as an example because that's also a very common one. 28:00.920 --> 28:07.920 For some reason UPX did not like Peter's sample executable with the compressor. 28:07.920 --> 28:18.920 All right, so this is ASProtect. Is it A-S-Protect or ASProtect? 28:18.920 --> 28:21.920 ASProtect sounds kind of funny. 28:21.920 --> 28:26.920 It sounds a heck of a lot better than ASPack. 28:26.920 --> 28:29.920 That's all I know. 28:29.920 --> 28:33.920 But we'll look at this one now. This one has a few more tricks. 28:33.920 --> 28:40.920 We can see that we started this off and we are, because we've already unpacked this a couple of times, 28:40.920 --> 28:43.920 we know that that is actually our OEP. 28:43.920 --> 28:46.920 So it starts off in that actual code section. 28:46.920 --> 28:53.920 So we're not going to be able to simply set a break on execute right now on that section 28:53.920 --> 28:59.920 because we're already there and we wouldn't be able to execute anything. 28:59.920 --> 29:01.920 So we're going to have to work around that. 29:01.920 --> 29:07.920 Fortunately, we can see here that this push and this call and return and return, 29:07.920 --> 29:15.920 all this is is just a jump basically into another section there in 405.1. 
29:15.920 --> 29:19.920 So we'll just step through that. 29:19.920 --> 29:24.920 And now we've landed in another section. 29:24.920 --> 29:26.920 So we'll go back to our memory map now. 29:26.920 --> 29:30.920 We can set our break on execute on the code section. 29:30.920 --> 29:32.920 Hit play. 29:32.920 --> 29:34.920 And blah, blah, blah. 29:34.920 --> 29:38.920 This is debugger detection built into ASProtect. 29:38.920 --> 29:43.920 It detected that we were running this under OllyDbg and didn't like it. 29:43.920 --> 29:48.920 So we're going to have to re-strategize here because just being able to unpack this stuff 29:48.920 --> 29:53.920 doesn't necessarily mean that it doesn't detect our debugger. 29:53.920 --> 29:59.920 So fortunately, the debugger detection technique that is being used here is not all that sophisticated. 29:59.920 --> 30:04.920 So basically, we can use the IsDebuggerPresent plugin for OllyDbg 30:04.920 --> 30:09.920 and it just hides that IsDebuggerPresent flag in the PEB. 30:09.920 --> 30:15.920 So we don't show up as being debugged, and we can rerun it. 30:15.920 --> 30:17.920 So we'll go through the motions again. 30:17.920 --> 30:19.920 We'll step through that first section. 30:19.920 --> 30:20.920 Land in the second section. 30:20.920 --> 30:23.920 Go to our memory map. 30:23.920 --> 30:25.920 Set our break on execute. 30:25.920 --> 30:27.920 And hit play. 30:27.920 --> 30:29.920 And we land somewhere. 30:29.920 --> 30:33.920 We landed at 4104. 30:33.920 --> 30:38.920 A little strange because that isn't where we know that the OEP is. 30:38.920 --> 30:40.920 This code doesn't look like it's been unpacked. 30:40.920 --> 30:42.920 So what is it? 30:42.920 --> 30:48.920 Well, if we go and remove the kind of messed up analysis that OllyDbg tried to do, 30:48.920 --> 30:51.920 we'll be able to read what that instruction is, and it's a return. 
30:51.920 --> 31:03.920 So what ASProtect is doing is jumping basically into that code section and then returning, probably to pop that address off the stack and jump. 31:03.920 --> 31:05.920 And then do something else with it. 31:05.920 --> 31:14.920 So what we have to do here in order to continue with our execution is we're going to have to remove the break on execute that we set. 31:14.920 --> 31:16.920 So now we can step and we return. 31:16.920 --> 31:18.920 And now we're in the heap. 31:18.920 --> 31:21.920 So we're well out of our code section. 31:21.920 --> 31:25.920 So all we have to do now is go back to the memory map. 31:25.920 --> 31:26.920 Hit that section. 31:26.920 --> 31:29.920 Set a break on execute one more time. 31:29.920 --> 31:32.920 And run the program. 31:32.920 --> 31:34.920 And it ends up breaking. 31:34.920 --> 31:36.920 Now, we did hit a break on execute. 31:36.920 --> 31:38.920 We are at the OEP. 31:38.920 --> 31:43.920 But the code doesn't look quite right. 31:43.920 --> 31:53.920 It's unpacked here, it's just that in certain cases OllyDbg gets freaked out about the analysis and can't quite get it together. 31:53.920 --> 31:55.920 So we are actually at the OEP. 31:55.920 --> 31:57.920 We are unpacked here. 31:57.920 --> 32:12.920 But running OllyDbg's analysis just looks a little bit wonky here. 32:12.920 --> 32:14.920 All right, so that one's unpacked. 32:14.920 --> 32:19.920 Let's go on to PE Pack. 32:19.920 --> 32:24.920 And it does kind of the same trick at the beginning that ASProtect uses. 32:24.920 --> 32:28.920 And it's not jumping outside of that code section. 32:28.920 --> 32:32.920 It's staying right in that first section. 32:32.920 --> 32:37.920 So we're going to have to step through it and see what the deal is. 32:37.920 --> 32:43.920 And we can see that only a few lines down we hit an exception. 32:43.920 --> 32:48.920 So what it's doing is it's using the exception handler to do some of the work in unpacking the code. 
32:48.920 --> 32:57.920 Probably trying to play around with emulators that don't properly handle exceptions. 32:57.920 --> 33:04.920 But now that we're in the exception routine, we're outside of that code section. 33:04.920 --> 33:11.920 And we can set our break on execute now on that section. 33:11.920 --> 33:14.920 When we do that, we land here. 33:14.920 --> 33:21.920 And that's a jump to another section, which is good, because that means that we're probably getting closer. 33:21.920 --> 33:25.920 So we're going to have to remove our break on execute in order to step. 33:25.920 --> 33:35.920 Because right now we're just kind of frozen. It won't let us do anything. So we can toggle that. 33:35.920 --> 33:41.920 And we now just hit F7 to step into that section there. 33:41.920 --> 33:44.920 So we just go back to the memory map. 33:44.920 --> 33:48.920 Set our break on execute one more time. 33:48.920 --> 33:49.920 And run it. 33:49.920 --> 33:54.920 And now we land, once again, at the break on execute, at the OEP. 33:54.920 --> 33:56.920 And so we're unpacked. 33:56.920 --> 34:02.920 And we're still kind of in a condition where OllyDbg doesn't help with the analysis. 34:02.920 --> 34:10.920 But down towards the lower right of the screen, you can see the calls to MessageBox and ExitProcess. 34:10.920 --> 34:18.920 So it's unpacked. 34:18.920 --> 34:19.920 All right. 34:19.920 --> 34:24.920 So here's one more for example purposes: tElock. 34:24.920 --> 34:29.920 It's also one that uses a few tricks. 34:29.920 --> 34:31.920 It starts outside the code section. 34:31.920 --> 34:36.920 So it seems like we would be good to go if we just go ahead and set a break on execute. 34:36.920 --> 34:42.920 So let's go to our memory map, pick out that target code section and set it. 34:42.920 --> 34:44.920 Run the program. 34:44.920 --> 34:46.920 We've landed somewhere. 34:46.920 --> 34:48.920 What's going on here?
34:48.920 --> 34:50.920 Let's go down to the side. 34:50.920 --> 34:52.920 If we go down and look, we've got a single step event. 34:52.920 --> 35:01.920 What's happened here is that tElock is using single step exceptions as part of its unpacking obfuscation. 35:01.920 --> 35:05.920 It actually uses a series of different exceptions. 35:05.920 --> 35:07.920 And some of those happen to be single steps. 35:07.920 --> 35:13.920 So what we're going to have to do here is basically let the program handle the exception. 35:13.920 --> 35:27.920 So we let the program handle the exception, Shift+F9 or Ctrl+F9, and just keep running it until we are no longer looking at single step events, but we hit the break on execute. 35:27.920 --> 35:30.920 So let's run through a few here. 35:30.920 --> 35:33.920 So we just keep on hitting F9. 35:33.920 --> 35:35.920 We keep running and handling the exception. 35:35.920 --> 35:38.920 And now we've landed somewhere unfamiliar. 35:38.920 --> 35:42.920 You can see that a break on execute has been reached. 35:42.920 --> 35:46.920 And if we do our analysis, we can see that we're back again. 35:46.920 --> 35:55.920 Back again. 35:55.920 --> 36:13.920 I'm going to rush to download it. 36:13.920 --> 36:19.920 We're in here. 36:19.920 --> 36:25.920 Is that a movie or is that live? 36:25.920 --> 36:29.920 It's a video of me doing it live. 36:29.920 --> 36:33.920 The speed of the unpacking was not edited. 36:33.920 --> 36:42.920 I just basically put a bunch of clips together. 36:42.920 --> 36:52.920 It is a video for most of the testing purposes. 36:52.920 --> 36:57.920 If you want to do that, by the way, Xvidcap is similar. 36:57.920 --> 37:09.920 So the problems we're going to face using this method to unpack malware: we still haven't solved the fact that our debugger can be detected.
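The tElock walkthrough above boils down to a loop: every time the debugger stops on an exception that belongs to the packer, pass it back to the program, and only stop for good when the break on execute fires. A minimal sketch of that loop follows; the event names and the `run_passing_exceptions` helper are invented for illustration and do not correspond to any real debugger API.

```python
from typing import Iterable, Tuple


def run_passing_exceptions(events: Iterable[str]) -> Tuple[int, str]:
    """Consume simulated debug events in order.

    Every event other than "break_on_execute" is treated as an exception
    the packer raised on purpose, so it is handed back to the program
    (the equivalent of passing the exception and running again).
    Returns how many events were passed along and what finally stopped us.
    """
    passed = 0
    for event in events:
        if event == "break_on_execute":
            return passed, event
        passed += 1
    # The event stream ended without the break ever firing.
    return passed, "process_exited"


# Hypothetical event stream for a packer that mixes exception types.
events = [
    "single_step",
    "single_step",
    "access_violation",
    "single_step",
    "break_on_execute",
]
passed, stopped_on = run_passing_exceptions(events)
print(passed, stopped_on)
```

Note the second return path: if the packer notices the debugger and exits, the loop never sees the break, which is one reason the method is only semi-automatic.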
37:09.920 --> 37:18.920 So any code that we're skipping over when we're saying I'm going to break on execute, it has the potential to do anything that it wants. 37:18.920 --> 37:24.920 It can detect the debugger, it can just exit or potentially do other bad things. 37:24.920 --> 37:30.920 So you can't really use it as a 100% automated solution. 37:30.920 --> 37:37.920 That's why I call it semi-automatic unpacking because you have to interact with it to some extent. 37:37.920 --> 37:45.920 So the anti-debugging stuff, you're still going to need to incorporate that into your debugger system. 37:45.920 --> 37:52.920 There's no rule saying you have to use a quote unquote debugger to use that kernel module. 37:52.920 --> 38:01.920 You can write your own unpacking engine and just use that kernel module as the basis for it and maybe get around some of the anti-debugging tricks. 38:01.920 --> 38:05.920 Obviously the IsDebuggerPresent flag is a pretty trivial one. 38:05.920 --> 38:15.920 Some of the anti-debugging tricks I've seen are easy: they'll just look for a window title of OllyDbg, that's a pretty trivial one. 38:15.920 --> 38:21.920 Other ones might be a little harder, where they do timing of exception handlers. 38:21.920 --> 38:25.920 So you have to work around those. 38:25.920 --> 38:37.920 Another thing that's kind of frustrating is some of the malware now, or some of the packers, will refuse to run on VMware. 38:37.920 --> 38:45.920 I've had a particular thing that I was struggling with, trying to figure out why I was never able to unpack it. 38:45.920 --> 38:55.920 I thought that something I had done in OllyDbg was wrong, but it turns out that it was actually rejecting the fact that it was running on VMware, and when I ran it on a native machine it worked fine. 38:55.920 --> 39:14.920 I haven't seen a lot of them.
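To show why the IsDebuggerPresent check is the trivial case: the Win32 `IsDebuggerPresent` call just reads the `BeingDebugged` byte from the process's PEB, which sits at offset 2, so a hiding plugin only has to zero that one byte. The sketch below models this with a stand-in structure so it runs anywhere; `FakePEB`, `is_debugger_present` and `hide_debugger` are hypothetical names, only the first three PEB fields are modelled, and no real process memory is touched.

```python
import ctypes


class FakePEB(ctypes.Structure):
    """Stand-in for the first three bytes of the Windows PEB."""
    _fields_ = [
        ("InheritedAddressSpace", ctypes.c_ubyte),
        ("ReadImageFileExecOptions", ctypes.c_ubyte),
        ("BeingDebugged", ctypes.c_ubyte),  # the byte IsDebuggerPresent reads
    ]


def is_debugger_present(peb: FakePEB) -> bool:
    """What the packer's check effectively does: read one byte."""
    return bool(peb.BeingDebugged)


def hide_debugger(peb: FakePEB) -> None:
    """What a hiding plugin effectively does: zero that byte."""
    peb.BeingDebugged = 0


peb = FakePEB(0, 0, 1)             # process started under a debugger
seen_before = is_debugger_present(peb)
hide_debugger(peb)
seen_after = is_debugger_present(peb)
print(FakePEB.BeingDebugged.offset, seen_before, seen_after)
```

Checks based on window titles or exception-handler timing do not live in a single byte like this, which is why they take more work to defeat.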
I understand the concept and I'm pretty sure there are some packers which will be a little less rigid in the way that they unpack than just putting everything back in nice little sections for you. 39:14.920 --> 39:24.920 Some of them will unpack to, for instance, the heap and then run from there, just relocate all the jumps and everything. 39:24.920 --> 39:33.920 That's a problem right now because we haven't actually included a method to set a break on execute on a heap. 39:33.920 --> 39:53.920 It would be pretty trivial to do. I just haven't done it to date because figuring out what the heap is and figuring out what potentially shared code might be is a little bit more time consuming for me. 39:53.920 --> 40:11.920 Needless to say, it would be bad if instead of a break on execute, it set its permissions onto a piece of shared memory and every process on the system suddenly hit a break at the same time. 40:11.920 --> 40:22.920 In terms of evasion, I'll get into a little bit how you can do it. It's totally possible. I'm not going to say this is an infallible method of unpacking. 40:22.920 --> 40:34.920 For instance, what if they don't make a stub section? What if they do everything from that initial code section that it's eventually going to jump to? 40:34.920 --> 40:40.920 That could happen. We have seen where they start out in the code section, then jump elsewhere. 40:40.920 --> 40:50.920 We can always get a little bit finer grained. We can take this protection down page by page. They may be running in the last five pages of the code section. 40:50.920 --> 41:03.920 We can protect the first however many pages up to that point. Once again, it's a lot of work and implementation details, but it's possible. 41:03.920 --> 41:21.920 The fact that OllyBonE's kernel module is loaded, that can be detected. They could refuse to run, or they might even try to get clever and use that IOCTL method to unset the break on execute after you've set it.
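The page-by-page idea mentioned above is mostly address arithmetic: protect every page of the code section that comes before the page the unpacking stub is running from. A small sketch, assuming 4 KB pages and a page-aligned section; `pages_to_protect` and the example addresses are hypothetical and this is not how the released plugin is known to work.

```python
from typing import List

PAGE_SIZE = 0x1000  # 4 KB pages, the usual size on IA-32


def pages_to_protect(section_start: int, section_size: int, stub_start: int) -> List[int]:
    """Return the base address of every page in the section that lies
    before the page containing `stub_start` (where the stub executes)."""
    if section_start % PAGE_SIZE != 0:
        raise ValueError("section must start on a page boundary")
    section_end = section_start + section_size
    if not section_start <= stub_start < section_end:
        raise ValueError("stub is not inside the section")
    stub_page = stub_start - (stub_start % PAGE_SIZE)
    return list(range(section_start, stub_page, PAGE_SIZE))


# A 16-page code section whose last five pages hold the running stub.
section_start = 0x401000
section_size = 16 * PAGE_SIZE
stub_start = section_start + 11 * PAGE_SIZE

pages = pages_to_protect(section_start, section_size, stub_start)
print(len(pages), hex(pages[0]), hex(pages[-1]))
```

With these numbers the first eleven pages get the break on execute and the five pages the stub occupies are left alone.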
41:21.920 --> 41:35.920 The advantage you have here is that I'm giving you the source code on this, so you could change that to anything you want. You don't have to name it OllyBonE.sys. You can name it anything. You can change the IOCTL numbers. 41:35.920 --> 41:46.920 If that ever comes around, that's a possibility. I'd be surprised if somebody actually started doing that. 41:46.920 --> 42:03.920 One thing that I think might be possible, I haven't really tested it out: if somebody plays around with VirtualProtect and has it read and write the PTEs, I don't necessarily know that the supervisor bit stays intact. 42:03.920 --> 42:23.920 There's a chance that after this memory is touched by some other API call, we might want to continually maintain that bit and make sure that it's set, and not just hit it once to toggle it on and hope that it stays that way. 42:23.920 --> 42:39.920 You can download it right now, Tmugpl. There's still stuff to do on it. I don't know necessarily that I'll implement these changes myself, but since the code is out there, don't complain about it. You can implement it yourself. 42:39.920 --> 43:00.920 You can share that back with the community. There are some ideas there in terms of the break on execute for shared DLLs. At some point you might want to be able to set a break on execute on the kernel DLL. 43:00.920 --> 43:17.920 I'm theorizing that it's possible by using the copy-on-write aspect of that page: basically write to it, forcing it to be copied into another section of memory that's only going to be seen by your process, and then set that break on execute. 43:17.920 --> 43:29.920 I haven't tested that, so maybe that won't work. That's all I have for you right at this moment. Any questions about standing testing? 43:29.920 --> 43:35.920 Your driver is actually modifying the page fault handler? 43:35.920 --> 44:02.920 Yes. The driver is modifying the page fault handler.
It's basically finding the location from the interrupt descriptor table and it's inserting its own. Some of the code in the kernel module is actually adapted from other people's projects. The hooking of that interrupt handler, I actually borrowed some of that code from the Shadow Walker project, if you're familiar with that. 44:02.920 --> 44:20.920 It kind of did a similar thing where it took the PaX concept and used that to make it possible to execute but not possible to read. That's something I've done there. 44:20.920 --> 44:37.920 I'm not really a kernel programmer, so I did put this kernel module together and debug it and make it work, but I don't consider myself a kernel programmer, or really even a C programmer for that matter. If you find bugs in there, that's probably logical. 44:37.920 --> 44:41.920 Do you think it will work in Vista? 44:41.920 --> 44:43.920 I'm sorry, it's obvious. 44:43.920 --> 44:45.920 Do you think this will work in Vista? 44:45.920 --> 44:55.920 In Vista? I don't know. I'll be able to find out from the beta tester. 44:55.920 --> 45:23.920 Thank you.
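For readers who want to see the supervisor-bit idea from the answer above in concrete terms, here is a small sketch of the bit manipulation and of the decision a hooked page fault handler has to make. The bit positions are the standard IA-32 page-table-entry flags. The rule "fault address equals the instruction pointer means an instruction fetch" is an assumption borrowed from the PaX and Shadow Walker style of split handling, not something the talk spells out, and the function names are hypothetical.

```python
# Standard IA-32 page-table-entry flag bits.
PTE_PRESENT = 0x1   # bit 0: page is present
PTE_WRITABLE = 0x2  # bit 1: read/write
PTE_USER = 0x4      # bit 2: user/supervisor (set = user mode may access)


def mark_supervisor(pte: int) -> int:
    """Clear the user bit so any user-mode access to the page faults."""
    return pte & ~PTE_USER


def mark_user(pte: int) -> int:
    """Set the user bit again, restoring normal user-mode access."""
    return pte | PTE_USER


def is_user_accessible(pte: int) -> bool:
    return bool(pte & PTE_USER)


def classify_fault(fault_addr: int, eip: int) -> str:
    """Assumed rule: if the faulting address is the instruction pointer,
    the CPU was fetching code from the page (break on execute); any
    other address means the page was only being read or written."""
    return "execute" if fault_addr == eip else "data"


pte = 0x067                      # made-up entry: present, writable, user, plus status bits
protected = mark_supervisor(pte)
print(hex(protected), is_user_accessible(protected))
print(classify_fault(0x401000, 0x401000))  # code fetch from the protected page
print(classify_fault(0x401200, 0x405010))  # stub writing unpacked bytes into it
```

The concern raised earlier about other API calls rewriting the PTE amounts to calling something like `mark_supervisor` again whenever the user bit is found set, rather than toggling it once and trusting it to stay that way.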