Hi. Well, today I will talk to you about tracing for hardware, driver, and binary
reverse engineering in Linux.
So here is how that works.
I will first present performance and behavior analysis tools, especially on Linux, and focus
on the distinction between the two.
I will then talk about the Linux Trace Toolkit, which is my tracer project.
We wrote a kernel tracer and a user-space analysis tool for the traces it produces.
I will then talk about applications of tracing for reverse engineering purposes.
Then I will proceed to a small demo, well, a few demos in fact.
So analysis tools come in different types: some are targeted toward performance analysis,
while others are more about, well, understanding the behavior of the program.
On the performance analysis side, we have profiling tools like gprof, or just
the tiny command top, which is very, very widely used, and OProfile, which is a kernel
profiler. Those are mostly based on sampling, so they do
not give detailed insight into what the program is doing, but mostly give an idea of where
CPU time is spent.
Another kind of tool that can be used for performance tuning is event-based performance
monitoring tools, like LTT, SystemTap, and LKST, just to name a few, that are used on the Linux
platform.
And on the behavioral analysis side, what's interesting is that you have the
complete information about the execution flow of the program.
You want to know how it acts.
Tracers are especially good for that, and this is the focus of my presentation.
So on Linux, you have LTT and LTTng; strace is maybe the most widely used, and it lets
you know every interaction that one specific program has with the kernel; ltrace also,
to get all the library calls made by the program.
And also, well, debuggers are very good at that, right?
So for tracing, let's define two simple concepts.
I just want to make sure that everybody here has an idea of what a trace is.
First, events.
What is an event?
It's an information record that has a timestamp.
So you just log information and the time at which it happened.
And a trace is just a collection of many of those events.
It doesn't need to be a single file.
It's just a set of files containing events.
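To make the two concepts concrete, here is a minimal C sketch of an event record and a trace. The struct layout, the field names, and the fixed capacity of 64 events are assumptions for illustration only; this is not LTTng's actual trace format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical event record: what happened, and when. */
struct event {
    uint64_t timestamp;   /* e.g. nanoseconds since boot */
    char     name[16];    /* e.g. "read", "write", "irq_entry" */
};

/* Hypothetical trace: an ordered collection of events. A real tracer
 * may split a trace over several files, for example one per CPU. */
struct trace {
    struct event events[64];
    size_t       count;
};

/* Append one event; returns 0 on success, -1 when the trace is full. */
int trace_record(struct trace *t, uint64_t timestamp, const char *name)
{
    if (t->count >= sizeof t->events / sizeof t->events[0])
        return -1;
    struct event *e = &t->events[t->count++];
    e->timestamp = timestamp;
    strncpy(e->name, name, sizeof e->name - 1);
    e->name[sizeof e->name - 1] = '\0';   /* strncpy may not terminate */
    return 0;
}
```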
So now, on the side of my project: the Linux Trace Toolkit is a kernel tracer from the start.
Then I extended it to add user-space tracing capabilities,
so I can trace libraries and programs.
And it's all instrumentation-based.
That means that it works best when you have the source code,
because you can insert your own tracepoints, your own instrumentation.
What is instrumentation?
It's the act of modifying code to add a statement that records information.
So when you're debugging a program that you own and whose source code you have,
you can just put a printf in it.
But, well, you will see that it's much more efficient and faster to use a tracer instead.
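As an illustration of why a tracepoint is cheaper than a printf, the sketch below records a timestamp, a static event name, and one integer into an in-memory buffer, and leaves all text formatting to the analysis tool. The TRACEPOINT macro, the buffer, and sum_to are hypothetical names. A real tracer such as LTTng uses per-CPU, lock-free buffers, which this single-threaded buffer does not attempt to reproduce.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical single in-memory trace buffer (not thread-safe). */
#define TRACE_CAPACITY 1024

struct trace_event {
    uint64_t    ts_ns;   /* timestamp in nanoseconds */
    const char *name;    /* static string: no formatting cost at record time */
    long        arg;     /* one integer payload */
};

static struct trace_event trace_buf[TRACE_CAPACITY];
static size_t trace_len;

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Record a binary event instead of formatting text like printf would.
 * Events are silently dropped once the buffer is full. */
#define TRACEPOINT(evname, value)                          \
    do {                                                   \
        if (trace_len < TRACE_CAPACITY) {                  \
            trace_buf[trace_len].ts_ns = now_ns();         \
            trace_buf[trace_len].name  = (evname);         \
            trace_buf[trace_len].arg   = (value);          \
            trace_len++;                                   \
        }                                                  \
    } while (0)

/* An instrumented function: the two TRACEPOINT lines are the instrumentation. */
long sum_to(long n)
{
    long s = 0;
    TRACEPOINT("sum_to_entry", n);
    for (long i = 1; i <= n; i++)
        s += i;
    TRACEPOINT("sum_to_exit", s);
    return s;
}
```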
What are the key features of the Linux Trace Toolkit?
Oh, by the way, LTTng stands for Linux Trace Toolkit Next Generation,
and LTTV is the Linux Trace Toolkit Viewer.
So the key features on the viewer side: LTTV provides a modular analysis framework.
We can analyze the information coming from the whole system,
from multiple information sources.
You can get, at the same time, information from the kernel, from the libraries, from the programs themselves,
and analyze and view them in the same tool.
This is one of the particular points where this tool takes analysis further than what exists.
It focuses on that while having very little impact on system performance and behavior.
The viewer also provides a cool GUI as well as a useful text output,
and a full framework to develop plugins that do further analysis of the information.
So, well, the topic of this conference is reverse engineering.
So I thought about the question: how can we apply tracing to reverse engineering?
Well, you can get information from programs, libraries, drivers.
You can instrument them however you like.
And so you can extract information from basically your whole system.
So let's start with program analysis.
Basically, you can be confronted with two cases.
One is where you have all the source code of an application that you just want to understand.
This is most of the work I do,
and I do it by adding instrumentation to the source code.
So this is white box analysis.
But you can also be interested in doing black box analysis.
So what does this tool provide when you want to do black box analysis,
when you only have a binary?
By tracing all the operating system interactions that the program makes,
you can know, just like with strace, everything this program has asked of the OS
regarding file system and network operations, inter-process communication, memory management,
everything you want to look at.
If that's not enough for you,
you can still use kprobes or the SystemTap project, which focus on inserting breakpoints in binaries
to be able to execute some arbitrary code.
So there's no reason why you couldn't just put tracing statements at those breakpoints.
It's especially well suited for analysis of concurrent and multi-process programs.
As we will see later in a short example,
we can show the flow of execution across multiple applications.
For example, if you have Mozilla talking with the X server
and you want to know how they pass control to each other, a system-wide tracer is perfect for that.
And, as a plus, it also evades debugger detection.
So now for the live demo.
What I want to show you is, well, starting from...
Wait a second.
Well, that's why I don't do live demos.
Let's go for a non-live demo.
So I want to show you anti-debugging, anti-debugging techniques.
Well, I had a little look at them,
and I see that resisting them is one of the strengths of a kernel tracer compared to all the other tools.
For example, if you use strace, you will see that there are techniques a program can use
to find out that it's running under strace and to behave in a different manner.
So I just wanted to make the point that, well, it can be useful.
Then I will do a little study of a virus and, well,
have a little fun with all the information we can extract from running programs.
So if we want to talk about anti-debugging in Linux,
well, I just want to give a little explanation of how debugging generally works in Linux.
It uses the ptrace system call.
It's used by GDB, strace, ltrace, etc.
What it can do is...
Well, say you are GDB.
You want to attach yourself to a running process, which may be your child.
ptrace gives you the ability to modify its memory on the spot,
to intercept all signals that are sent to this process,
and to intercept all the system calls made by the process.
So it's a very convenient framework to intercept everything that is done by a program or through a program.
But let's see what can be done against that.
Because, well, okay, let's put ourselves in the place of a program
which is binary-only and doesn't really want to be debugged.
So, one technique
(I don't think you can see it on the screen, but that's it),
from this article by Silvio Cesare:
the program just does a ptrace on itself,
asking to be traced,
and if that fails, it's because it's already being traced.
So, well, it's a way to elude a lot of debuggers: GDB, strace, ltrace, etc.
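A minimal sketch of this self-ptrace check, assuming Linux and glibc; the function name already_traced is hypothetical. Note that EPERM can also come from a security policy (Yama or seccomp) that forbids ptrace, so a failure is not proof of a debugger. And since a successful call makes the parent the tracer, the check only works once per process: a second call fails.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Self-ptrace anti-debugging check: a process can have only one tracer,
 * so asking to be traced fails if GDB, strace, ltrace, ... is attached.
 * Returns 1 if already traced (or ptrace is forbidden: EPERM),
 * 0 if not traced, -1 on any other error.
 * Side effect: on success the parent becomes our tracer, so call it once.
 * The ptrace call itself remains visible to a kernel tracer. */
int already_traced(void)
{
    if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
        return errno == EPERM ? 1 : -1;
    return 0;
}
```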
But, well, we can see that call from the kernel side.
And a program might find other ways to do that in a stealthier manner.
For this program, the only important line here is this assembly line, which is int3.
So what I did: I just put a breakpoint in my program,
and just before executing the breakpoint, I connect SIGTRAP,
which is the signal generated by a breakpoint, to a handler.
So if I run it unattached to any debugger,
it will just call my handler and then come back to my program.
That takes a certain amount of time, which I measure by reading the CPU's time stamp counter (TSC).
And I expect this time to be quite short.
Now, if I'm attached to a debugger,
this breakpoint will cause a signal that will be intercepted by the debugger,
so it will take a huge amount of time before I come back.
So if I just test that, I know whether I'm being debugged.
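Here is a sketch of the breakpoint-timing check, assuming Linux. Two substitutions relative to the talk: timing uses clock_gettime(CLOCK_MONOTONIC) instead of reading the TSC directly, and on non-x86 machines raise(SIGTRAP) stands in for the int3 instruction. The 10 ms threshold in looks_debugged is an arbitrary, hypothetical value, and the function names are illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

static volatile sig_atomic_t trap_seen;

/* SIGTRAP handler: note that it ran and return. On x86 the saved
 * instruction pointer already points past int3, so execution resumes
 * right after the breakpoint. */
static void on_sigtrap(int sig)
{
    (void)sig;
    trap_seen = 1;
}

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Time one round trip: breakpoint -> SIGTRAP -> handler -> back.
 * Returns 0 and fills *elapsed_ns / *handler_ran, or -1 if the
 * handler could not be installed. */
int breakpoint_roundtrip(uint64_t *elapsed_ns, int *handler_ran)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigtrap;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGTRAP, &sa, NULL) == -1)
        return -1;

    trap_seen = 0;
    uint64_t start = now_ns();
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile("int3" ::: "memory");  /* the one important line */
#else
    raise(SIGTRAP);                          /* stand-in on other CPUs */
#endif
    uint64_t end = now_ns();
    *elapsed_ns = end - start;
    *handler_ran = trap_seen;
    return 0;
}

/* A debugger may swallow SIGTRAP entirely (GDB does not pass it to the
 * program by default) or pass it on only after a long delay. */
int looks_debugged(uint64_t elapsed_ns, int handler_ran)
{
    return !handler_ran || elapsed_ns > 10ull * 1000 * 1000;
}
```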
Now, put yourself in the place of someone who wants to run this program in the debugger.
What you do is look, well, not in the source code but in the object file,
and search for where there is an int3.
So you can basically just replace every int3, every breakpoint
you find in the object, with a nop, 0x90 on x86.
And so, yes, you think it's easy to do, right?
But if I run the version with the nop replacement, well, in fact, it can still be detected.
I just made another program which takes nop replacement into account:
it measures the time it takes for the nop and then verifies it, well, as you see,
it's just one more element to check.
So basically, this points out that there is a need for a system
that traces the information without having such a big impact on how the system behaves,
so that it cannot be detected.
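The naive patch described above amounts to a byte scan over the code. patch_int3_to_nop is a hypothetical helper working on an in-memory copy of the bytes; locating the code section inside the ELF file is left out. Blindly replacing every 0xCC byte is fragile, since that value can also occur inside other instructions, and removing the breakpoint also means the SIGTRAP handler never runs, which the program itself could notice.

```c
#include <stddef.h>

#define X86_INT3 0xCC
#define X86_NOP  0x90

/* Replace every 0xCC byte in a code buffer with 0x90 (nop).
 * Caveat: 0xCC may also be part of an immediate, a displacement or a
 * ModRM byte, so a real tool would disassemble first instead of
 * scanning bytes blindly. Returns the number of bytes patched. */
size_t patch_int3_to_nop(unsigned char *code, size_t len)
{
    size_t patched = 0;
    for (size_t i = 0; i < len; i++) {
        if (code[i] == X86_INT3) {
            code[i] = X86_NOP;
            patched++;
        }
    }
    return patched;
}
```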
Here is another program for detecting strace and ltrace, which this time uses a kill on itself,
and also calls a non-existent system call.
And so, again, it just expects some duration for this operation,
which is longer when the program is traced.
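The non-existent system call variant can be sketched as below, assuming Linux and glibc's syscall() wrapper. BOGUS_SYSCALL_NR is a hypothetical number picked to be far above any real system call; the kernel normally rejects it with ENOSYS, though a seccomp filter may return a different error. Taking the minimum over several runs is a design choice to filter out scheduling noise. The kill-on-itself variant is not shown.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical syscall number, far above any real one on common ABIs. */
#define BOGUS_SYSCALL_NR 30000L

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Issue a non-existent system call `iterations` times and return the
 * fastest round trip in nanoseconds; *err gets the errno of the last
 * call (normally ENOSYS). Untraced, the kernel rejects it almost at once;
 * under strace each call also stops at syscall entry and exit so the
 * tracer can inspect it, which makes it much slower. */
uint64_t min_bogus_syscall_ns(int iterations, int *err)
{
    uint64_t best = UINT64_MAX;
    *err = 0;
    for (int i = 0; i < iterations; i++) {
        uint64_t start = now_ns();
        long r = syscall(BOGUS_SYSCALL_NR);
        int e = (r == -1) ? errno : 0;   /* read errno before other calls */
        uint64_t elapsed = now_ns() - start;
        *err = e;
        if (elapsed < best)
            best = elapsed;
    }
    return best;
}
```

A program using this would compare the returned minimum against a threshold, much like the breakpoint check.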
Okay, in real life, in the real world, what uses these kinds of techniques?
Well, I did a small search on the internet and found this particular virus,
which actually uses one of the techniques I'm talking about.
It is called Linux.RST.B.
It infects Linux binaries.
What it does, just to show: it infects the current directory and /bin,
and it puts a backdoor on the network interfaces.
So, well, why not? Let's have a little fun with it.
Let's see what we can do.
So I'm in this small setup using an emulator, and I've got my kernel tracer ready.
So this is the first example of the view we can have of the application's execution.
What we see here is the virus.
We see .para.tmp, which is the execution of a small temporary file,
and it was the original process, the one launched with the virus.
And you see that it forked three other processes, right?
And then, well, they do their work and are over.
The only problem is that the projector is not quite up to it,
because we should see a whole lot of lines drawn here.
Well, too bad.
What we see here are red lines, dark red,
so that's why we don't see them.
And they show that most of the time those processes are waiting,
mostly for disk I/O.
So I decided to poke around with this virus,
so I took a quick trace while the virus was running for the first time.
And what I got is, well, okay, let me just explain a little bit about these views.
So this is the process column: we see each process,
and we see its execution through time.
And for the different colors: when it's blue, the process is running in the kernel,
executing kernel code on behalf of the process.
When it's green, it's in the process itself, running its own code.
And you can see this other color, which is when a process is waiting;
well, all this information is right here.
And this other view here is the one I think is the most interesting
when we talk about details.
It's the detailed event list.
Every event that has been recorded in the trace appears here,
and it can be filtered.
So here, what I see is the Linux.RST process calling ptrace.
So yes, that's the first anti-debugging technique it uses.
What's interesting here is that the binary opens /root,
and, well, that has nothing to do with the virus file itself.
So we can extract all the information about the system calls it makes
during its run, without disturbing the system behavior enough
for it to be detected.
And I just wanted to have some fun seeing how it modifies binary files:
it opens /bin/cat, and then does a whole lot of reads and writes on the file.
So we see here everything that it modifies.
Another example application of this is password sniffing on a machine.
Well, I recently implemented dumping of the first 32 bytes
of every read and write system call in the kernel.
Well, it was pretty easy.
Then I noticed: hmm, wait a minute, there might be a security concern with that.
Well, in fact, it shouldn't be a surprise to anyone:
if we are the kernel, we own the machine, we see everything that happens there, right?
But, well, having that in a trace that we can take as easily as: start tracing,
do our thing, stop tracing, view it, well, that makes sniffing
quite a bit easier.
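The kernel-side change itself is not shown in the talk. As a rough user-space model of what each read/write event would carry, the sketch below copies at most the first 32 bytes of the transferred buffer into a fixed-size event record. struct rw_event and record_rw_payload are hypothetical names. Inside the kernel the buffer is a user-space pointer, so a real probe would have to use copy_from_user() rather than memcpy().

```c
#include <stddef.h>
#include <string.h>

#define PAYLOAD_DUMP_LEN 32   /* first 32 bytes, as described in the talk */

/* Hypothetical event layout for a read()/write() event with payload. */
struct rw_event {
    int           fd;
    size_t        count;                   /* bytes actually transferred */
    size_t        dumped;                  /* bytes copied into data[] */
    unsigned char data[PAYLOAD_DUMP_LEN];
};

/* Fill an event from a completed read/write, keeping at most 32 bytes. */
void record_rw_payload(struct rw_event *ev, int fd,
                       const void *buf, size_t count)
{
    ev->fd = fd;
    ev->count = count;
    ev->dumped = count < PAYLOAD_DUMP_LEN ? count : PAYLOAD_DUMP_LEN;
    memcpy(ev->data, buf, ev->dumped);
}
```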
So if we take, for example, the su and ssh programs,
well, the passwords will clearly be in clear text in the system calls,
especially the read and write system calls of those processes.
So I just decided to run a trace on the su program and see
if I could get my password.
So first of all, I located the su process, which is here,
with a red line there.
And from there, I selected the interesting information with the filter,
saying that I want event name read or event name write,
only for the process su.
And here is what I got:
su, with only the events that we requested.
And we see that su writes the password prompt and reads back the data I typed,
which is not my real password, but it is what I typed.
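The filter used here keeps only read and write events belonging to su. A hand-written equivalent over an in-memory event list might look like this; struct ev and filter_rw_for_process are hypothetical and do not reproduce LTTV's filter syntax.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical minimal event: owning process and event name. */
struct ev {
    const char *proc;   /* process name, e.g. "su" */
    const char *name;   /* event name, e.g. "read" */
};

/* Keep events named "read" or "write" whose process is `proc`.
 * Matching indices go into out[] (capacity >= n); returns the count. */
size_t filter_rw_for_process(const struct ev *evs, size_t n,
                             const char *proc, size_t *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        int is_rw = strcmp(evs[i].name, "read") == 0 ||
                    strcmp(evs[i].name, "write") == 0;
        if (is_rw && strcmp(evs[i].proc, proc) == 0)
            out[k++] = i;
    }
    return k;
}
```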
So, well, as a second example,
this kind of broad information, gathered on the whole system,
can be used to see everything that happens.
It's not just targeted at one process or one thread,
like most detailed behavior analysis tools.
We get information about the whole system.
So, well, following the Skype presentation,
I decided to trace Skype on Linux. Why not?
So, well, what I think is funny:
it doesn't use any ptrace detection techniques.
In fact, I ran it in strace and it gave the same kind of results.
It does some weird things, though.
It tries to figure out very precisely the characteristics of the machine and of the process.
It calls times, which gets the process time information,
the CPU time used in the kernel and in the process.
And it calls that in a loop, about 10 times, with the same instruction pointer,
so it really calls it from a loop.
So there might be some security features there.
And it's used massively.
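One way to spot this pattern in a trace is to group system call events by name and by calling instruction pointer: many hits on the same pair point to a single call site inside a loop. The sketch assumes each event already carries the user-space instruction pointer; struct sc_event, count_calls_from_ip, and hottest_call_site are hypothetical names, and the quadratic scan is only meant for small traces.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical system call event with its call site. */
struct sc_event {
    const char *name;  /* system call name, e.g. "times" */
    uintptr_t   ip;    /* user-space instruction pointer of the call */
};

/* Count events calling `name` from instruction pointer `ip`. */
size_t count_calls_from_ip(const struct sc_event *evs, size_t n,
                           const char *name, uintptr_t ip)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (evs[i].ip == ip && strcmp(evs[i].name, name) == 0)
            hits++;
    return hits;
}

/* Return the call site issuing `name` most often; *hits gets its count. */
uintptr_t hottest_call_site(const struct sc_event *evs, size_t n,
                            const char *name, size_t *hits)
{
    uintptr_t best_ip = 0;
    size_t best = 0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(evs[i].name, name) != 0)
            continue;
        size_t c = count_calls_from_ip(evs, n, name, evs[i].ip);
        if (c > best) {
            best = c;
            best_ip = evs[i].ip;
        }
    }
    *hits = best;
    return best_ip;
}
```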
But, well, that's just a small example.
What I'm showing you is a way
to get some more information about how your system behaves.
It won't find everything.
I mean, well, the Skype presentation showed very clearly
that most of the things happen in user space,
and what I'm talking about here is a kernel tracer.
So we can't get that type of information without,
well, some other technique,
maybe putting probes in the program.
So you see that it's not the solution to everything.
About kernel reverse engineering:
one of the areas that would be interesting to investigate
is a system call fuzzer.
Where this tracer would be interesting there is
to gather information about standard, working system calls,
and then from there generate fuzzed calls by modifying some valid system call input.
So, well, this little module could be done quite easily
with all the framework that already exists.
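A minimal sketch of the mutation step of such a fuzzer: start from a system call recorded in a trace of a valid run and flip one bit of one argument. struct recorded_call, fuzz_call, flip_one_bit, and the xorshift32 generator are assumptions for illustration; the sketch only builds the fuzzed call and deliberately does not execute it.

```c
#include <stdint.h>

/* Hypothetical record of a system call as seen in a trace. */
struct recorded_call {
    long     nr;        /* system call number */
    uint64_t args[6];   /* raw argument values */
};

/* xorshift32 PRNG; *state must be non-zero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Flip one randomly chosen bit of a 64-bit value. */
uint64_t flip_one_bit(uint64_t value, uint32_t *rng)
{
    unsigned bit = xorshift32(rng) % 64;
    return value ^ ((uint64_t)1 << bit);
}

/* Fuzzed copy of a recorded call: same syscall, one argument mutated. */
struct recorded_call fuzz_call(const struct recorded_call *orig, uint32_t *rng)
{
    struct recorded_call c = *orig;
    unsigned which = xorshift32(rng) % 6;
    c.args[which] = flip_one_bit(c.args[which], rng);
    return c;
}
```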
For reversing drivers:
well, I've not worked specifically on driver reversing,
but I have some ideas about how such a tracer can be used in that area.
If you look at every imported symbol of a kernel module,
you see each symbol that it will use from other modules,
which might also be binary-only, without source code,
but also from kernel functions that are exported.
So by instrumenting those kernel functions,
you can gather some information about the driver.
Also, by using kprobes on the driver, you can put breakpoints,
for example at function entry.
Then trace that and start to get some sense of how the driver interacts with interrupts,
which code path was called before generating an interrupt response from the device,
those kinds of things.
On the hardware reverse engineering side:
since every interrupt in the kernel goes through a path that,
in Linux, is available for instrumentation,
we have information about every interrupt entry and exit.
So we can know the duration of the interrupt handlers,
we know the frequency at which an interrupt arrives,
and we can also read the device's memory and dump it into a trace,
either periodically or when an interrupt is being serviced.
That way you can get some sense of how this device interacts with the driver.
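Given the timestamps of interrupt entry and exit events for one IRQ line, handler duration and arrival rate follow directly. The sketch assumes the events have already been paired and sorted chronologically; struct irq_stats and irq_compute_stats are hypothetical names. The frequency in hertz is then 1e9 divided by the mean inter-arrival time in nanoseconds.

```c
#include <stddef.h>
#include <stdint.h>

struct irq_stats {
    uint64_t avg_duration_ns;      /* mean time spent in the handler */
    uint64_t avg_interarrival_ns;  /* mean time between entries; 0 if n < 2 */
};

/* entry_ts[i] / exit_ts[i]: timestamps (ns) of the i-th irq entry and its
 * matching exit for one IRQ line, sorted chronologically.
 * Returns 0 on success, -1 if there are no samples. */
int irq_compute_stats(const uint64_t *entry_ts, const uint64_t *exit_ts,
                      size_t n, struct irq_stats *out)
{
    if (n == 0)
        return -1;
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += exit_ts[i] - entry_ts[i];
    out->avg_duration_ns = total / n;
    out->avg_interarrival_ns =
        n >= 2 ? (entry_ts[n - 1] - entry_ts[0]) / (n - 1) : 0;
    return 0;
}
```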
So, as a conclusion, I invite you to try this program,
because I think it can help you better understand how your operating system,
libraries, and programs work together, as a starting point.
And also, if you want, feel free to contribute to it:
it's open source, distributed under the GPL license.
On the analysis tools side, you can create your own text and graphical plugins.
And on the kernel, user-space application, and library side,
there is a huge place for new instrumentation.
So if you want to try that, you're welcome.
The framework is designed for adding new contributions.
Do you have any questions?
Thank you.
Applause