crash when hooking sceIoClose() in a kernel module #84
Comments
Is this a syscall? If so you need the syscall macros. |
Hi yifanlu. Thanks for replying! If you mean adding:
int state;
ENTER_SYSCALL(state);
(...)
EXIT_SYSCALL(state);
in the hooks, then I'm afraid I've already tried that, after posting this issue, with no success. For the record, I have now updated the code in the GitHub repository with the sample to replicate the issue, to invoke those macros. Another thing I tried is to break down the
typedef int (sceIoClose_t)(SceUID fd);
int hook_user_close(SceUID fd)
{
    int state;
    ENTER_SYSCALL(state);
    printf("sceIoClose(0x%08X) (Before TAI_CONTINUE)\n", fd);
    struct _tai_hook_user *cur = (struct _tai_hook_user *)close_ref;
    printf("sceIoClose: cur = %p, cur->old = %p\n", cur, cur->old);
    sceIoClose_t* psceIoClose = (sceIoClose_t*)cur->old;
    int r = psceIoClose(fd);
    printf("r = 0x%08X (After TAI_CONTINUE)\n", r);
    EXIT_SYSCALL(state);
    return r;
}
But there again, even as |
I think that, at the very least, I have found a workaround for this issue. The inspiration for it comes from what dots-tb does in ioPlus. Considering that original Thus, if you replace the
int r = ksceIoClose(ksceKernelKernelUidForUserUid(ksceKernelGetProcessId(), fd));
then fds can be closed, from within the hook, and without incurring a crash. Of course, you then lose the ability to chain |
This was tricky, as there seems to be a taiHEN bug when hooking sceIoClose(). See yifanlu/taiHEN#84. Also make sure the hooks are guarded by ENTER_SYSCALL/EXIT_SYSCALL.
That’s good to know! |
Here is the explanation:
|
Good diagnosis, I think this issue can be closed. |
Thanks for trying to come up with an explanation. However I do have to mention that, of course, I too thought about infinite loops due to logging in open/close, and the first thing I tried was disabling logging altogether. I actually mentioned this in the Things I tried part of the issue, as the very first item! But I still got the crash. Besides, it would make little sense for Again, the first thing I tried was disabling logging and it still failed. So I have to disagree with your diagnosis. The issue is not related to calling
That's because I'm afraid your explanation is not correct. There is a bug somewhere when it comes to hooking Now, please also bear in mind that the logging module is just an example to demonstrate the issue. I am not interested in logging at all. Instead, I am developing a kernel plugin that allows "mounting" Sony pkg files as if they were already extracted/installed on the file system. And to be able to do that, I MUST be able to override Finally, I also have to report that the workaround still produces a freeze when trying to launch an application from the shell, when added to That is, if you don't do any logging at all, but simply override So, the module does seem to work when launched on an app basis, as was done in the test app, but when used as a This is not expected. Therefore, I still believe that there is some kind of underlying bug in taiHEN, for which we don't yet have an explanation, that needs to be investigated, as it is a complete showstopper for the creation of a whole category of very useful plugins, such as virtual file systems. So can this issue be re-opened please? And for people who want to provide a potential explanation or run some analysis (which I very much appreciate), please try to use and modify the code I shared to validate your proposal, as it should make things a lot clearer as to what does or doesn't cause the issue. |
Idea: |
Can you please read the original issue? I must admit I'm starting to get a bit annoyed having to dispel all of the obvious stuff that I already tried, which I made sure to document when I opened the original issue. That's not to say I don't appreciate your willingness to help (I really do, coz you are a lot more familiar with the system than I am), but at this stage, if you think you have an idea about what in the code is causing the issue with Note that I don't care if the |
Okay I'll re-open the issue. |
Thanks. |
Sooo, since I'm stuck on this issue, and considering building my own hooker for Basically I've been comparing the
This tells us (using an ARM Thumb disassembler such as this one) that, before the hook, we have the original
0x0000: push {r3, r4, r5, lr}
0x0002: mov r4, r0
0x0004: blx #0x14eb4
0x0008: cmp r0, #0
0x000a: bgt #0x3a
0x000c: movs r0, #0
0x000e: mov r1, r4
(...)
Then, after the hook is set:
0x0000: ldr.w pc, [pc, #0]
0x0004: .long 0x02710065
0x0008: cmp r0, #0
0x000a: bgt #0x3a
0x000c: movs r0, #0
0x000e: mov r1, r4
(...)
Which tells us that the hook forces the call to jump to address
0x0000: push.w {r4, r5, r6, r7, r8, sb, lr}
0x0004: mov sb, r0
0x0006: sub sp, #0xc
0x0008: mrc p15, #0, r8, c13, c0, #3
0x000c: lsl.w r3, r8, #0x10
0x0010: mcr p15, #0, r3, c13, c0, #3
0x0014: movw r4, #0x8000
0x0018: movs r2, #0
0x001a: movt r4, #0x1de
0x001e: movs r1, #1
(...)
I suspect the code above comes from the substitute "injector" used by taiHEN, and that at some stage, this "injector" executes a copy of the code that it replaced, namely:
push {r3, r4, r5, lr}
mov r4, r0
blx #0x14eb4
before jumping back into the original code. So, of course, the presence of the Now, from a very quick look at substitute, I can see that it had some provisions for branch handling and relocation. But I can't help but have some doubts as to whether it is properly able to handle a relocated |
0x02710064 Should be your function. If you disassemble it to the end, you’ll see it jump to the reproduced code for the replaced instructions. This replaced code should be the same behavior as the code it overwrote but with branches translated to absolute addresses. |
Well, here's the disassembly I get when I simply issue a
0x0000: movs r0, #0
0x0002: bx lr
So far so good. Obviously, since we're not calling Now, let's have a look at a call that simply does
0x0000: movw r3, #0xc000
0x0004: movt r3, #0x1ee
0x0008: ldr r2, [r3, #0x6c]
0x000a: ldr r3, [r2]
0x000c: cbz r3, #0x12
0x000e: ldr r3, [r3, #4]
0x0010: bx r3
0x0012: ldr r3, [r2, #8]
0x0014: bx r3
0x0016: nop
0x0018: push {r4, r5, r6, r7, lr}
0x001a: sub sp, #0x10c
0x001c: cbz r0, #0x74
0x001e: movw r3, #0xc088
0x0022: movt r3, #0x1ee
0x0026: ldr r6, [r3]
0x0028: cbz r6, #0x74
0x002a: ldrb r3, [r0]
0x002c: mov r5, r0
0x002e: cbnz r3, #0x4a
0x0030: movw r5, #0xb080
0x0034: mov r4, r1
0x0036: movt r5, #0x27e
0x003a: mov r6, r3
0x003c: ldm r5!, {r0, r1, r2, r3}
0x003e: ldr r5, [r5]
0x0040: stm r4!, {r0, r1, r2, r3}
0x0042: str r5, [r4]
0x0044: mov r0, r6
0x0046: add sp, #0x10c
0x0048: pop {r4, r5, r6, r7, pc}
0x004a: mov r4, r1
0x004c: blx #0x2fa0
0x0050: mov r1, r5
0x0052: mov r7, r0
0x0054: mov r2, r0
0x0056: add r0, sp, #8
0x0058: blx #0x2f60
0x005c: movs r5, #1
0x005e: movs r3, #0
0x0060: mov r2, r7
0x0062: add r0, sp, #8
0x0064: mov r1, r4
0x0066: strd r5, r3, [sp]
0x006a: blx r6
0x006c: mov r6, r0
0x006e: mov r0, r6
0x0070: add sp, #0x10c
0x0072: pop {r4, r5, r6, r7, pc}
0x0074: movs r6, #3
0x0076: movt r6, #0x8002
0x007a: b #0x44
The first thing we notice is that it trashes Now I understand that, as a general rule, code should expect I'm also a bit surprised at how happily the override seems to be willing to use At any rate, what is clear from this is that, |
I don’t think there is such code. ARM ABI prohibits it. |
Well, Sony have a long history of completely disregarding whatever existing conventions or specs other people use, and introducing their own... 😄 |
Their compiler is gcc based though. |
I'm pretty sure you can achieve non ARM ABI compliant behaviour with vanilla gcc through inline assembly. The fact is that nobody here knows why My understanding is that, if Sony's code was compiled with an ABI compliant toolchain, you should never observe a Also, I can confirm that when I apply my own hooker (Boy is that a complete pain in the ass to replicate! - I have even more respect for what you must have gone through, when developing taiHEN, now that I've seen just how much work is required to simply get hooks that work), I get a stable Thus, as far as I am concerned, there is something that should be improved or fixed in the manner taiHEN applies hooks, because I can demonstrate that, if you use an alternate method to create your hooks, you won't observe the crash that you observe with taiHEN... and I doubt everybody will want to go through the ordeal of creating their own custom hooker, just to work around what seems to be a problematic taiHEN behaviour. I should also point out that one thing I tried, without success, was to create a taiHEN hook, with I guess the one way to prove or disprove the |
Thanks for looking into this. It’s not non-compliant to save R3 even if it doesn’t have to be saved. However I agree it’s a suspicious piece of code. If you make TAI_CONTINUE take 4 args, make sure your patched function also takes 4 args and no other hooks are installed on it. Someone may want to try this out to confirm. |
The hook system basically redirects to my_hook by replacing the first 12 bytes (*1) of the function. Before they are overwritten with the hook opcodes, taiHEN copies the original opcodes to RX memory.
The problem here is calling the function through those original 12 bytes of sceIoClose opcodes (on 3.60).
The call is made by jumping to the offset determined by the following calculation
But when hooked,
Therefore, when hooking a function such as sceIoClose, you need to re-implement it yourself. |
Issue & replication
As the title indicates, I'm encountering a crash whenever I try to hook sceIoClose() that I am beginning to think might be a taiHEN issue rather than something with my code.
You can find a complete simple project to demonstrate the issue, with a simple kernel module and a test application that takes care of loading/unloading it (so that you don't have to crash the whole boot process by adding the module to tai/config.txt) at: https://github.com/VitaSmith/sceIoClose
The instructions on how to recompile and run the module and test application to replicate the crash are in the Readme.
I have been testing with taiHEN on fw 3.60 and I really can't figure out what it is I might be doing wrong in my code to produce the crash.
Code
Basically, the code for the hook is:
And the installation code:
However, the end result is that a crash occurs in TAI_CONTINUE as per the log output below:
Things I tried
ksceIoClose() → Still crashes!
sceIoCloseForDriver() instead of sceIoClose() → This doesn't crash but of course the override I need is sceIoClose(). This does seem to confirm that the problem appears to be only with the sceIoClose() override however.
SceIofilemgr NID (0xF2FF276E) instead of TAI_ANY_LIBRARY → Same issue.
sceIoClose() in the test app, but waiting to unload the module → Still crashes as soon as a background app calls sceIoClose().
void *args as extra parameters to sceIoClose() and TAI_CONTINUE() just in case → Same issue.
SceUID as return value instead of int → Same issue.
After having spent the last 24 hours trying to figure this out, I wouldn't mind having some expert input on this issue, as I really can't figure out what may be wrong in the sceIoClose() override when the sceIoOpen() one is working just fine. At this stage, I have reason to believe that the crash might have to do with taiHEN itself, which is why I am logging this issue.