There are some 21,000 symbols in the macOS kernel, but all but around 3,500 are opaque even to kernel developers. The reasoning behind this was likely twofold. First, Apple is continually changing and improving the kernel, and it probably doesn’t want kernel developers mucking around with unstable portions of the code. Second, kernel development used to be the wild wild west, especially before you needed a special code-signing certificate to load a kext, and plenty of bad developers wrote awful code that left macOS completely unstable. Customers running such software probably blamed Apple for it instead of the developer. Apple now has tighter control over who can write kernel code, but that doesn’t mean developers have gotten any better at it. Looking at some commercial products out there, there is, unsurprisingly, still terrible code doing things in the kernel that should never be done.
So most of the kernel is opaque to kernel developers for good reason, and this has reduced the amount of rope they have to hang themselves with. For those doing really advanced work, though (especially in security), the kernel can sometimes feel like a Fisher-Price steering wheel because of this, and so many have worked around the privatized functions by resolving these symbols themselves and using them anyway. After all, if you’re going to combat rootkits, you have to act like a rootkit in many ways, and if you’re going to combat ransomware, you have to dig your claws into many of the routines that ransomware would use, some of which are privatized.
Today, there are many awful implementations of both malware and anti-malware code out there that resolve these private kernel symbols. Many of them do idiotic things like opening and reading the kernel from a file, scanning memory for magic headers, and other very non-portable techniques that risk destabilizing macOS even more. So I thought I’d take a look at one of the good examples that particularly stood out to me. Some years back, Nemo and Snare wrote some good in-memory symbol-resolving code that walked the kernel’s LC_SYMTAB without having to read the kernel from disk, scan memory, or do any other disgusting things, and it did so in a portable way that kept working on whatever new versions of macOS came out.
The __LINKEDIT segment and LC_SYMTAB weren’t loaded into kernel memory until around Snow Leopard, so prior to that a number of rootkits had no choice but to read the symbol table off disk by opening /mach_kernel, which has of course since moved (on current releases the kernel lives at /System/Library/Kernels/kernel). Today’s versions of macOS make it much easier for a developer to skirt around the privatized kernel symbols, and this is a positive thing, because developers don’t have to be so dangerous with their resolving code.
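To give a sense of what walking LC_SYMTAB in memory involves, here is a minimal sketch; it is not Nemo and Snare’s code. It assumes you already hold a pointer to the kernel’s in-memory 64-bit Mach-O header, and it mirrors only the handful of <mach-o/loader.h> structures it needs (with simplified field types) so it builds outside the macOS SDK. The helper name find_symtab is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimal mirrors of <mach-o/loader.h> definitions so the sketch builds
   outside the macOS SDK (field types simplified, sizes unchanged). */
#define MH_MAGIC_64   0xfeedfacfu
#define LC_SYMTAB     0x2u
#define LC_SEGMENT_64 0x19u

struct mach_header_64 {
    uint32_t magic, cputype, cpusubtype, filetype;
    uint32_t ncmds, sizeofcmds, flags, reserved;
};
struct load_command { uint32_t cmd, cmdsize; };
struct segment_command_64 {
    uint32_t cmd, cmdsize;
    char     segname[16];
    uint64_t vmaddr, vmsize, fileoff, filesize;
    int32_t  maxprot, initprot;
    uint32_t nsects, flags;
};
struct symtab_command {
    uint32_t cmd, cmdsize, symoff, nsyms, stroff, strsize;
};

/* Hypothetical helper: walk the load commands that immediately follow an
   in-memory Mach-O header and return LC_SYMTAB plus the __LINKEDIT
   segment. Returns 0 on success, -1 otherwise. */
static int find_symtab(const struct mach_header_64 *mh,
                       const struct symtab_command **symtab_out,
                       const struct segment_command_64 **linkedit_out)
{
    if (mh->magic != MH_MAGIC_64)
        return -1;                          /* not a 64-bit Mach-O header */

    *symtab_out = NULL;
    *linkedit_out = NULL;
    const uint8_t *cursor = (const uint8_t *)(mh + 1);  /* first command */

    for (uint32_t i = 0; i < mh->ncmds; i++) {
        const struct load_command *lc = (const struct load_command *)cursor;
        if (lc->cmdsize == 0)
            return -1;                      /* malformed; avoid spinning */
        if (lc->cmd == LC_SYMTAB) {
            *symtab_out = (const struct symtab_command *)lc;
        } else if (lc->cmd == LC_SEGMENT_64) {
            const struct segment_command_64 *seg =
                (const struct segment_command_64 *)lc;
            if (strncmp(seg->segname, "__LINKEDIT", sizeof seg->segname) == 0)
                *linkedit_out = seg;
        }
        cursor += lc->cmdsize;              /* cmdsize covers trailing sections */
    }
    return (*symtab_out && *linkedit_out) ? 0 : -1;
}
```

Nothing here reads from disk or scans memory: every pointer is derived from the header you were handed and the cmdsize of each command.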
Nemo and Snare’s code has gotten a bit old and stale, so I thought I’d freshen it up under the hood. Two things in particular needed some work to get the engine to turn over. Some of the LC_SYMTAB offsets weren’t being applied correctly, which broke the code on any recent version of macOS, and it also didn’t handle kernel ASLR, which made it unusable. I fixed the symbol table pointers so that we’re reading the right parts of LC_SYMTAB now, and I’ve also come up with a novel way to deduce the kernel base address using some maths and a function that Apple exposes in the public KPI to unslide addresses, which subtracts vm_kernel_slide for you.
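For context on why those offsets are easy to get wrong: the LC_SYMTAB fields symoff and stroff are offsets into the Mach-O file, not into memory, so they have to be rebased against where __LINKEDIT’s file bytes were mapped (its vmaddr minus its fileoff). The sketch below shows that rebasing followed by a linear name lookup; it illustrates the technique under assumptions and is not the resolver’s actual code. The helper resolve_symbol and its slide parameter are hypothetical: whether the vmaddr and n_value fields you read are already slid depends on your target, so pass 0 if they are. Kernel symbol names carry a leading underscore, so you look up "_proc_task" rather than "proc_task".

```c
#include <stdint.h>
#include <string.h>

/* Minimal mirrors of the <mach-o/loader.h> and <mach-o/nlist.h> layouts. */
struct segment_command_64 {
    uint32_t cmd, cmdsize;
    char     segname[16];
    uint64_t vmaddr, vmsize, fileoff, filesize;
    int32_t  maxprot, initprot;
    uint32_t nsects, flags;
};
struct symtab_command {
    uint32_t cmd, cmdsize, symoff, nsyms, stroff, strsize;
};
struct nlist_64 {
    uint32_t n_strx;                /* offset of the name in the string table */
    uint8_t  n_type, n_sect;
    uint16_t n_desc;
    uint64_t n_value;               /* symbol address */
};

/* Hypothetical helper: return the address of `name`, or 0 if absent.
   `slide` is added to vmaddr and n_value; pass 0 if they are already slid. */
static uint64_t resolve_symbol(const struct symtab_command *st,
                               const struct segment_command_64 *linkedit,
                               uint64_t slide, const char *name)
{
    /* symoff/stroff are FILE offsets. __LINKEDIT's file bytes begin at
       linkedit->fileoff and are mapped at linkedit->vmaddr (+ slide), so
       rebase against that segment rather than against the header. */
    uintptr_t base = (uintptr_t)(linkedit->vmaddr + slide - linkedit->fileoff);
    const struct nlist_64 *syms = (const struct nlist_64 *)(base + st->symoff);
    const char *strtab = (const char *)(base + st->stroff);

    for (uint32_t i = 0; i < st->nsyms; i++) {
        uint32_t strx = syms[i].n_strx;
        if (strx == 0 || strx >= st->strsize)
            continue;               /* unnamed entry or out-of-bounds index */
        if (strcmp(strtab + strx, name) == 0)
            return syms[i].n_value + slide;
    }
    return 0;
}
```

A linear scan is fine for a one-off lookup at load time; if you resolve many symbols, caching the results is cheaper than rescanning the table each time.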
The function vm_kernel_unslide_or_perm_external was originally added so that an address coming from either the kernel or the heap could be exposed to userspace. Exposing kernel address space to userspace seems like a really awful idea, but the function can be used for just that: if you feed it the usual kernel load address (0xffffff8000200000), it will subtract vm_kernel_slide, which isn’t exposed to the KPI, and give you the base kernel address in memory. It’s really quite simple and elegant, with no ugly hacks required; you don’t have to back-read memory looking for 0xfeedfacf (MH_MAGIC_64) or anything else. Apple’s code is pretty intentional, so this isn’t a hack either: they’ve provided a way to unslide kernel ASLR from within the kernel, which is a lot safer than some of the ways devs were doing it before.
On macOS Sierra, passing in the kernel load address works, but El Capitan’s implementation required an address within the kernel’s post-slide range. The approach that works on both operating systems is to pass in the address of printf (or any other public KPI function) and then do the subtraction yourself: that address minus its unslid value is the slide, and adding the slide to the load address gives you the kernel base.
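The printf-based approach reduces to a couple of lines of arithmetic. The sketch below takes the unslide routine as a parameter so the math can be exercised in userspace; in a kext you would instead call vm_kernel_unslide_or_perm_external(vm_offset_t addr, vm_offset_t *up_addr) directly on the address of printf. The helper kernel_base_from and the mock_unslide stand-in (with its made-up slide value) are hypothetical, and 0xffffff8000200000 is the x86_64 load address quoted above.

```c
#include <stdint.h>

/* Unslid x86_64 kernel load address quoted in the text. */
#define KERNEL_LOAD_ADDRESS 0xffffff8000200000ULL

/* Same shape as the KPI's unslide call, with fixed-width types so the
   sketch builds outside the kernel. */
typedef void (*unslide_fn)(uint64_t addr, uint64_t *unslid_out);

/* Hypothetical helper: derive the in-memory kernel base from any address
   inside the slid kernel image, such as the address of printf. */
static uint64_t kernel_base_from(uint64_t slid_kernel_addr, unslide_fn unslide)
{
    uint64_t unslid = 0;
    unslide(slid_kernel_addr, &unslid);     /* unslid = slid - vm_kernel_slide */
    uint64_t slide = slid_kernel_addr - unslid;
    return KERNEL_LOAD_ADDRESS + slide;     /* where the Mach-O header now lives */
}

/* Hypothetical stand-in for vm_kernel_unslide_or_perm_external, used only
   so the arithmetic can be exercised in userspace with a known slide. */
static const uint64_t mock_slide = 0x1a600000ULL;
static void mock_unslide(uint64_t addr, uint64_t *unslid_out)
{
    *unslid_out = addr - mock_slide;
}
```

Because the slide is computed from a symbol you can legitimately take the address of, the same two subtractions work whether or not the unslide call accepts the raw load address on a given release.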
In addition to these fixes, I’ve also added a simple usage example to demonstrate how to call a function once you’ve actually found the symbol. Several calling conventions are possible; in this example I used a less old-school, more implicit technique to invoke proc_task and obtain the task for launchd.
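One common way to call a resolved address is to cast it to a function-pointer type matching the target’s prototype; in xnu, proc_task is defined as task_t proc_task(proc_t). This is a hedged sketch and not necessarily the convention the resolver’s own example uses. The proc_t and task_t typedefs below are opaque userspace stand-ins, and call_proc_task is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Opaque userspace stand-ins for the kernel's proc_t and task_t. */
typedef struct fake_proc *proc_t;
typedef struct fake_task *task_t;

/* Pointer type matching xnu's definition: task_t proc_task(proc_t). */
typedef task_t (*proc_task_fn)(proc_t);

/* Hypothetical helper: call proc_task through an address returned by a
   symbol resolver, refusing to jump if the symbol was not found. */
static task_t call_proc_task(uint64_t resolved_addr, proc_t p)
{
    if (resolved_addr == 0)
        return NULL;                /* unresolved: never jump to address 0 */
    proc_task_fn proc_task = (proc_task_fn)(uintptr_t)resolved_addr;
    return proc_task(p);
}
```

In a kext, launchd’s proc would typically come from the public KPI call proc_find(1), paired with proc_rele() once you are done with it. Checking for a zero address before the call matters, since jumping to an unresolved symbol in the kernel panics the machine.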
Click the link below to read the full source of the new and improved version of Snare’s kernel resolver. Special thanks to Snare for making his original code available.