OATmeal on the Universal Cereal Bus: Exploiting Android phones over USB

Google Project Zero - Mon, 09/10/2018 - 12:18
Posted by Jann Horn, Google Project Zero
Recently, there has been some attention around the topic of physical attacks on smartphones, where an attacker with the ability to connect USB devices to a locked phone attempts to gain access to the data stored on the device. This blogpost describes how such an attack could have been performed against Android devices (tested with a Pixel 2).
After an Android phone has been unlocked once on boot (on newer devices, using the "Unlock for all features and data" screen; on older devices, using the "To start Android, enter your password" screen), it retains the encryption keys used to decrypt files in kernel memory even when the screen is locked, and the encrypted filesystem areas or partition(s) stay accessible. Therefore, an attacker who gains the ability to execute code on a locked device in a sufficiently privileged context can not only backdoor the device, but can also directly access user data. (Caveat: We have not looked into what happens to work profile data when a user who has a work profile toggles off the work profile.)
The bug reports referenced in this blogpost, and the corresponding proof-of-concept code, are available at:
  • "directory traversal over USB via injection in blkid output"
  • "privesc zygote->init; chain from USB"
These issues were fixed as CVE-2018-9445 (fixed at patch level 2018-08-01) and CVE-2018-9488 (fixed at patch level 2018-09-01).

The attack surface
Many Android phones support USB host mode (often using OTG adapters). This allows phones to connect to many types of USB devices (this list isn't necessarily complete):
  • USB sticks: When a USB stick is inserted into an Android phone, the user can copy files between the system and the USB stick. Even if the device is locked, Android versions before P will still attempt to mount the USB stick. (Android 9, which was released after these issues were reported, has logic in vold that blocks mounting USB sticks while the device is locked.)
  • USB keyboards and mice: Android supports using external input devices instead of using the touchscreen. This also works on the lockscreen (e.g. for entering the PIN).
  • USB ethernet adapters: When a USB ethernet adapter is connected to an Android phone, the phone will attempt to connect to a wired network, using DHCP to obtain an IP address. This also works if the phone is locked.

This blogpost focuses on USB sticks. Mounting an untrusted USB stick offers nontrivial attack surface in highly privileged system components: The kernel has to talk to the USB mass storage device using a protocol that includes a subset of SCSI, parse its partition table, and interpret partition contents using the kernel's filesystem implementation; userspace code has to identify the filesystem type and instruct the kernel to mount the device to some location. On Android, the userspace implementation for this is mostly in vold (one of the processes that are considered to have kernel-equivalent privileges), which uses separate processes in restrictive SELinux domains to e.g. determine the filesystem types of partitions on USB sticks.
The bug (part 1): Determining partition attributes
When a USB stick has been inserted and vold has determined the list of partitions on the device, it attempts to identify three attributes of each partition: Label (a user-readable string describing the partition), UUID (a unique identifier that can be used to determine whether the USB stick is one that has been inserted into the device before), and filesystem type. In the modern GPT partitioning scheme, these attributes can mostly be stored in the partition table itself; however, USB sticks tend to use the MBR partition scheme instead, which cannot store UUIDs and labels. For normal USB sticks, Android supports both the MBR partition scheme and the GPT partition scheme.
To provide the ability to label partitions and assign UUIDs to them even when the MBR partition scheme is used, filesystems implement a hack: The filesystem header contains fields for these attributes, allowing an implementation that has already determined the filesystem type and knows the filesystem header layout of the specific filesystem to extract this information in a filesystem-specific manner. When vold wants to determine label, UUID and filesystem type, it invokes /system/bin/blkid in the blkid_untrusted SELinux domain, which does exactly this: First, it attempts to identify the filesystem type using magic numbers and (failing that) some heuristics, and then, it extracts the label and UUID. It prints the results to stdout in the following format:
/dev/block/sda1: LABEL="<label>" UUID="<uuid>" TYPE="<type>"
However, the version of blkid used by Android did not escape the label string, and the code responsible for parsing blkid's output only scanned for the first occurrences of UUID=" and TYPE=". Therefore, by creating a partition with a crafted label, it was possible to gain control over the UUID and type strings returned to vold, which would otherwise always be a valid UUID string and one of a fixed set of type strings.

The bug (part 2): Mounting the filesystem
When vold has determined that a newly inserted USB stick with an MBR partition table contains a partition of type vfat that the kernel's vfat filesystem implementation should be able to mount, PublicVolume::doMount() constructs a mount path based on the filesystem UUID, then attempts to ensure that the mountpoint directory exists and has appropriate ownership and mode, and then attempts to mount over that directory:
    if (mFsType != "vfat") {
        LOG(ERROR) << getId() << " unsupported filesystem " << mFsType;
        return -EIO;
    }

    if (vfat::Check(mDevPath)) {
        LOG(ERROR) << getId() << " failed filesystem check";
        return -EIO;
    }

    // Use UUID as stable name, if available
    std::string stableName = getId();
    if (!mFsUuid.empty()) {
        stableName = mFsUuid;
    }

    mRawPath = StringPrintf("/mnt/media_rw/%s", stableName.c_str());
    [...]
    if (fs_prepare_dir(mRawPath.c_str(), 0700, AID_ROOT, AID_ROOT)) {
        PLOG(ERROR) << getId() << " failed to create mount points";
        return -errno;
    }

    if (vfat::Mount(mDevPath, mRawPath, false, false, false,
            AID_MEDIA_RW, AID_MEDIA_RW, 0007, true)) {
        PLOG(ERROR) << getId() << " failed to mount " << mDevPath;
        return -EIO;
    }
The mount path is determined using a format string, without any sanity checks on the UUID string that was provided by blkid. Therefore, an attacker with control over the UUID string can perform a directory traversal attack and cause the FAT filesystem to be mounted outside of /mnt/media_rw.
This means that if an attacker inserts a USB stick with a FAT filesystem whose label string is 'UUID="../##' into a locked phone, the phone will mount that USB stick to /mnt/##.
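Both halves of the bug can be reproduced in miniature. The sketch below is hypothetical C++, not vold's actual code. NaiveParse() imitates the first-occurrence scanning described above, RawMountPath() imitates the StringPrintf() call in PublicVolume::doMount(), and LooksLikeUuid() is an illustrative hardening check of our own, not the upstream fix. The sample blkid lines, including the serial "1234-ABCD", are made up.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <filesystem>
#include <string>

// Hypothetical stand-in for the vulnerable parser: it only looks at the FIRST
// occurrence of each key, so an unescaped label can supply a fake UUID.
struct BlkidFields {
  std::string uuid;
  std::string type;
};

// Returns the text between the first `key` (e.g. UUID=") and the next double
// quote, or an empty string if the key or the closing quote is missing.
std::string ExtractFirst(const std::string& line, const std::string& key) {
  std::size_t start = line.find(key);
  if (start == std::string::npos) return "";
  start += key.size();
  std::size_t end = line.find('"', start);
  if (end == std::string::npos) return "";
  return line.substr(start, end - start);
}

BlkidFields NaiveParse(const std::string& line) {
  return {ExtractFirst(line, "UUID=\""), ExtractFirst(line, "TYPE=\"")};
}

// Imitates the mount path construction: the UUID is used without any checks.
std::string RawMountPath(const std::string& uuid) { return "/mnt/media_rw/" + uuid; }

// Illustrative hardening (not the actual patch): accept only the characters
// that occur in FAT volume serials and ext4/GPT UUIDs.
bool LooksLikeUuid(const std::string& uuid) {
  if (uuid.empty()) return false;
  for (unsigned char c : uuid) {
    if (!std::isxdigit(c) && c != '-') return false;
  }
  return true;
}
```

With the 11-byte label UUID="../## the parser returns "../##" as the UUID. Normalizing the resulting mount path with std::filesystem::path::lexically_normal() gives /mnt/##, which matches the example above. LooksLikeUuid() rejects the injected value but accepts a FAT-style serial.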
However, this straightforward implementation of the attack has several severe limitations; some of them can be overcome, and others can be worked around:
  • Label string length: A FAT filesystem label is limited to 11 bytes. An attacker attempting to perform a straightforward attack needs to use the six bytes 'UUID="' to start the injection, which leaves only five characters for the directory traversal - insufficient to reach any interesting point in the mount hierarchy. The next section describes how to work around that.
  • SELinux restrictions on mountpoints: Even though vold is considered to be kernel-equivalent, a SELinux policy applies some restrictions on what vold can do. Specifically, the mounton permission is restricted to a set of permitted labels.
  • Writability requirement: fs_prepare_dir() fails if the target directory is not mode 0700 and chmod() fails.
  • Restrictions on access to vfat filesystems: When a vfat filesystem is mounted, all of its files are labeled as u:object_r:vfat:s0. Even if the filesystem is mounted in a place from which important code or data is loaded, many SELinux contexts won't be permitted to actually interact with the filesystem - for example, the zygote and system_server aren't allowed to do so. On top of that, processes that don't have sufficient privileges to bypass DAC checks also need to be in the media_rw group. The section "Dealing with SELinux: Triggering the bug twice" describes how these restrictions can be avoided in the context of this specific bug.
Exploitation: Chameleonic USB mass storage
As described in the previous section, a FAT filesystem label is limited to 11 bytes. blkid supports a range of other filesystem types that have significantly longer label strings, but if you used such a filesystem type, you'd then have to make it past the fsck check for vfat filesystems and the filesystem header checks performed by the kernel when mounting a vfat filesystem. The vfat kernel filesystem doesn't require a fixed magic value right at the start of the partition, so this might theoretically work somehow; however, because several of the values in a FAT filesystem header are actually important for the kernel, and at the same time, blkid also performs some sanity checks on superblocks, the PoC takes a different route.
After blkid has read parts of the filesystem and used them to determine the filesystem's type, label and UUID, fsck_msdos and the in-kernel filesystem implementation will re-read the same data, and those repeated reads actually go through to the storage device. The Linux kernel caches block device pages when userspace directly interacts with block devices, but __blkdev_put() removes all cached data associated with a block device when the last open file referencing the device is closed.
A physical attacker can abuse this by attaching a fake storage device that returns different data for multiple reads from the same location. This allows us to present, for example, a romfs header with a long label string to blkid while presenting a perfectly normal vfat filesystem to fsck_msdos and the in-kernel filesystem implementation.
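The trick only requires the backing store to remember how often each block has been read. The class below is a hypothetical, simplified model. A real setup would serve these bytes through the FUSE file behind f_mass_storage. For illustration, it also assumes that blkid issues the first read of the relevant block and that every later read comes from fsck_msdos or the kernel.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical model of a "chameleonic" backing store: the first read of a block
// returns one image, and every later read of the same block returns another.
class ChameleonStore {
 public:
  ChameleonStore(std::string first_view, std::string later_view)
      : first_(std::move(first_view)), later_(std::move(later_view)) {}

  // Returns the data the host sees for `block` on this particular read.
  const std::string& Read(std::uint64_t block) {
    int& count = reads_[block];  // starts at 0 the first time a block is seen
    ++count;
    return count == 1 ? first_ : later_;
  }

 private:
  std::string first_;
  std::string later_;
  std::map<std::uint64_t, int> reads_;  // per-block read counter
};
```

The model only works because repeated reads actually reach the device. As described above, the host's block device cache is dropped once the last open file referencing the device is closed.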
This is relatively simple to implement in practice thanks to Linux's built-in support for device-side USB. Andrzej Pietrasiewicz's talk "Make your own USB gadget" is a useful introduction to this topic. Basically, the kernel ships with implementations for device-side USB mass storage, HID devices, ethernet adapters, and more; using a relatively simple pseudo-filesystem-based configuration interface, you can configure a composite gadget that provides one or multiple of these functions, potentially with multiple instances, to the connected device. The hardware you need is a system that runs Linux and supports device-side USB; for testing this attack, a Raspberry Pi Zero W was used.
The f_mass_storage gadget function is designed to use a normal file as backing storage; to be able to interactively respond to requests from the Android phone, a FUSE filesystem is used as backing storage instead, using the direct_io option / the FOPEN_DIRECT_IO flag to ensure that our own kernel doesn't add unwanted caching.
At this point, it is already possible to implement an attack that can steal, for example, photos stored on external storage. Luckily for an attacker, immediately after a USB stick has been mounted, is launched, which is a process whose SELinux domain permits access to USB devices. So after a malicious FAT partition has been mounted over /data (using the label string 'UUID="../../data'), the zygote forks off a child with appropriate SELinux context and group membership to permit accesses to USB devices. This child then loads bytecode from /data/dalvik-cache/, permitting us to take control over, which has the necessary privileges to exfiltrate external storage contents.
However, for an attacker who wants to access not just photos, but things like chat logs or authentication credentials stored on the device, this level of access should normally not be sufficient on its own.

Dealing with SELinux: Triggering the bug twice
The major limiting factor at this point is that, even though it is possible to mount over /data, a lot of the highly-privileged code running on the device is not permitted to access the mounted filesystem. However, one highly-privileged service does have access to it: vold.
vold actually supports two types of USB sticks, PublicVolume and PrivateVolume. Up to this point, this blogpost focused on PublicVolume; from here on, PrivateVolume becomes important.
A PrivateVolume is a USB stick that must be formatted using a GUID Partition Table. It must contain a partition that has type UUID kGptAndroidExpand (193D1EA4-B3CA-11E4-B075-10604B889DCF), which contains a dm-crypt-encrypted ext4 (or f2fs) filesystem. The corresponding key is stored at /data/misc/vold/expand_{partGuid}.key, where {partGuid} is the partition GUID from the GPT table as a normalized lowercase hexstring.
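To make the key lookup concrete, here is a sketch of how a key path could be derived from a partition GUID. It assumes that "normalized lowercase hexstring" means keeping only the hex digits of the GUID and lowercasing them. The helper names are illustrative, and the GUID in the usage example is made up.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Assumption: normalization keeps only hex digits (dropping dashes, braces, etc.)
// and lowercases them.
std::string NormalizeHexGuid(const std::string& guid) {
  std::string out;
  for (unsigned char c : guid) {
    if (std::isxdigit(c)) out.push_back(static_cast<char>(std::tolower(c)));
  }
  return out;
}

// Builds the key path for a PrivateVolume partition with the given GUID.
std::string ExpandKeyPath(const std::string& part_guid) {
  return "/data/misc/vold/expand_" + NormalizeHexGuid(part_guid) + ".key";
}
```

Under this assumption, ExpandKeyPath("0A1B2C3D-4E5F-6071-8293-A4B5C6D7E8F9") yields /data/misc/vold/expand_0a1b2c3d4e5f60718293a4b5c6d7e8f9.key. Whoever controls the filesystem mounted over /data/misc therefore controls which key vold reads for any partition GUID they choose.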
As an attacker, it normally shouldn't be possible to mount an ext4 filesystem this way because phones aren't usually set up with any such keys; and even if there is such a key, you'd still have to know what the correct partition GUID is and what the key is. However, we can mount a vfat filesystem over /data/misc and put our own key there, for our own GUID. Then, while the first malicious USB mass storage device is still connected, we can connect a second one that is mounted as PrivateVolume using the keys vold will read from the first USB mass storage device. (Technically, the ordering in the last sentence isn't entirely correct - actually, the exploit provides both mass storage devices as a single composite device at the same time, but stalls the first read from the second mass storage device to create the desired ordering.)
Because PrivateVolume instances use ext4, we can control DAC ownership and permissions on the filesystem; and thanks to the way a PrivateVolume is integrated into the system, we can even control SELinux labels on that filesystem.
In summary, at this point, we can mount a controlled filesystem over /data, with arbitrary file permissions and arbitrary SELinux contexts. Because we control file permissions and SELinux contexts, we can allow any process to access files on our filesystem - including mapping them with PROT_EXEC.

Injecting into zygote
The zygote process is relatively powerful, even though it is not listed as part of the TCB. By design, it runs with UID 0, can arbitrarily change its UID, and can perform dynamic SELinux transitions into the SELinux contexts of system_server and normal apps. In other words, the zygote has access to almost all user data on the device.
When the 64-bit zygote starts up on system boot, it loads code from /data/dalvik-cache/arm64/system@framework@boot*.{art,oat,vdex}. Normally, the oat file (which contains an ELF library that will be loaded with dlopen()) and the vdex file are symlinks to files on the immutable /system partition; only the art file is actually stored on /data. But we can instead make and system@framework@boot.vdex symlinks to /system (to get around some consistency checks without knowing exactly which Android build is running on the device) while placing our own malicious ELF library at system@framework@boot.oat (with the SELinux context that the legitimate oat file would have). Then, by placing a function with __attribute__((constructor)) in our ELF library, we can get code execution in the zygote as soon as it calls dlopen() on startup.
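The constructor mechanism itself is ordinary GCC/Clang functionality. The benign sketch below only sets a flag; what matters is when it runs. In an executable, such functions run before main(). In a shared library, the dynamic linker runs them while dlopen() loads the library, before dlopen() returns to its caller. The function and variable names are illustrative.

```cpp
#include <cassert>
#include <cstdio>

// Set by the constructor below; constant-initialized before any constructor runs.
bool g_constructor_ran = false;

// GCC/Clang attribute: run this function when the containing image is loaded.
// In an executable that is before main(); in a shared library it happens while
// dlopen() is loading the library.
__attribute__((constructor)) static void OnImageLoad() {
  g_constructor_ran = true;
  std::fputs("constructor ran at load time\n", stderr);
}
```

By the time any code in main() executes, g_constructor_ran is already true. In the attack described above, the same mechanism runs attacker code inside the zygote as soon as it calls dlopen() on the planted oat file.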
The missing step at this point is that when the attack is performed, the zygote is already running; and this attack only works while the zygote is starting up.

Crashing the system
This part is a bit unpleasant.
When a critical system component (in particular, the zygote or system_server) crashes (which you can simulate on an eng build using kill), Android attempts to automatically recover from the crash by restarting most userspace processes (including the zygote). When this happens, the screen first shows the boot animation for a bit, followed by the lock screen with the "Unlock for all features and data" prompt that normally only shows up after boot. However, the key material for accessing user data is still present at this point, as you can verify if ADB is on by running "ls /sdcard" on the device.
This means that if we can somehow crash system_server, we can then inject code into the zygote during the following userspace restart and will be able to access user data on the device.
Of course, mounting our own filesystem over /data is very crude and makes all sorts of things fail, but surprisingly, the system doesn't immediately fall over - while parts of the UI become unusable, most places have some error handling that prevents the system from failing so clearly that a restart happens.
After some experimentation, it turned out that Android's code for tracking bandwidth usage has a safety check: If the network usage tracking code can't write to disk and >=2MiB (mPersistThresholdBytes) of network traffic have been observed since the last successful write, a fatal exception is thrown. This means that if we can create some sort of network connection to the device and then send it >=2MiB worth of ping flood, then trigger a stats writeback by either waiting for a periodic writeback or changing the state of a network interface, the device will reboot.
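The crash condition can be modeled in a few lines. This is a hypothetical C++ sketch, not Android's code; the real check lives in Java in the network stats service. Only the 2 MiB value of mPersistThresholdBytes comes from the text above, and the class and method names are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical model of the persist-or-die safety check described above.
class StatsRecorderModel {
 public:
  static constexpr std::uint64_t kPersistThresholdBytes = 2 * 1024 * 1024;  // 2 MiB

  // Traffic observed since the last successful write accumulates here.
  void RecordTraffic(std::uint64_t bytes) { pending_bytes_ += bytes; }

  // Called on a periodic writeback or when a network interface changes state.
  void Persist(bool write_succeeds) {
    if (write_succeeds) {
      pending_bytes_ = 0;
      return;
    }
    // A failed write is tolerated until too much unsaved traffic has piled up.
    if (pending_bytes_ >= kPersistThresholdBytes) {
      throw std::runtime_error("stats persist failed; treated as fatal");
    }
  }

 private:
  std::uint64_t pending_bytes_ = 0;
};
```

With /data replaced, every write fails, so the ping flood only has to push the pending counter past the threshold before the next writeback.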
To create a network connection, there are two options:
  • Connect to a wifi network. Before Android 9, even when the device is locked, it is normally possible to connect to a new wifi network by dragging down from the top of the screen, tapping the drop-down below the wifi symbol, then tapping on the name of an open wifi network. (This doesn't work for networks protected with WPA, but of course an attacker can make their own wifi network an open one.) Many devices will also just autoconnect to networks with certain names.
  • Connect to an ethernet network. Android supports USB ethernet adapters and will automatically connect to ethernet networks.

For testing the exploit, a manually-created connection to a wifi network was used; for a more reliable and user-friendly exploit, you'd probably want to use an ethernet connection.
At this point, we can run arbitrary native code in zygote context and access user data; but we can't yet read out the raw disk encryption key, directly access the underlying block device, or take a RAM dump (although at this point, half the data that would've been in a RAM dump is probably gone anyway thanks to the system crash). If we want to be able to do those things, we'll have to escalate our privileges a bit more.

From zygote to vold
Even though the zygote is not supposed to be part of the TCB, it has access to the CAP_SYS_ADMIN capability in the initial user namespace, and the SELinux policy permits the use of this capability. The zygote uses this capability for the mount() syscall and for installing a seccomp filter without setting the NO_NEW_PRIVS flag. There are multiple ways to abuse CAP_SYS_ADMIN; in particular, on the Pixel 2, the following ways seem viable:
  • You can install a seccomp filter without NO_NEW_PRIVS, then perform an execve() with a privilege transition (SELinux exec transition, setuid/setgid execution, or execution with permitted file capability set). The seccomp filter can then force specific syscalls to fail with error number 0 - which e.g. in the case of open() means that the process will believe that the syscall succeeded and allocated file descriptor 0. This attack works here, but is a bit messy.
  • You can instruct the kernel to use a file you control as high-priority swap device, then create memory pressure. Once the kernel writes stack or heap pages from a sufficiently privileged process into the swap file, you can edit the swapped-out memory, then let the process load it back. Downsides of this technique are that it is very unpredictable, it involves memory pressure (which could potentially cause the system to kill processes you want to keep, and probably destroys many forensic artifacts in RAM), and requires some way to figure out which swapped-out pages belong to which process and are used for what. This requires the kernel to support swap.
  • You can use pivot_root() to replace the root directory of either the current mount namespace or a newly created mount namespace, bypassing the SELinux checks that would have been performed for mount(). Doing it for a new mount namespace is useful if you only want to affect a child process that elevates its privileges afterwards. This doesn't work if the root filesystem is a rootfs filesystem. This is the technique used here.

In recent Android versions, the mechanism used to create dumps of crashing processes has changed: Instead of asking a privileged daemon to create a dump, processes execute one of the helpers /system/bin/crash_dump64 and /system/bin/crash_dump32, which have the SELinux label u:object_r:crash_dump_exec:s0. Currently, when a file with such a label is executed by any SELinux domain, an automatic domain transition to the crash_dump domain is triggered (which automatically implies setting the AT_SECURE flag in the auxiliary vector, instructing the linker of the new process to be careful with environment variables like LD_PRELOAD):
, crash_dump_exec, crash_dump);
At the time this bug was reported, the crash_dump domain had the following SELinux policy:
[...]
allow crash_dump {
  domain
  -init
  -crash_dump
  -keystore
  -logd
}:process { ptrace signal sigchld sigstop sigkill };
[...]
r_dir_file(crash_dump, domain)
[...]
This policy permitted crash_dump to attach to processes in almost any domain via ptrace() (providing the ability to take over the process if the DAC controls permit it) and allowed it to read properties of any process in procfs. The exclusion list for ptrace access lists a few TCB processes; but notably, vold was not on the list. Therefore, if we can execute crash_dump64 and somehow inject code into it, we can then take over vold.
Note that the ability to actually ptrace() a process is still gated by the normal Linux DAC checks, and crash_dump can't use CAP_SYS_PTRACE or CAP_SETUID. If a normal app managed to inject code into crash_dump64, it still wouldn't be able to leverage that to attack system components because of the UID mismatch.
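The two conditions above can be combined in a simplified model. The historical SELinux rule allowed crash_dump to ptrace every domain except the four exclusions, and without CAP_SYS_PTRACE the Linux DAC check requires matching UIDs. The model ignores the other inputs to the real kernel check, such as GIDs and dumpability. The app UID 10057 in the usage example is arbitrary.

```cpp
#include <cassert>
#include <set>
#include <string>

// SELinux side: the historical rule allowed ptrace on every domain except these.
bool SelinuxAllowsCrashDumpPtrace(const std::string& target_domain) {
  static const std::set<std::string> kExcluded = {"init", "crash_dump", "keystore", "logd"};
  return kExcluded.count(target_domain) == 0;
}

// DAC side (simplified): without CAP_SYS_PTRACE, tracer and tracee UIDs must match.
bool CanPtrace(const std::string& target_domain, unsigned tracer_uid, unsigned target_uid) {
  return SelinuxAllowsCrashDumpPtrace(target_domain) && tracer_uid == target_uid;
}
```

The attacker's code runs as UID 0 in zygote context, so a crash_dump64 process it launches also runs as UID 0, the same UID as vold. Under this model, both checks pass for vold, while an app-launched crash_dump fails the UID comparison.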
If you've been reading carefully, you might now wonder whether we could just place our own binary with context u:object_r:crash_dump_exec:s0 on our fake /data filesystem, and then execute that to gain code execution in the crash_dump domain. This doesn't work because vold - very sensibly - hardcodes the MS_NOSUID flag when mounting USB storage devices, which not only degrades the execution of classic setuid/setgid binaries, but also degrades the execution of files with file capabilities and executions that would normally involve automatic SELinux domain transitions (unless the SELinux policy explicitly opts out of this behavior by granting PROCESS2__NOSUID_TRANSITION).
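The effect of MS_NOSUID on automatic domain transitions can be summarized as a small decision function. This is a simplified model of the behavior described above, not kernel code, and the parameter names are illustrative.

```cpp
#include <cassert>
#include <string>

// Simplified model of exec'ing a file whose label triggers an automatic
// transition into `target_domain`. On an MS_NOSUID mount the transition is
// not performed and the process keeps its current domain, unless the policy
// grants the nosuid_transition permission.
std::string DomainAfterExec(const std::string& current_domain,
                            const std::string& target_domain,
                            bool mounted_nosuid,
                            bool policy_grants_nosuid_transition) {
  if (mounted_nosuid && !policy_grants_nosuid_transition) return current_domain;
  return target_domain;
}
```

A crash_dump_exec file on the attacker's nosuid-mounted USB filesystem therefore leaves the process in the zygote domain. The real crash_dump64 on /system still transitions normally.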
To inject code into crash_dump64, we can create a new mount namespace with unshare() (using our CAP_SYS_ADMIN capability), then call pivot_root() to point the root directory of our process into a directory we fully control, and then execute crash_dump64. Then the kernel parses the ELF headers of crash_dump64, reads the path to the linker (/system/bin/linker64), loads the linker into memory from that path (relative to the process root, so we can supply our own linker here), and executes it.
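The interpreter path that the kernel follows is stored in the executable's PT_INTERP program header. The sketch below reads that path from an in-memory 64-bit ELF image. It assumes the build environment provides the <elf.h> definitions, as glibc and Android do, and it does not reproduce the kernel's other checks. The result is an ordinary path string that the kernel resolves against the process's root directory. After pivot_root(), the same string can therefore refer to a linker of our choosing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <elf.h>
#include <optional>
#include <string>
#include <vector>

// Returns the PT_INTERP path of a 64-bit ELF image held in memory, if present.
std::optional<std::string> ReadInterpreter(const std::vector<unsigned char>& image) {
  Elf64_Ehdr ehdr;
  if (image.size() < sizeof(ehdr)) return std::nullopt;
  std::memcpy(&ehdr, image.data(), sizeof(ehdr));
  if (std::memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0 || ehdr.e_ident[EI_CLASS] != ELFCLASS64) {
    return std::nullopt;
  }
  if (ehdr.e_phentsize < sizeof(Elf64_Phdr) || ehdr.e_phoff > image.size()) return std::nullopt;
  for (std::uint64_t i = 0; i < ehdr.e_phnum; ++i) {
    // Bounds-check each program header before copying it out.
    std::uint64_t off = ehdr.e_phoff + i * ehdr.e_phentsize;
    if (off > image.size() || image.size() - off < sizeof(Elf64_Phdr)) return std::nullopt;
    Elf64_Phdr phdr;
    std::memcpy(&phdr, image.data() + off, sizeof(phdr));
    if (phdr.p_type != PT_INTERP) continue;
    if (phdr.p_offset > image.size() || phdr.p_filesz > image.size() - phdr.p_offset) {
      return std::nullopt;
    }
    // The interpreter path is a NUL-terminated string stored in the file.
    const char* begin = reinterpret_cast<const char*>(image.data() + phdr.p_offset);
    const char* end = begin + phdr.p_filesz;
    return std::string(begin, std::find(begin, end, '\0'));
  }
  return std::nullopt;
}
```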
At this point, we can execute arbitrary code in crash_dump context and escalate into vold from there, compromising the TCB. Android's security policy now considers us to have kernel-equivalent privileges; however, to show what you'd have to do from here to gain code execution in the kernel, this blogpost goes a bit further.

From vold to init context
It doesn't look like there is an easy way to get from vold into the real init process; however, there is a way into the init SELinux context. Looking through the SELinux policy for allowed transitions into init context, we find the following policy:
, init_exec, init)
This means that if we can get code running in kernel context to execute a file we control labeled init_exec, on a filesystem that wasn't mounted with MS_NOSUID, then our file will be executed in init context.
The only code that is running in kernel context is the kernel, so we have to get the kernel to execute the file for us. Linux has a mechanism called "usermode helpers" that can do this: Under some circumstances, the kernel will delegate actions (such as creating coredumps, loading key material into the kernel, performing DNS lookups, ...) to userspace code. In particular, when a nonexistent key is looked up (e.g. via request_key()), /sbin/request-key (hardcoded, can only be changed to a different static path at kernel build time with CONFIG_STATIC_USERMODEHELPER_PATH) will be invoked.
Being in vold, we can simply mount our own ext4 filesystem over /sbin without MS_NOSUID, then call request_key(), and the kernel invokes our request-key in init context.
The exploit stops at this point; however, the following section describes how you could build on it to gain code execution in the kernel.

From init context to the kernel
From init context, it is possible to transition into modprobe or vendor_modprobe context by executing an appropriately labeled file after explicitly requesting a domain transition (note that this is domain_trans(), which permits a transition on exec, not domain_auto_trans(), which automatically performs a transition on exec):
domain_trans(init, { rootfs toolbox_exec }, modprobe)
domain_trans(init, vendor_toolbox_exec, vendor_modprobe)
modprobe and vendor_modprobe have the ability to load kernel modules from appropriately labeled files:
allow modprobe self:capability sys_module;
allow modprobe { system_file }:system module_load;
allow vendor_modprobe self:capability sys_module;
allow vendor_modprobe { vendor_file }:system module_load;
Android nowadays doesn't require signatures for kernel modules:
Therefore, you could execute an appropriately labeled file to execute code in modprobe context, then load an appropriately labeled malicious kernel module from there.

Lessons learned
Notably, this attack crosses two weakly-enforced security boundaries: The boundary from blkid_untrusted to vold (when vold uses the UUID provided by blkid_untrusted in a pathname without checking that it resembles a valid UUID) and the boundary from the zygote to the TCB (by abusing the zygote's CAP_SYS_ADMIN capability). Software vendors have, very rightly, been stressing for quite some time that it is important for security researchers to be aware of what is, and what isn't, a security boundary; but it is also important for vendors to decide where they want to have security boundaries and then rigorously enforce those boundaries. Unenforced security boundaries can be of limited use (for example, as a development aid while stronger isolation is in development), but they can also have negative effects by obscuring how important a component is for the security of the overall system.
In this case, the weakly-enforced security boundary between vold and blkid_untrusted actually contributed to the vulnerability, rather than mitigating it. If the blkid code had run in the vold process, it would not have been necessary to serialize its output, and the injection of a fake UUID would not have worked.

The Problems and Promise of WebAssembly

Google Project Zero - Thu, 08/16/2018 - 13:02
Posted by Natalie Silvanovich, Project Zero

WebAssembly is a format that allows code written in assembly-like instructions to be run from JavaScript. It has recently been implemented in all four major browsers. We reviewed each browser’s WebAssembly implementation and found three vulnerabilities. This blog post gives an overview of the features and attack surface of WebAssembly, as well as the vulnerabilities we found.

Building WebAssembly
A number of tools can be used to write WebAssembly code. An important goal of the designers of the format is to be able to compile C and C++ into WebAssembly, and compilers exist to do so. It is likely that other languages will compile into WebAssembly in the future. It is also possible to write WebAssembly in the WebAssembly text format, which is a direct text representation of the WebAssembly binary format, the final format of all WebAssembly code.

WebAssembly Modules
Code in WebAssembly binary format starts off in an ArrayBuffer or TypedArray in JavaScript. It is then loaded into a WebAssembly Module.
var code = new ArrayBuffer(len);
… // write code into ArrayBuffer
var m = new WebAssembly.Module(code);
A module is an object that contains the code and initialization information specified by the bytes in binary format. When a module is created, it parses the binary, loads needed information into the module, and then translates the WebAssembly instructions into an intermediate bytecode. Verification of the WebAssembly instructions is performed during this translation.
WebAssembly binaries consist of a series of sections (binary blobs) with different lengths and types. The sections supported by WebAssembly binary format are as follows.
Section   | Code | Description
----------|------|------------
Type      | 1    | Contains a list of function signatures used by functions defined and called by the module. Each signature has an index, and can be used by multiple functions by specifying that index.
Imports   | 2    | Contains the names and types of objects to be imported. More on this later.
Functions | 3    | The declarations (including the index of a signature specified in the Type Section) of the functions defined in this module.
Table     | 4    | Contains details about function tables. More on this later.
Memory    | 5    | Contains details about memory. More on this later.
Global    | 6    | Global declarations.
Exports   | 7    | Contains the names and types of objects and functions that will be exported.
Start     | 8    | Specifies a function that will be called on Module start-up.
Elements  | 9    | Table initialization information.
Code      | 10   | The WebAssembly instructions that make up the body of each function.
Data      | 11   | Memory initialization information.
Sections with code 0, which does not appear in the above table, are called custom sections. Some browsers use custom sections to implement upcoming or experimental features. Unrecognized custom sections are skipped when loading a Module, and their contents can be retrieved from JavaScript as ArrayBuffers using WebAssembly.Module.customSections().
Module loading starts off by parsing the module. This involves going through each section, verifying its format and then loading the needed information into a native structure inside the WebAssembly engine. Most of the bugs that Project Zero found in WebAssembly occurred in this phase.
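The outer layer of that parsing is simple. After an 8-byte preamble (the magic bytes \0asm and version 1), each section starts with a one-byte section code followed by its payload size, encoded as unsigned LEB128. The sketch below walks those headers with bounds checks. It is a minimal illustration, not any browser's implementation, and it validates neither section contents nor section order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;
using SectionList = std::vector<std::pair<std::uint8_t, std::uint32_t>>;  // (code, size)

// Decodes an unsigned LEB128 value of at most 32 bits, advancing `pos`.
std::optional<std::uint32_t> ReadVarU32(const Bytes& in, std::size_t& pos) {
  std::uint32_t result = 0;
  for (int shift = 0; shift < 35; shift += 7) {
    if (pos >= in.size()) return std::nullopt;             // truncated input
    std::uint8_t byte = in[pos++];
    if (shift == 28 && (byte & 0x70) != 0) return std::nullopt;  // value exceeds 32 bits
    result |= static_cast<std::uint32_t>(byte & 0x7f) << shift;
    if ((byte & 0x80) == 0) return result;                 // no continuation bit: done
  }
  return std::nullopt;  // more than 5 bytes
}

// Lists (section code, payload size) pairs, rejecting payloads that overrun the buffer.
std::optional<SectionList> ListSections(const Bytes& in) {
  static const std::uint8_t kPreamble[8] = {0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00};
  if (in.size() < 8) return std::nullopt;
  for (std::size_t i = 0; i < 8; ++i) {
    if (in[i] != kPreamble[i]) return std::nullopt;
  }
  SectionList sections;
  std::size_t pos = 8;
  while (pos < in.size()) {
    std::uint8_t code = in[pos++];
    auto size = ReadVarU32(in, pos);
    if (!size || *size > in.size() - pos) return std::nullopt;
    sections.emplace_back(code, *size);
    pos += *size;  // skip the payload; a real parser would decode it here
  }
  return sections;
}
```

For example, the bytes of a module holding only a type section with one () -> () signature produce a single entry (1, 4).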
To start, CVE-2018-4222 occurs when the WebAssembly binary is read out of the buffer containing it. TypedArray objects in JavaScript can contain offsets at which their underlying ArrayBuffers are accessed. The WebKit implementation of this added the offset to the ArrayBuffer data pointer twice. So the following code:
var b2 = new ArrayBuffer(1000);
var view = new Int8Array(b2, 700); // offset
var mod = new WebAssembly.Module(view);
will read memory out of bounds in an unfixed version of WebKit. Note that this is also a functional bug, as it prevents any TypedArray with a nonzero offset from being processed correctly by WebAssembly.
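The arithmetic of this bug fits in a few lines. The sketch is a simplified model, not WebKit code. It represents the view as a byte offset and length into the buffer and shows where a doubled offset lands for the example above. The names are illustrative.

```cpp
#include <cassert>
#include <cstddef>

// A TypedArray view modeled as (byteOffset, byteLength) into an ArrayBuffer.
struct View {
  std::size_t byte_offset;
  std::size_t byte_length;
};

// Correct: the view's data starts byte_offset bytes into the buffer.
std::size_t StartIndex(const View& v) { return v.byte_offset; }

// Shape of the bug: the offset is applied again to a position that already includes it.
std::size_t BuggyStartIndex(const View& v) { return StartIndex(v) + v.byte_offset; }

// True if [start, start + length) lies inside a buffer of buffer_size bytes.
bool InBounds(std::size_t start, std::size_t length, std::size_t buffer_size) {
  return start <= buffer_size && length <= buffer_size - start;
}
```

For new Int8Array(b2, 700) over a 1000-byte buffer, the view spans bytes 700 to 999. The doubled offset starts at byte 1400, beyond the end of the buffer.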
CVE-2018-6092 in Chrome is an example of an issue that occurs when parsing a WebAssembly buffer. Similar issues have been fixed in the past. In this vulnerability, there is an integer overflow when parsing the locals of a function specified in the code section of the binary. The counts of locals of each type are added together, and the size_t that holds the running total can wrap around on a 32-bit platform.
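The overflow pattern can be shown by summing per-type local counts in a 32-bit unsigned type, which behaves like size_t on a 32-bit platform. This is an illustrative sketch, not Chrome's code. The checked variant and its limit parameter are our own, and the 50000 in the usage example is an arbitrary placeholder rather than any engine's actual limit.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Unchecked: wraps modulo 2^32, like size_t arithmetic on a 32-bit target.
std::uint32_t TotalLocalsUnchecked(const std::vector<std::uint32_t>& counts) {
  std::uint32_t total = 0;
  for (std::uint32_t c : counts) total += c;  // may silently wrap around
  return total;
}

// Checked: rejects the declaration as soon as the running total would exceed `limit`.
std::optional<std::uint32_t> TotalLocalsChecked(const std::vector<std::uint32_t>& counts,
                                                std::uint32_t limit) {
  std::uint32_t total = 0;
  for (std::uint32_t c : counts) {
    if (c > limit - total) return std::nullopt;  // total <= limit, so no underflow here
    total += c;
  }
  return total;
}
```

Two entries of 0x80000000 locals plus one more entry of 1 wrap the unchecked total to 1. An engine that sizes a buffer from that total would then write far more locals than it allocated.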
It is also evident from the section table above (and specified in the WebAssembly standard) that sections must be unique and in the correct order. For example, the function section can’t load unless the type section containing the signatures it needs has been loaded already.
CVE-2018-4121 is an error in section order checking in WebKit. In unfixed versions of WebKit, the order check gets reset after a custom section is processed, basically allowing sections to occur any number of times in any order. This leads to overflows in several vectors in WebKit, as its parsing implementation allocates memory based on the assumption that there is only one of each section, and then adds elements to that memory without checking. Even without this implementation detail, though, this bug would likely lead to many subtle memory corruption issues in the WebAssembly engine, as the ordering and uniqueness of sections in a WebAssembly binary are fundamental to how WebAssembly works.
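A correct order check and the buggy one differ by a single reset. The sketch below follows the WebAssembly specification in using section code 0 for custom sections, and it treats codes 1 to 11 from the table above as the known sections. It models the logic only and is not WebKit's code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::uint8_t kCustomSectionCode = 0;     // custom sections use code 0
constexpr std::uint8_t kLastKnownSectionCode = 11;  // Data, per the table above

// Buggy shape: processing a custom section resets the ordering state.
bool OrderOkBuggy(const std::vector<std::uint8_t>& codes) {
  std::uint8_t last = 0;
  for (std::uint8_t code : codes) {
    if (code == kCustomSectionCode) {
      last = 0;  // BUG: forgets which sections were already seen
      continue;
    }
    if (code > kLastKnownSectionCode || code <= last) return false;
    last = code;
  }
  return true;
}

// Fixed shape: custom sections may appear anywhere but never reset the state,
// so known sections must appear at most once and in increasing order.
bool OrderOk(const std::vector<std::uint8_t>& codes) {
  std::uint8_t last = 0;
  for (std::uint8_t code : codes) {
    if (code == kCustomSectionCode) continue;
    if (code > kLastKnownSectionCode || code <= last) return false;
    last = code;
  }
  return true;
}
```

Newer revisions of the specification add sections whose codes do not follow this numeric order, such as DataCount (code 12), which must appear before the Code section. Real validators therefore typically use an explicit ordering table rather than comparing raw codes.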
CVE-2018-4121 was independently discovered by Alex Plaskett, Fabian Beterke and Georgi Geshev of MWR Labs, and they describe their exploit here.
WebAssembly Instances
After a binary is loaded into a Module, an Instance of the module needs to be created to run the code. An Instance binds the code to imported objects it needs to run, and does some final initialization.
var code = new ArrayBuffer(len);
… // write code into ArrayBuffer
var m = new WebAssembly.Module(code);
var i = new WebAssembly.Instance(m, imports);
Each module has an Import Section, loaded from the WebAssembly binary. This section contains the names and types of objects that must be imported from JavaScript for the code in the module to run. There are four types of objects that can be imported. Functions (JavaScript or WebAssembly) can be imported and called from WebAssembly. Numeric types can also be imported from JavaScript to populate globals.
Memory and Table objects are the final two types that can be imported. These are new object types added to JavaScript engines for use in WebAssembly. Memory objects contain the memory used by the WebAssembly code. This memory can be accessed in JavaScript via an ArrayBuffer, and in WebAssembly via load and store instructions. When creating a Memory object, the WebAssembly developer specifies the initial and optional maximum size of the memory. The Memory object is then created with the initial memory size allocated. The allocated size can be increased in JavaScript by calling the grow method, and in WebAssembly using the grow_memory instruction (called memory.grow in the current text format). Memory size can never decrease (at least according to the standard).
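The JavaScript side of Memory objects can be exercised directly. The sketch below assumes only the standard `WebAssembly` JavaScript API. It shows three things: sizes are counted in 64 KiB pages, `grow` takes a delta and returns the previous size, and the declared maximum is enforced.

```javascript
// One WebAssembly page is 64 KiB.
const PAGE_SIZE = 65536;

// Memory with 1 page allocated up front and room to grow to 3 pages.
const memory = new WebAssembly.Memory({ initial: 1, maximum: 3 });
const sizeBefore = memory.buffer.byteLength;

// grow() takes a delta in pages and returns the previous size in pages.
const previousPages = memory.grow(1);
const sizeAfter = memory.buffer.byteLength;

// Growing past the declared maximum throws a RangeError and leaves the
// size unchanged. There is no corresponding method to shrink memory.
let growError = null;
try {
  memory.grow(2);
} catch (e) {
  growError = e;
}
```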
Table objects are function tables for WebAssembly. They contain function objects at specific indexes in the table, and these functions can be called from WebAssembly using the call_indirect instruction. Like memory, tables have an initial and optional maximum size, and their size can be expanded by calling the grow method in JavaScript. Table objects cannot be expanded in WebAssembly.  Table objects can only contain WebAssembly functions, not JavaScript functions, and an exception is thrown if the wrong type of function is added to a Table object. Currently, WebAssembly only supports one Memory object and one Table object per Instance object. This is likely to change in the future though.
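The restrictions on Table contents are visible from JavaScript. This sketch uses `"anyfunc"`, the element type name from the WebAssembly JavaScript API. It shows that slots start empty, that a plain JavaScript function is rejected, and that `grow` mirrors the Memory API.

```javascript
// A function table with one slot that may grow to two.
const table = new WebAssembly.Table({ initial: 1, maximum: 2, element: "anyfunc" });

// Slots start out empty (null).
const initialSlot = table.get(0);

// Storing a plain JavaScript function is rejected with a TypeError. Only
// functions exported from a WebAssembly instance (or null) are accepted.
let setError = null;
try {
  table.set(0, function notWasm() {});
} catch (e) {
  setError = e;
}

// grow() returns the previous length, like Memory.prototype.grow.
const previousLength = table.grow(1);
```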
More than one Instance object can share the same Memory object and Table object. If two or more Instance objects share both of these objects, they are referred to as being in the same compartment. It is possible to create Instance objects that share a Table object, but not a Memory object, or vice versa, but no compiler should ever create Instances with this property. No compiler ever changes the values in a table after it is initialized, and this is likely to remain true in the future, but it is still possible for JavaScript callers to change them at any time.
There are two ways to add Memory and Table objects to an Instance object. The first is through the Import Section as mentioned above. The second way is to include a Memory or Table Section in the binary. Including these sections causes the WebAssembly engine to create the needed Memory or Table object for the module, with parameters provided in the binary. It is not valid to specify these objects in both the Import Section and the Table or Memory Section, as this would mean there is more than one of each object, which is not currently allowed. Memory and Table objects are not mandatory, and it is fairly common for code in WebAssembly not to have a Table object. It is also possible to create WebAssembly code that does not have a Memory object, for example a function that averages the parameters that are passed in, but this is rare in practice.
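To see an Import Section and a shared Memory object in practice, the sketch below hand-encodes a tiny module. Its only section imports a memory named `env.mem`. The names `env` and `mem` are arbitrary choices for this example. The module is then instantiated twice against the same Memory object, and once without the import to show that linking fails.

```javascript
// A minimal module whose only content is an Import Section requiring a
// memory named env.mem with a minimum of one page.
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic: "\0asm"
  0x01, 0x00, 0x00, 0x00, // version 1
  0x02, 0x0c,             // section id 2 (import), 12 bytes long
  0x01,                   // one import
  0x03, 0x65, 0x6e, 0x76, // module name "env"
  0x03, 0x6d, 0x65, 0x6d, // field name "mem"
  0x02, 0x00, 0x01,       // kind memory; limits: no maximum, minimum 1 page
]);

const mod = new WebAssembly.Module(bytes);
const declaredImports = WebAssembly.Module.imports(mod);

// Two instances importing the same Memory object share it.
const shared = new WebAssembly.Memory({ initial: 1 });
const first = new WebAssembly.Instance(mod, { env: { mem: shared } });
const second = new WebAssembly.Instance(mod, { env: { mem: shared } });

// Omitting the required memory makes instantiation fail with a LinkError.
let linkError = null;
try {
  new WebAssembly.Instance(mod, { env: {} });
} catch (e) {
  linkError = e;
}
```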
One feature of these objects that has led to several vulnerabilities is the ability to increase the size of the allocated Memory or Table object. For example, CVE-2018-5093, a series of integer overflow vulnerabilities in increasing the size of Memory and Table objects, was recently found by OSS-Fuzz. A similar issue was found in Chrome by OSS-Fuzz.
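Overflows of this kind typically arise when a requested delta is added to the current size, or pages are converted to bytes, in fixed-width arithmetic. The sketch below is not the code of any real engine. It uses `>>> 0` and `Math.imul` to mimic 32-bit unsigned arithmetic, and both helper names are hypothetical. The checked version relies on the limit of 65536 pages (4 GiB) for a memory with 32-bit addressing.

```javascript
const PAGE_SIZE = 65536;
// A 32-bit memory can address at most 65536 pages (4 GiB).
const MAX_PAGES = 65536;

// Naive: the page total and the byte size are both computed in 32 bits.
// Returns the new size in bytes, or -1 if the request is rejected.
function newByteSizeNaive(currentPages, deltaPages, maximumPages) {
  const newPages = (currentPages + deltaPages) >>> 0;
  if (newPages > maximumPages) return -1;
  return Math.imul(newPages, PAGE_SIZE) >>> 0;
}

// Checked: compare against the limits before adding, so nothing wraps.
function newByteSizeChecked(currentPages, deltaPages, maximumPages) {
  const limit = Math.min(maximumPages, MAX_PAGES);
  if (deltaPages > limit - currentPages) return -1;
  return (currentPages + deltaPages) * PAGE_SIZE;
}
```

With the naive helper, a huge delta wraps the page count back to 0 and passes the maximum check. A full 65536-page memory also computes a byte size of 0, so memory appears to shrink, which the standard forbids.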
Another question that immediately comes to mind about Memory objects is whether the internal ArrayBuffer can be detached, as many vulnerabilities have occurred in ArrayBuffer detachment. According to the specification, Memory object ArrayBuffers cannot be detached by script, and this is true in all browsers except for Microsoft Edge (Chakra does not allow this, but Edge does). Memory object ArrayBuffers also do not change size when the Memory object is expanded. Instead, they are detached as soon as the grow method is called. This prevents any bugs that could occur due to ArrayBuffers changing size.
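The detach-on-grow behavior can be observed with the standard JavaScript API. A detached ArrayBuffer reports a `byteLength` of 0, so the sketch below compares the old buffer before and after a call to `grow`.

```javascript
const memory = new WebAssembly.Memory({ initial: 1, maximum: 2 });
const oldBuffer = memory.buffer;
const bytesBefore = oldBuffer.byteLength;

// Growing does not resize the existing ArrayBuffer. The old buffer is
// detached (its byteLength drops to 0) and a new, larger buffer replaces it.
memory.grow(1);
const bytesAfterOld = oldBuffer.byteLength;
const bytesAfterNew = memory.buffer.byteLength;
```

Any code still holding `oldBuffer` sees an empty buffer rather than one whose size changed underneath it.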
Out-of-bounds access is always a concern when allowing script to use memory, but these types of issues are fairly uncommon in WebAssembly. One likely reason is that only a limited number of WebAssembly instructions can access memory, and WebAssembly currently supports only a single Memory object per Instance, so the code that accesses memory in a WebAssembly engine is actually quite small. Also, on 64-bit systems, WebAssembly implements memory as safe buffers (also called signal buffers). To understand how safe buffers work, it is important to understand how loads and stores work in WebAssembly. These instructions have two operands, an address and an offset. When memory is accessed, these two operands are added to the pointer to the start of the internal memory of the Memory object, and the resulting location is where the memory access happens. Both operands are 32-bit integers (note that this is likely to change in future versions of WebAssembly) and are treated as unsigned. Their sum can therefore be at most 0x1fffffffe, so a memory access can land at most about 8GB past the start of the allocated buffer.
Safe buffers work by reserving enough virtual address space to cover every address a memory access can compute, and then allocating the portion of memory that is actually needed by WebAssembly code as RW memory at the start of that reserved range. Because every access falls inside the reservation, an access outside of the allocated memory causes a signal (or equivalent OS error). The WebAssembly engine handles the signal, and an appropriate out-of-bounds exception is then thrown in JavaScript. Safe buffers eliminate the need for bounds checks in code, making vulnerabilities due to out-of-bounds access less likely on 64-bit systems. Explicit bounds checking is still required on 32-bit systems, but these are becoming less common.
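The address arithmetic behind this can be sketched as follows. The helper names are hypothetical. Both operands are treated as unsigned 32-bit values and added without wrapping, as the WebAssembly core specification requires. JavaScript numbers hold the 33-bit result exactly. `needsTrap` is the kind of explicit check a 32-bit platform must emit before each access. With safe buffers on 64-bit platforms, the same condition is instead caught by the hardware when the access lands in the unallocated part of the reservation.

```javascript
// Effective address of a load/store: dynamic address operand plus the
// static offset immediate, both unsigned 32-bit, summed without wrapping.
function effectiveAddress(address, offset) {
  return (address >>> 0) + (offset >>> 0);
}

// Explicit bounds check for an access of accessSize bytes against a
// memory whose current length is memoryBytes.
function needsTrap(address, offset, accessSize, memoryBytes) {
  return effectiveAddress(address, offset) + accessSize > memoryBytes;
}

// The largest address any access can compute, before adding the access size.
const MAX_EFFECTIVE_ADDRESS = effectiveAddress(0xffffffff, 0xffffffff);
```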
After the imported objects are loaded, the WebAssembly engine goes through a few more steps to create the Instance Object. The Elements Section of the WebAssembly binary is used to initialize the Table object, if both of these exist, and then the Data Section of the WebAssembly binary is used to initialize the Memory object, if both exist. Then, the code in the Module is used to create functions, and these functions are exported (attached to a JavaScript object, so they are accessible in JavaScript). Finally, if a start function is specified in the Start Section, it is executed, and then the WebAssembly is ready to run!
var code = new ArrayBuffer(len);
… // write code into ArrayBuffer
var m = new WebAssembly.Module(code);
var i = new WebAssembly.Instance(m, imports);
i.exports.call_me(); // WebAssembly happens!
The final issue we found involves a number of these components. It was discovered and fixed by the Chrome team before we found it, so it doesn’t have a CVE, but it’s still an interesting bug.
This issue is related to the call_indirect instruction which calls a function in the Table object. When the function in the Table object is called, the function can remove itself from the Table object during the call. Before this issue was fixed, Chrome relied on the reference to the function in the Table object to prevent it from being freed during garbage collection. So removing the function from the Table object during a call has the potential to cause the call to use freed memory when it unwinds.
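The lifetime problem can be modelled with explicit reference counting, along with a variant that keeps its own reference to the callee for the whole call. JavaScript itself is garbage collected, so this sketch only simulates the engine's native objects. `NativeFunction`, `Table`, the `freed` flag and both `callIndirect` helpers are hypothetical stand-ins, not Chrome's classes.

```javascript
// Hypothetical engine object with a manual reference count.
class NativeFunction {
  constructor(body) {
    this.body = body;
    this.refs = 0;
    this.freed = false;
  }
  retain() { this.refs++; }
  release() {
    this.refs--;
    if (this.refs === 0) this.freed = true; // memory would be reclaimed here
  }
}

// Hypothetical table whose slots each hold one reference.
class Table {
  constructor(size) { this.slots = new Array(size).fill(null); }
  set(index, fn) {
    if (fn) fn.retain(); // retain first so re-setting the same function is safe
    const old = this.slots[index];
    this.slots[index] = fn;
    if (old) old.release();
  }
}

// Only the table's reference keeps the callee alive during the call.
function callIndirectUnretained(table, index) {
  const fn = table.slots[index];
  fn.body(table, index);
  // Unwinding touches the callee again, after it may have been freed.
  return fn.freed ? "use-after-free" : "ok";
}

// The caller holds its own reference for the duration of the call.
function callIndirectRetained(table, index) {
  const fn = table.slots[index];
  fn.retain();
  try {
    fn.body(table, index);
    return fn.freed ? "use-after-free" : "ok";
  } finally {
    fn.release();
  }
}

// A callee that removes itself from the table while it is running.
const selfRemoving = (table, index) => table.set(index, null);
```

In the retained variant, the function is still freed, but only once the call has fully unwound.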
This bug was originally fixed by preventing a Table object from being changed in JavaScript when a WebAssembly call was in progress. Unfortunately, this fix did not completely resolve the issue. Since it is possible to create a WebAssembly Instance in any function, it was still possible to change the Table object by creating an Instance that imports the Table object and has an underlying module with an Elements Section. When the new Instance is created, the Elements Section is used to initialize the Table, allowing the table to be changed without calling the JavaScript function to change a Table object. The issue was ultimately resolved by holding an extra reference to all needed objects for the duration of the call.
Execution
WebAssembly is executed by calling an exported function. Depending on the engine, the intermediate bytecode generated when the Module was parsed is either interpreted or used to generate native code via JIT. It’s not uncommon for WebAssembly engines to have bugs where the wrong code is generated for certain sequences of instructions; many such issues have been reported in the bug trackers for the different engines. We didn’t see any such bugs that had a clear security impact though.
The Future
Overall, the majority of the bugs we found in WebAssembly were related to the parsing of WebAssembly binaries, and this has been mirrored in vulnerabilities reported by other parties. Also, compared to other recent browser features, surprisingly few vulnerabilities have been reported in it. This is likely due to the simplicity of the current design, especially with regard to memory management.
There are two emerging features of WebAssembly that are likely to have a security impact. One is threading. Currently, WebAssembly only supports concurrency via JavaScript workers, but this is likely to change. Since JavaScript is designed assuming that this is the only concurrency model, WebAssembly threading has the potential to require a lot of code to be thread safe that did not previously need to be, and this could lead to security problems.
WebAssembly GC is another potential feature of WebAssembly that could lead to security problems. Currently, some uses of WebAssembly have performance problems due to the lack of higher-level memory management in WebAssembly. For example, it is difficult to implement a performant Java Virtual Machine in WebAssembly. If WebAssembly GC is implemented, it will increase the number of applications that WebAssembly can be used for, but it will also make it more likely that vulnerabilities related to memory management will occur in both WebAssembly engines and applications written in WebAssembly.
Categories: Security

Windows Exploitation Tricks: Exploiting Arbitrary Object Directory Creation for Local Elevation of Privilege

Google Project Zero - Tue, 08/14/2018 - 13:00
Posted by James Forshaw, Project Zero
And we’re back again for another blog in my series on Windows Exploitation tricks. This time I’ll detail how I was able to exploit Issue 1550, which results in an arbitrary object directory being created, by using a useful behavior of the privileged CSRSS process. Once again, by detailing how I’d exploit a particular vulnerability, I hope to give readers a better understanding of the complexity of the Windows operating system, as well as to give Microsoft information on non-memory corruption exploitation techniques so that they can mitigate them in some way.
Quick Overview of the Vulnerability
Object Manager directories are unrelated to normal file directories. The directories are created and manipulated using a separate set of system calls such as NtCreateDirectoryObject rather than NtCreateFile. Even though they’re not file directories they’re vulnerable to many of the same classes of issues as you’d find on a file system including privileged creation and symbolic link planting attacks.
Issue 1550 is a vulnerability that allows the creation of a directory inside a user-controllable location while running as SYSTEM. The root of the bug is in the creation of Desktop Bridge applications. The AppInfo service, which is responsible for creating the new application, calls the undocumented API CreateAppContainerToken to do some internal housekeeping. Unfortunately this API creates object directories under the user’s AppContainerNamedObjects object directory to support redirecting BaseNamedObjects and RPC endpoints by the OS.
As the API is called without impersonating the user (it’s normally called in CreateProcess where it typically isn’t as big an issue) the object directories are created with the identity of the service, which is SYSTEM. As the user can write arbitrary objects to their AppContainerNamedObjects directory they could drop an object manager symbolic link and redirect the directory creation to almost anywhere in the object manager namespace. As a bonus, the directory is created with an explicit security descriptor which allows the user full access; this will become very important for exploitation.
One difficulty in exploiting this vulnerability is that if the object directory isn’t created under AppContainerNamedObjects because we’ve redirected its location then the underlying NtCreateLowBoxToken system call which performs the token creation and captures a handle to the directory as part of its operation will fail. The directory will be created but almost immediately deleted again. This behavior is actually due to an earlier issue I reported which changes the system call’s behavior. This is still exploitable by opening a handle to the created directory before it’s deleted, and in practice it seems winning this race is reliable as long as your system has multiple processors (which is basically any modern system). With an open handle the directory is kept alive as long as needed for exploitation.
This is the point where the original PoC I sent to MSRC stopped; all the PoC did was create an arbitrary object directory. You can find this PoC attached to the initial bug report in the issue tracker. Now let’s get into how we might exploit this vulnerability to go from a normal user account to a privileged SYSTEM account.
Exploitation
The main problem for exploitation is finding a location in which we can create an object directory which can then be leveraged to elevate our privileges. This turns out to be harder than you might think. While almost all Windows applications use object directories under the hood, such as BaseNamedObjects, the applications typically interact with existing directories which the vulnerability can’t be used to modify.
An object directory that would be interesting to abuse is KnownDlls (which I mentioned briefly in the previous blog in this series). This object directory contains a list of named image section objects, of the form NAME.DLL. When an application calls LoadLibrary on a DLL inside the SYSTEM32 directory the loader first checks if an existing image section is present inside the KnownDlls object directory, if the section exists then that will be loaded instead of creating a new section object.
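The lookup order is what makes KnownDlls such an attractive target, and it can be shown with a toy model. This is not the Windows loader. `loadSystemDll`, the `Map` standing in for the KnownDlls object directory, the section-creation callback, and the `C:\Windows\System32` path are all assumptions made for illustration.

```javascript
// Toy model of the loader's KnownDlls check. knownDlls maps a DLL name to
// an existing section object; createSectionFromFile stands in for creating
// a new section from the file on disk.
function loadSystemDll(name, knownDlls, createSectionFromFile) {
  if (knownDlls.has(name)) {
    // The existing section is mapped; the file on disk is never consulted.
    return knownDlls.get(name);
  }
  return createSectionFromFile("C:\\Windows\\System32\\" + name);
}
```

Whoever controls an entry in `knownDlls` controls what gets mapped, regardless of what is on disk.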

KnownDlls is restricted to only being writable by administrators (not strictly true as we’ll see) because if you could drop an arbitrary section object inside this directory you could force a system service to load the named DLL, for example using the Diagnostics Hub service I described in my last blog post, and it would map the section, not the file on disk. However the vulnerability can’t be used to modify the KnownDlls object directory other than adding a new child directory which doesn’t help in exploitation. Maybe we can target KnownDlls indirectly by abusing other functionality which our vulnerability can be used with?
Whenever I do research into particular areas of a product I will always note down interesting or unexpected behavior. I discovered one example of interesting behavior when I was researching Windows symbolic links. The Win32 APIs support a function called DefineDosDevice, whose purpose is to allow a user to define a new DOS drive letter. The API takes three parameters: a set of flags, the drive prefix (e.g. X:) to create, and the target device to map that drive to. The API’s primary use is in things like the CMD SUBST command.
On modern versions of Windows this API creates an object manager symbolic link inside the user’s own DOS device object directory, a location which can be written to by a normal low privileged user account. However if you look at the implementation of DefineDosDevice you’ll find that it’s not implemented in the caller’s process. Instead the implementation calls an RPC method inside the current session’s CSRSS service, specifically the method BaseSrvDefineDosDevice inside BASESRV.DLL. The main reason for calling into a privileged service is it allows a user to create a permanent symbolic link which doesn’t get deleted when all handles to the symbolic link object are closed. Normally to create a permanent named kernel object you need the SeCreatePermanentPrivilege privilege, however a normal user does not have that privilege. On the other hand CSRSS does, so by calling into that service we can create the permanent symbolic link.
The ability to create a permanent symbolic link is certainly interesting, but if we were limited to only creating drive letters in the user’s DOS devices directory it wouldn’t be especially useful. I also noticed that the implementation never verified that the lpDeviceName parameter is a drive letter. For example, you could specify a name of “GLOBALROOT\RPC Control\ABC” and it would actually create a symbolic link outside of the user’s DosDevices directory, specifically in this case the path “\RPC Control\ABC”. This is because the implementation prepends the DosDevice prefix “\??” to the device name and passes it to NtCreateSymbolicLinkObject. The kernel follows the full path and finds GLOBALROOT, a special symbolic link that leads back to the root of the namespace. It then follows the rest of the path to create the arbitrary object. It was unclear if this was intentional behavior so I looked in more depth at the implementation in CSRSS, which is shown in abbreviated form below.
NTSTATUS BaseSrvDefineDosDevice(DWORD dwFlags,
                               LPCWSTR lpDeviceName,
                               LPCWSTR lpTargetPath) {
   WCHAR device_name[MAX_PATH];
   snwprintf_s(device_name, L"\\??\\%s", lpDeviceName);
   UNICODE_STRING device_name_ustr;
   OBJECT_ATTRIBUTES objattr;
   RtlInitUnicodeString(&device_name_ustr, device_name);
   InitializeObjectAttributes(&objattr, &device_name_ustr,
                              OBJ_CASE_INSENSITIVE, NULL, NULL);

   BOOLEAN enable_impersonation = TRUE;
   // The existing link is opened as the caller.
   CsrImpersonateClient(NULL);
   HANDLE handle;
   NTSTATUS status = NtOpenSymbolicLinkObject(&handle, DELETE, &objattr); // ①
   CsrRevertToSelf();

   if (NT_SUCCESS(status)) {
       BOOLEAN is_global = FALSE;

       // Check if we opened a global symbolic link.
       IsGlobalSymbolicLink(handle, &is_global); // ②
       if (is_global) {
           enable_impersonation = FALSE; // ③
           snwprintf_s(device_name, L"\\GLOBAL??\\%s", lpDeviceName);
           // objattr points at device_name_ustr, so it now names the rewritten path.
           RtlInitUnicodeString(&device_name_ustr, device_name);
       }

       // Delete the existing symbolic link.
       NtMakeTemporaryObject(handle);
       NtClose(handle);
   }

   if (enable_impersonation) { // ④
       CsrImpersonateClient(NULL);
   }

   // Create the symbolic link.
   UNICODE_STRING target_name_ustr;
   RtlInitUnicodeString(&target_name_ustr, lpTargetPath);

   status = NtCreateSymbolicLinkObject(&handle, MAXIMUM_ALLOWED,
                                       &objattr, &target_name_ustr); // ⑤

   if (enable_impersonation) { // ⑥
       CsrRevertToSelf();
   }
   if (NT_SUCCESS(status)) {
       status = NtMakePermanentObject(handle); // ⑦
       NtClose(handle);
   }
   return status;
}
We can see the first thing the code does is build the device name path then try and open the symbolic link object for DELETE access ①. This is because the API supports redefining an existing symbolic link, so it must first try to delete the old link. If we follow the default path where the link doesn’t exist we’ll see the code impersonates the caller (the low privileged user in this case) ④ then creates the symbolic link object ⑤, reverts the impersonation ⑥ and makes the object permanent ⑦ before returning the status of the operation. Nothing too surprising, we can understand why we can create arbitrary symbolic links because all the code does is prefix the passed device name with “\??”. As the code impersonates the caller when doing any significant operation we can only create the link in a location that the user could already write to.
What’s more interesting is the middle conditional, where the target symbolic link is opened for DELETE access, which is needed to call NtMakeTemporaryObject. The opened handle is passed to another function ②, IsGlobalSymbolicLink, and based on the result of that function a flag disabling impersonation is set and the device name is recreated again with the global DOS device location \GLOBAL?? as the prefix ③. What is IsGlobalSymbolicLink doing? Again we can just RE the function and check.
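void IsGlobalSymbolicLink(HANDLE handle, BOOLEAN* is_global) {
   BYTE buffer[0x1000];
   // The buffer receives an OBJECT_NAME_INFORMATION, whose first member
   // is the object's name as a UNICODE_STRING.
   NtQueryObject(handle, ObjectNameInformation, buffer, sizeof(buffer), NULL);
   UNICODE_STRING prefix;
   RtlInitUnicodeString(&prefix, L"\\GLOBAL??\\");
   // Check if object name starts with \GLOBAL??\
   // (case-insensitive comparison assumed in this abbreviated listing).
   *is_global = RtlPrefixUnicodeString(&prefix, (PUNICODE_STRING)buffer, TRUE);
}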
The code checks if the opened object’s name starts with \GLOBAL??\. If so it sets the is_global flag to TRUE. This results in the flag enabling impersonation being cleared and the device name being rewritten. What this means is that if the caller has DELETE access to a symbolic link inside the global DOS device directory then the symbolic link will be recreated without any impersonation, which means it will be created as the SYSTEM user. This in itself doesn’t sound especially interesting as by default only an administrator could open one of the global symbolic links for DELETE access. However, what if we could create a child directory underneath the global DOS device directory which could be written to by a low privileged user? Any symbolic link in that directory could be opened for DELETE access as the low privileged user could specify any access they liked, the code would flag the link as being global, when in fact that’s not really the case, disable impersonation and recreate it as SYSTEM. And guess what, we have a vulnerability which would allow us to create an arbitrary object directory under the global DOS device directory.
Again this might not be very exploitable if it wasn’t for the rewriting of the path. We can abuse the fact that the path “\??\ABC” isn’t the same as “\GLOBAL??\ABC” to construct a mechanism to create an arbitrary symbolic link anywhere in the object manager namespace as SYSTEM. How does this help us? If you write a symbolic link to KnownDlls then it will be followed by the kernel when opening a section requested by the DLL loader. Therefore even though we can’t directly create a new section object inside KnownDlls, we can create a symbolic link which points outside that directory to a place where the low-privileged user can create the section object. We can now abuse the hijack to load an arbitrary DLL into memory inside a privileged process, and privilege elevation is achieved.
Pulling this all together we can exploit our vulnerability using the following steps:
  1. Use the vulnerability to create the directory “\GLOBAL??\KnownDlls”
  2. Create a symbolic link inside the new directory with the name of the DLL to hijack, such as TAPI32.DLL. The target of this link doesn’t matter.
  3. Inside the user’s DOS device directory create a new symbolic link called “GLOBALROOT” pointing to “\GLOBAL??”. This will override the real GLOBALROOT symbolic link object when a caller accesses it via the user’s DOS device directory.
  4. Call DefineDosDevice specifying a device name of “GLOBALROOT\KnownDlls\TAPI32.DLL” and a target path of a location that the user can create section objects inside. This will result in the following operations:
    1. CSRSS opens the symbolic link “\??\GLOBALROOT\KnownDlls\TAPI32.DLL” which results in opening “\GLOBAL??\KnownDlls\TAPI32.DLL”. As this is controlled by the user the open succeeds, and the link is considered global which disables impersonation.
    2. CSRSS rewrites the path to “\GLOBAL??\GLOBALROOT\KnownDlls\TAPI32.DLL” then calls NtCreateSymbolicLinkObject without impersonation. This results in following the real GLOBALROOT link, which results in creating the symbolic link “\KnownDlls\TAPI32.DLL” with an arbitrary target path.
  5. Create the image section object at the target location for an arbitrary DLL, then force it to be loaded into a privileged service such as the Diagnostics Hub by getting the service to call LoadLibrary with a path to TAPI32.DLL.
  6. Privilege escalation is achieved.
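The path juggling in step 4 is easy to lose track of, so here is a toy JavaScript model of it. The object manager namespace is reduced to a map of symbolic links. `resolve`, `resolveParent` and `defineDosDevice` are hypothetical helpers that only mimic the string handling in the abbreviated BaseSrvDefineDosDevice listing: link prefix substitution, then the global-prefix check and rewrite. They ignore access checks. The model also assumes that opening or creating a link follows links in every path component except the last, and that the link CSRSS opens already exists.

```javascript
// Replace the longest symbolic-link prefix of path with the link's target,
// repeating until no link matches.
function resolve(path, links) {
  let current = path;
  for (let hops = 0; hops < 32; hops++) {
    let best = null;
    for (const link of links.keys()) {
      const matches = current === link || current.startsWith(link + "\\");
      if (matches && (best === null || link.length > best.length)) best = link;
    }
    if (best === null) return current;
    current = links.get(best) + current.slice(best.length);
  }
  throw new Error("too many symbolic link hops");
}

// Opening or creating a link object follows links in the parent path
// but not in the final component (the link itself).
function resolveParent(path, links) {
  const cut = path.lastIndexOf("\\");
  return resolve(path.slice(0, cut), links) + path.slice(cut);
}

// String handling of BaseSrvDefineDosDevice (circled numbers as in the listing).
function defineDosDevice(lpDeviceName, links) {
  const opened = resolveParent("\\??\\" + lpDeviceName, links);        // ① as the caller
  const isGlobal = opened.startsWith("\\GLOBAL??\\");                  // ②
  const name = (isGlobal ? "\\GLOBAL??\\" : "\\??\\") + lpDeviceName;  // ③
  const created = resolveParent(name, links);                          // ⑤
  return { opened, impersonated: !isGlobal, created };
}

// Namespace state after steps 1 to 3.
const links = new Map([
  ["\\GLOBAL??\\GLOBALROOT", ""],                      // real GLOBALROOT: the root
  ["\\??\\GLOBALROOT", "\\GLOBAL??"],                  // user's fake GLOBALROOT (step 3)
  ["\\GLOBAL??\\KnownDlls\\TAPI32.DLL", "\\Anything"], // user's link (step 2)
]);

const result = defineDosDevice("GLOBALROOT\\KnownDlls\\TAPI32.DLL", links);
```

The model reproduces the key asymmetry. The name CSRSS opens resolves through the user’s fake GLOBALROOT to a link inside the user-created \GLOBAL??\KnownDlls directory. The rewritten name it creates resolves through the real GLOBALROOT to \KnownDlls\TAPI32.DLL, and it is created without impersonation.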

Abusing the DefineDosDevice API actually has a second use: it’s an Administrator to Protected Process Light (PPL) bypass. PPL processes still use KnownDlls, so if you can add a new entry you can inject code into the protected process. To prevent that attack vector Windows marks the KnownDlls directory with a Process Trust Label which blocks all but the highest-level PPL processes from writing to it, as shown below.

How does our exploit work then? CSRSS actually runs as the highest level PPL so is allowed to write to the KnownDlls directory. Once the impersonation is dropped the identity of the process is used which will allow full access.
If you want to test this exploit I’ve attached the new PoC to the issue tracker here.
Wrapping Up
You might wonder at this point whether I reported the behavior of DefineDosDevice to MSRC. I didn’t, mainly because it’s not in itself a vulnerability. Even in the case of Administrator to PPL, MSRC do not consider that a serviceable security boundary (example). Of course the Windows developers might choose to try and change this behavior in the future, assuming it doesn’t cause a major regression in compatibility. This function has been around since the early days of Windows, and the current behavior has existed since at least Windows XP, so there’s probably something which relies on it. By describing this exploit in detail, I want to give MS as much information as necessary to address the exploitation technique in the future.
I did report the vulnerability to MSRC and it was fixed in the June 2018 patches. How did Microsoft fix the vulnerability? The developers added a new API, CreateAppContainerTokenForUser which impersonates the token during creation of the new AppContainer token. By impersonating during token creation the code ensures that all objects are created only with the privileges of the user. As it’s a new API existing code would have to be changed to use it, therefore there’s a chance you could still find code which uses the old CreateAppContainerToken in a vulnerable pattern.
Exploiting vulnerabilities on any platform sometimes requires pretty in-depth knowledge about how different components interact. In this case while the initial vulnerability was clearly a security issue, it’s not clear how you could proceed to full exploitation. It’s always worth keeping a log of interesting behavior which you encounter during reverse engineering as even if something is not a security bug itself, it might be useful to exploit another vulnerability.
Categories: Security

Adventures in vulnerability reporting

Google Project Zero - Thu, 08/02/2018 - 14:56
Posted by Natalie Silvanovich, Project Zero

At Project Zero, we spend a lot of time reporting security bugs to vendors. Most of the time, this is a fairly straightforward process, but we occasionally encounter challenges getting information about vulnerabilities into the hands of vendors. Since it is important to user security that software vendors fix reported vulnerabilities in a timely manner, and vendors need to actually receive the report for this to happen, we have decided to share some of our experiences. We hope to show that good practices by software vendors can avoid delays in vulnerability reporting.
Effective Vulnerability Reporting Processes
There are several aspects of a bug reporting process that make reporting vulnerabilities easier from the bug reporter’s perspective. To start off, it’s important for a bug reporting process to be easy to find and use. We sometimes have difficulty figuring out how to report a vulnerability in a piece of software if the vulnerability reporting process is not documented on the project or vendor’s website, or if outdated material is not removed and instructions for reporting vulnerabilities are inconsistent. This can lead to delays in reporting. Effective vulnerability reporting processes are clearly documented, and the documentation is easy to find.
We also appreciate when the process for reporting a vulnerability is short and straightforward. Occasionally, we report dozens of vulnerabilities in a vendor’s products, and it is helpful when reporting does not require a lot of clicks and reading. Reporting processes that use email or bug trackers are usually the easiest, though webforms can be easy if they are not excessively long. While Project Zero will always report a vulnerability, even if reporting it is very time consuming, this is not necessarily the case for other bug reporters. Long bug reporting processes can cause bug reporters to report bugs more slowly, spend less time working on a piece of software or even give up on reporting a bug. The easier a bug reporting process is, the more likely it is that someone will go through with it.
It’s also important for bug reporting processes to be well-tested. While the majority we encounter are, we’ve occasionally had bug reporting email addresses bounce, webforms reject necessary information (like the reporter’s name) and security issues go unnoticed in bug trackers for months despite our following the documented process. Vendors with good processes usually test that their process and any systems it involves work correctly on a regular basis.
Mandatory legal agreements in the reporting process are another problem that we encounter every so often. If a legal agreement contains language about disclosure or any other subject we don’t feel comfortable entering an agreement about on behalf of our company, deciding whether to enter the agreement can require a lengthy discussion, delaying the bug report. While legal agreements are sometimes necessary for rewards programs and code contributions, good vulnerability reporting processes allow bug reporters to report bugs without them.
It is also helpful when vendors confirm that vulnerability reports have been received in a timely manner. Since bug reports can get lost for a number of reasons, including bugs in the reporting interface and human error, it is a good idea to let reporters know that their report has been received, even if it won’t be processed right away. This lets the reporter know that they’ve reported the bug correctly, and don’t need to spend any more time reporting it, and makes it more likely that bug reporters will reach out if a bug report gets lost, as they will be expecting a confirmation.
Finally, even if good practices are followed in creating the bug reporting process, it is still possible that a bug reporting process has problems, so it is very helpful if vendors provide a way to give feedback on the process. It’s very rare for vendors to intentionally make bug reporting difficult, but unexpected problems happen fairly frequently, so it is good to provide a way bug reporters can reach out for help as a last resort if reporting a bug fails for any reason.
ExamplesOne example of a bug we had difficulty reporting due to a vendor not following the practices described above is CVE-2018-10751.  CVE-2018-10751 is a remote memory corruption vulnerability in OMACP affecting the Samsung S7 Edge. The issue can be triggered by sending a single SMS to the target device, and does not require any user interaction. The payload can be sent from an app on an Android device without root access or any special equipment. It is similar to CVE-2016-7990, which is described in detail here.
Samsung’s Vulnerability Reporting Process
CVE-2018-10751 is a serious vulnerability, and I wanted to report it immediately. I started off by reading Samsung Mobile’s Security Reporting page. This page has a button to create a bug report.
[Screenshot, accessed February 22, 2018]
Pressing the button led to a sign-up page. I didn’t have a Samsung account, so I tried to sign up. Unfortunately, that led to this page:
[Screenshot, accessed February 22, 2018]
Not speaking Korean, I wasn’t sure what to do here. I eventually went back to the previous page and tried the ‘Sign-in’ button.
This brought me to an English sign-up page, which then brought me to the account creation page. According to this page, I had to read and agree to some terms. Clicking the links led to over twenty separate agreements, most of which had nothing to do with vulnerability reporting.
[Screenshot, accessed February 22, 2018]
That’s a lot of text to read and review. Let’s just say I skimmed a bit. Once I clicked ‘Agree’, I was taken to a page where I could enter account information. The page required my birthdate and zip code, which I wasn’t thrilled to have to provide to report a vulnerability, but I wanted to get the issue reported, so I entered them. Finally, my account was created! I logged in, hoping to start reporting the bug, only to be greeted with more conditions.
[Screenshot, accessed February 22, 2018]
These were in Korean, and I couldn’t figure out how to change the language. Eventually, I just selected ‘Confirm’. Finally, I got to the form where I could report bugs!
[Screenshot, accessed February 22, 2018]
I filled out the vulnerability information and scrolled down, only to find one more set of terms to agree to:
[Screenshot, accessed February 22, 2018]
These terms included:
- You MUST hold off disclosing the vulnerability in reasonable time, and you MUST get Samsung’s consent or inform Samsung about the date before disclosing the vulnerability.
- In some cases, Samsung may request not to disclose the vulnerability at all.
I was not able to submit this form without agreeing to give Samsung some level of control over disclosure of the reported vulnerability. I looked around Samsung’s security page to see if they provided an email address I could report the issue to, but they did not. I was not comfortable reporting this bug through the mechanisms Samsung provides for vulnerability reporting on their website.
Problems with Vulnerability Reporting Processes
I encountered several problems while trying to report the above vulnerability, most of which Samsung has since resolved.
To start off, Samsung’s bug reporting process did not seem adequately tested. The repeated appearance of Korean text while I was attempting to report this vulnerability suggests that it was not tested in English. As described above, it is important for vendors to test vulnerability reporting processes, including for internationalization issues. The workflow is also excessively long, and requires the reporter to agree to a very large number of agreements, many of which have nothing to do with vulnerability reports. I suspect that the people testing this interface might have already had accounts, and did not see how long the process is for someone who just wants to report a bug.
This isn’t an uncommon problem. The Android security reporting template requires creating a Gmail account, which can require clicking through many screens and, in some circumstances, verification via SMS. As a result of our feedback, the Android Security team has improved its documentation to make clear that vulnerability reports can be filed via email, although using the web form is still required to participate in the Android Security rewards program.
Another problem was that in order to report a bug, a reporter had to agree to the terms of the rewards program. This is an issue that Project Zero has been seeing increasingly often. When software vendors start rewards programs, they often remove existing mechanisms for reporting vulnerabilities, leaving bug reporters with no way to report vulnerabilities without entering into agreements.
This also occurred when Tavis Ormandy attempted to report the vulnerability he reluctantly dubbed CloudBleed. Cloudflare’s vulnerability reporting process is tied to its rewards program with HackerOne, and their Vulnerability Disclosure Policy offers no clear way to report a vulnerability without creating a HackerOne account. The policy even states “We agree with their disclosure philosophy, and if you do too, please submit your vulnerability reports here” without providing an alternative for vulnerability reporters who don’t agree or don’t want to participate in the program for whatever reason. In Project Zero’s case, our disclosure deadline is 90 days, whereas HackerOne’s deadline is 180 days. This vulnerability was also very urgent, as it was actively leaking user data onto the Internet, and we didn’t want to delay reporting the issue while we read through HackerOne’s terms to determine whether they were compatible with our disclosure policy.
We find that vendors generally don’t intend to prevent bug reports from anyone who won’t agree to their disclosure rules, but this was the end result of Samsung and Cloudflare replacing their bug reporting process with a rewards program.
The specific terms of Samsung’s agreement were also fairly vague. In particular, it wasn’t clear what the consequences of breaking the terms would be. For example:
- You MUST hold off disclosing the vulnerability in reasonable time, and you MUST get Samsung’s consent or inform Samsung about the date before disclosing the vulnerability.
Does this mean that if someone discloses a vulnerability without permission, they are not eligible for a reward? Does it mean that if someone discloses the vulnerability without permission, Samsung can take legal action against them? While requiring bug reporters not to disclose vulnerabilities in order to receive rewards is a policy of debatable benefit, I would have been much more comfortable agreeing to these terms if they had spelled out that violating them would simply mean I would not receive a reward, rather than having other legal consequences.
Overall, poorly tested bug reporting interfaces and mandatory legal agreements for reporting vulnerabilities have come up multiple times, and have delayed Project Zero’s vulnerability reports. We recommend that vendors test their vulnerability reporting interfaces from outside their corporate network, from the perspective of someone who has never reported a bug before, and that they do localized testing. It is also important to accept bug reports without requiring the reporter to enter into excessive legal agreements.
Accepting vulnerability reports only through web forms can reduce the number of invalid reports, which are a major challenge for teams that handle vulnerability reports. However, web forms can also be unreliable: unless they are very well tested, they can prevent vulnerability reporting in situations their designers did not expect. Providing an alternate email address that reporters can use if they encounter problems is a good way to prevent this type of problem.
Reporting the Bug
I eventually contacted some members of the Knox security team at Samsung whom I had worked with on previous bugs, and they recommended reporting the issue to a Samsung security email address. This address is not documented on the Samsung website, except in a single blog post from 2015.
The difficulty I encountered reporting this serious vulnerability delayed my report by one week. It might have caused a longer delay if I had not had contacts at Samsung who could help.
Samsung started rolling out updates for CVE-2018-10751 (Samsung’s identifier SVE-2018-11463) in their April maintenance release.
Samsung has updated their account creation page so that it always displays English text if the language is set to English. Also, the vulnerability report form can now be submitted without agreeing to the terms of Samsung’s rewards program, though the user still has to agree to two other agreements. They have also updated their bug reporting page to provide an email address as well as a web form. We appreciate the changes they have made to make reporting vulnerabilities in Samsung products easier for everyone.
Conclusion
Project Zero has occasionally had difficulty reporting vulnerabilities, which delayed the reports. Usually, these difficulties are due to problems in the reporting process that the vendor did not intend or expect. A difficult vulnerability reporting process can hurt user security by delaying vulnerability reports, losing them, or even leading bug reporters to choose not to report a vulnerability. We appreciate when vendors do the following to make their bug reporting processes easier for bug reporters:
  • Vendors should regularly test their vulnerability reporting interfaces in all supported languages
  • Vendors should streamline their vulnerability reporting processes as much as possible, and remove excessive clicks and legal agreements
  • Vendors should regularly solicit feedback on their vulnerability reporting mechanisms from vulnerability reporters and people they think are likely to report vulnerabilities
Categories: Security

Drawing Outside the Box: Precision Issues in Graphic Libraries

Google Project Zero - Thu, 07/26/2018 - 12:47
By Mark Brand and Ivan Fratric, Google Project Zero
In this blog post, we are going to write about a seldom-seen vulnerability class that typically affects graphics libraries (though it can also occur in other types of software). The root cause of such issues is using limited-precision arithmetic in cases where a precision error would invalidate security assumptions made by the application.
Other classes of bugs, namely integer overflows, could also be called precision issues, but there is a major difference: with integer overflows, we are dealing with arithmetic operations where the magnitude of the result is too large to be accurately represented in the given precision. With the issues described in this blog post, we are dealing with arithmetic operations where the magnitude of the result, or of a part of the result, is too small to be accurately represented in the given precision.
These issues can occur when using floating-point arithmetic in operations where the result is security-sensitive, but, as we’ll demonstrate later, can also occur in integer arithmetic in some cases.
Let’s look at a trivial example:
 float a = 100000000;
 float b = 1;
 float c = a + b;
If we were making the computation with arbitrary precision, the result would be 100000001. However, since float typically only allows for 24 bits of precision, the result is actually going to be 100000000. If an application makes the normally reasonable assumption that a > 0 and b > 0 implies that a + b > a, then this could lead to issues.
In the example above, the difference between a and b is so significant that b completely vanishes in the result of the calculation, but precision errors also happen when the difference is smaller, for example:
 float a = 1000;
 float b = 1.1111111;
 float c = a + b;
The result of the above computation is going to be 1001.111084 and not 1001.1111111, which would be the accurate result. Here, only part of b is lost, but even such results can sometimes have interesting consequences.
While we used the float type in the above examples, and in these particular examples using double would result in more accurate computation, similar precision errors can happen with double as well.
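Both effects can be reproduced with a short C sketch. It assumes IEEE-754 float and double, and uses volatile stores so that each result is rounded to its declared type even on targets that would otherwise keep extra precision in registers. The helper names add_f, add_d and sum_exceeds_first_operand are hypothetical, chosen only for this illustration.

```c
#include <stdbool.h>

/* Hypothetical helper: one float addition, rounded to single precision. */
float add_f(float a, float b) {
    volatile float c = a + b;
    return c;
}

/* Hypothetical helper: the same addition in double precision. */
double add_d(double a, double b) {
    volatile double c = a + b;
    return c;
}

/* The "normally reasonable" assumption from the text:
   for a > 0 and b > 0, a + b should be greater than a. */
bool sum_exceeds_first_operand(float a, float b) {
    return add_f(a, b) > a;
}
```

With these helpers, add_f(100000000.0f, 1.0f) returns 100000000.0f, so sum_exceeds_first_operand(100000000.0f, 1.0f) is false even though both operands are positive, and add_f(1000.0f, 1.1111111f) returns 1001.111083984375f (printed as 1001.111084 with %f). Switching to double fixes both examples, but the problem just moves to larger magnitudes: add_d(9007199254740992.0, 1.0), that is 2**53 + 1, still returns 2**53.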
In the remainder of this blog post, we are going to show several examples of precision issues with security impact. These issues were independently explored by two Project Zero members: Mark Brand, who looked at SwiftShader, a software OpenGL implementation used in Chrome, and Ivan Fratric, who looked at the Skia graphics library, used in Chrome and Firefox.
SwiftShader
SwiftShader is “a high-performance CPU-based implementation of the OpenGL ES and Direct3D 9 graphics APIs”. It’s used in Chrome on all platforms as a fallback rendering option to work around limitations in graphics hardware or drivers, allowing universal use of WebGL and other advanced JavaScript rendering APIs on a far wider range of devices.
The code in SwiftShader needs to handle emulating a wide range of operations that would normally be performed by the GPU. One operation that we commonly think of as essentially “free” on a GPU is upscaling, or drawing from a small source texture to a larger area, for example on the screen. This requires computing memory indexes using non-integer values, which is where the vulnerability occurs.
As noted in the original bug report, the code that we’ll look at here is not quite the code which is actually run in practice - SwiftShader uses an LLVM-based JIT engine to optimize performance-critical code at runtime, but that code is more difficult to understand than their fallback implementation, and both contain the same bug, so we’ll discuss the fallback code. This code is the copy-loop used to copy pixels from one surface to another during rendering:
 source->lockInternal((int)sRect.x0, (int)sRect.y0, sRect.slice, sw::LOCK_READONLY, sw::PUBLIC);
 dest->lockInternal(dRect.x0, dRect.y0, dRect.slice, sw::LOCK_WRITEONLY, sw::PUBLIC);

 float w = sRect.width() / dRect.width();
 float h = sRect.height() / dRect.height();

 const float xStart = sRect.x0 + 0.5f * w;
 float y = sRect.y0 + 0.5f * h;
 float x = xStart;

 for(int j = dRect.y0; j < dRect.y1; j++)
 {
   x = xStart;

   for(int i = dRect.x0; i < dRect.x1; i++)
   {
     // FIXME: Support RGBA mask
     dest->copyInternal(source, i, j, x, y, options.filter);

     x += w;
   }

   y += h;
 }

So - what highlights this code as problematic? We know prior to entering this function that all the bounds-checking has already been performed, and that any call to copyInternal with (i, j) in dRect and (x, y) in sRect will be safe.
The examples in the introduction above show cases where the resulting precision error means that a rounding-down occurs - in this case that wouldn’t be enough to produce an interesting security bug. Can we cause floating-point imprecision to result in a larger-than-correct value, leading to (x, y) values that are larger than expected?
If we look at the code, the intention of the developers is to compute the following:
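 const float yStart = sRect.y0 + 0.5f * h;  // mirrors xStart
 for(int j = dRect.y0; j < dRect.y1; j++)
 {
   for(int i = dRect.x0; i < dRect.x1; i++)
   {
     // offsets relative to the first destination pixel, like the loop above
     x = xStart + ((i - dRect.x0) * w);
     y = yStart + ((j - dRect.y0) * h);
     dest->copyInternal(source, i, j, x, y, options.filter);
   }
 }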
If this approach had been used instead, we’d still have precision errors - but without the iterative calculation, there’d be no propagation of the error, and we could expect the eventual magnitude of the precision error to be stable, and in direct proportion to the size of the operands. With the iterative calculation as performed in the code, the errors start to propagate/snowball into a larger and larger error.
There are ways to estimate the maximum error in floating point calculations; and if you really, really need to avoid having extra bounds checks, using this kind of approach and making sure that you have conservative safety margins around those maximum errors might be a complicated and error-prone way to solve this issue. It’s not a great approach to identifying the pathological values that we want here to demonstrate a vulnerability; so instead we’ll take a brute-force approach.
Instinctively, we’re fairly sure that the multiplicative implementation will be roughly correct, and that the implementation with iterative addition will be much less correct. Given that the space of possible inputs is small (Chrome disallows textures with width or height greater than 8192), we can just run a brute force over all ratios of source width to destination width, comparing the two algorithms, and seeing where the results are most different. (Note that SwiftShader also limits us to even numbers). This leads us to the values of 5828, 8132; and if we compare the computations in this case (left side is the iterative addition, right side is the multiplication):
0:    1.075012 1.075012
1:    1.791687 1.791687
1000: 717.749878 717.749878   Up to here (at the precision shown) the values are still identical
1001: 718.466553 718.466553
2046: 1467.391724 1467.391724 At this point, the first significant errors start to occur, but note
2047: 1468.108398 1468.108521 that the "incorrect" result is smaller than the more precise one.
2856: 2047.898315 2047.898438
2857: 2048.614990 2048.614990 Here our two computations coincide again, briefly, and from here onwards
2858: 2049.331787 2049.331787 the precision errors consistently favour a larger result than the more
2859: 2050.048584 2050.048340 precise calculation.
8129: 5827.567871 5826.924805
8130: 5828.284668 5827.641602
8131: 5829.001465 5828.358398 The last index is now sufficiently different that int conversion results in an oob index.

(Note that there will also be some error in the “safe” calculation; it’s just that without error propagation, that error remains directly proportional to the size of the input error, which we expect to be “small.”)
We can indeed see that the multiplicative algorithm would remain within bounds, but that the iterative algorithm can return an index that is outside the bounds of the input texture!
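The comparison for the pair found by the brute force can be reproduced with a small C sketch of the loop above. It is a simplification, not SwiftShader code: it models a single row with sRect.x0 and dRect.x0 both 0, and the helpers last_x_iterative, last_x_multiplicative and column_in_bounds are hypothetical. Each helper returns the source x coordinate used for the last destination pixel.

```c
/* Hypothetical single-row model of the SwiftShader copy loop, assuming
   sRect.x0 == 0 and dRect.x0 == 0. Both helpers return the source x
   coordinate passed to copyInternal for the last destination pixel. */

/* Iterative version: x starts at xStart and w is added once per destination
   pixel, so the rounding error of every addition carries over. */
float last_x_iterative(int src_width, int dst_width) {
    float w = (float)src_width / (float)dst_width;
    float x = 0.5f * w;                  /* xStart */
    for (int i = 1; i < dst_width; i++) {
        x += w;
    }
    return x;
}

/* Multiplicative version: the coordinate is computed directly from the
   pixel index, so errors do not accumulate across iterations. */
float last_x_multiplicative(int src_width, int dst_width) {
    float w = (float)src_width / (float)dst_width;
    float xStart = 0.5f * w;
    return xStart + (float)(dst_width - 1) * w;
}

/* The source column (int)x is valid only if 0 <= column < src_width. */
int column_in_bounds(float x, int src_width) {
    int column = (int)x;
    return column >= 0 && column < src_width;
}
```

For a source width of 5828 and a destination width of 8132, the multiplicative version yields roughly 5827.64, which truncates to the valid column 5827, while the iterative version yields roughly 5828.28, which truncates to column 5828, one past the last valid column. These match row 8130 of the listing above, whose row 0 already includes one addition of w. A full search would wrap these helpers in loops over even widths up to 8192, which is far more work than a quick check, so the sketch only examines the reported pair.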
As a result, we read an entire row of pixels past the end of our texture allocation, and this can easily be leaked back to JavaScript using WebGL. Stay tuned for an upcoming blog post in which we’ll use this vulnerability together with another unrelated issue in SwiftShader to take control of the GPU process from JavaScript.
Skia
Skia is a graphics library used, among other places, in Chrome, Firefox and Android. In web browsers it is used, for example, when drawing to a canvas HTML element using CanvasRenderingContext2D or when drawing SVG images. Skia is also used when drawing various other HTML elements, but the canvas element and SVG images are more interesting from the security perspective because they give more direct control over the objects being drawn by the graphics library.
The most complex type of object (and therefore, most interesting from the security perspective) that Skia can draw is a path. A path is an object that consists of elements such as lines, but also more complex curves, in particular quadratic or cubic splines.
Due to the way software drawing algorithms work in Skia, the precision issues are very much possible and quite impactful when they happen, typically leading to out-of-bounds writes.
To understand why these issues can happen, let’s assume you have an image in memory (represented as a buffer with size = width x height x color size). Normally, when drawing a pixel with coordinates (x, y) and color c, you would want to make sure that the pixel actually falls within the space of the image, specifically that 0 <= x < width and 0 <= y < height. Failing to check this could result in attempting to write the pixel outside the bounds of the allocated buffer. In computer graphics, making sure that only the objects in the image region are being drawn is called clipping.
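A per-pixel clip check of the kind described here can be sketched in C as follows. The Image struct and put_pixel_clipped are hypothetical illustrations, not Skia's API; they assume 32-bit pixels stored row by row.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical image: width * height 32-bit pixels stored row by row,
   so pixel (x, y) lives at pixels[y * width + x]. */
typedef struct {
    int width;
    int height;
    uint32_t *pixels;
} Image;

/* Per-pixel clipping: write only when 0 <= x < width and 0 <= y < height.
   Returns 1 if the pixel was written, 0 if it was clipped. */
int put_pixel_clipped(Image *img, int x, int y, uint32_t color) {
    if (x < 0 || x >= img->width || y < 0 || y >= img->height) {
        return 0;
    }
    img->pixels[(size_t)y * (size_t)img->width + (size_t)x] = color;
    return 1;
}
```

Without the check, a call with x = -1 and y = 0 would write to the element just before the start of the buffer, which is exactly the kind of out-of-bounds write the bugs below produce.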
So, where is the problem? Making a clip check for every pixel is expensive in terms of CPU cycles, and Skia prides itself on speed. So, instead of making a clip check for every pixel, Skia first makes the clip check on an entire object (e.g. a line, a path or any other type of object being drawn). Depending on the clip check, there are three possible outcomes:
  a) The object is completely outside of the drawing area: The drawing function doesn’t draw anything and returns immediately.

  b) The object is partially inside the drawing area: The drawing function proceeds with per-pixel clip enabled (usually by relying on SkRectClipBlitter).

  c) The entire object is in the drawing area: The drawing function draws directly into the buffer without performing per-pixel clip checks.

The problematic scenario is c), where the clip check is performed only per object and the more precise, per-pixel checks are disabled. This means that if a precision issue occurs somewhere between the per-object clip check and the drawing of pixels, and it causes the pixel coordinates to go outside of the drawing area, this could result in a security vulnerability.
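The per-object decision can be sketched as follows. IRect, ClipResult and classify are hypothetical names with exclusive right and bottom edges, and Skia's actual code is organized differently. The CLIP_INSIDE result is only safe if every pixel coordinate computed later stays within the bounds that were checked, which is exactly the assumption that precision errors break.

```c
/* Hypothetical integer rectangle; right and bottom are exclusive. */
typedef struct { int left, top, right, bottom; } IRect;

typedef enum {
    CLIP_OUTSIDE,  /* a) draw nothing and return                   */
    CLIP_PARTIAL,  /* b) draw with per-pixel clipping               */
    CLIP_INSIDE    /* c) draw directly, no per-pixel clip checks    */
} ClipResult;

/* Classifies an object's bounding box against the drawing area. */
ClipResult classify(IRect bounds, IRect area) {
    if (bounds.right <= area.left || bounds.left >= area.right ||
        bounds.bottom <= area.top || bounds.top >= area.bottom) {
        return CLIP_OUTSIDE;
    }
    if (bounds.left >= area.left && bounds.top >= area.top &&
        bounds.right <= area.right && bounds.bottom <= area.bottom) {
        return CLIP_INSIDE;
    }
    return CLIP_PARTIAL;
}
```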
We can see per-object clip checks leading to dropping per-pixel checks in several places, for example:
  • In hair_path (the function for drawing a path without filling), clip is initially set to null (which disables clip checks). The clip is only set if the bounds of the path, rounded up and extended by 1 or 2 depending on the drawing options, don’t fit in the drawing area. Extending the path bounds by 1 seems like a pretty large safety margin, but it is actually the smallest safe value, because drawing objects with antialiasing on will sometimes result in drawing to nearby pixels.

  • In SkScan::FillPath (function for filling a path with antialiasing turned off), the bounds of the path are first extended by kConservativeRoundBias and rounded to obtain the “conservative” path bounds. A SkScanClipper object is then created for the current path. As we can see in the definition of SkScanClipper, it will only use SkRectClipBlitter if the x coordinates of the path bounds are outside the drawing area or if irPreClipped is true (which only happens when path coordinates are very large).

Similar patterns can be seen in other drawing functions.
Before we take a closer look at the issues, it is useful to quickly go over various number formats used by Skia:
  • SkScalar is a 32-bit floating point number

  • SkFDot6 is defined as an integer, but it is actually a fixed-point number with 26 bits to the left and 6 bits to the right of the decimal point. For example, SkFDot6 value of 0x00000001 represents the number 1/64.

  • SkFixed is also a fixed-point number, this time with 16 bits to the left and 16 bits to the right of the decimal point. For example, SkFixed value of 0x00000001 represents 1/(2**16)
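The three formats and the conversions used later in this post can be modelled as follows. This is a sketch with hypothetical names (Scalar, FDot6, Fixed16 and the helpers); Skia's own typedefs and macros may differ in detail.

```c
#include <stdint.h>

/* Hypothetical typedefs mirroring the formats described above. */
typedef float   Scalar;   /* like SkScalar: 32-bit float          */
typedef int32_t FDot6;    /* like SkFDot6: 26.6 fixed point       */
typedef int32_t Fixed16;  /* like SkFixed: 16.16 fixed point      */

/* One FDot6 unit is 1/64; one Fixed16 unit is 1/2**16. */
double fdot6_to_double(FDot6 v)   { return v / 64.0; }
double fixed_to_double(Fixed16 v) { return v / 65536.0; }

/* 26.6 -> 16.16 adds 10 fractional bits. Multiplying by 1024 avoids
   left-shifting negative values, which is undefined behavior in C. */
Fixed16 fdot6_to_fixed(FDot6 v) { return v * 1024; }
```

For example, the SkFDot6 value -30 is -0.46875, and converting it to SkFixed gives -30720; both values appear again in the cubic example later in this post.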

Precision error with integer to float conversion
We discovered the initial problem when doing DOM fuzzing against Firefox last year. An issue where Skia wrote out of bounds caught our eye, so we investigated further. It turned out the root cause was a discrepancy in the way Skia converted floating-point values to integers in several places. When making the per-path clip check, the lower coordinates (left and top of the bounding box) were rounded using this function:
 static inline int round_down_to_int(SkScalar x) {
     double xx = x;
     xx -= 0.5;
     return (int)ceil(xx);
 }
Looking at the code, you can see that it returns a number greater than or equal to zero (which is necessary for passing the path-level clip check) for numbers that are strictly larger than -0.5. However, in another part of the code, specifically SkEdge::setLine if SK_RASTERIZE_EVEN_ROUNDING is defined (which is the case in Firefox), floats are rounded to integers differently, using the following function:
 inline SkFDot6 SkScalarRoundToFDot6(SkScalar x, int shift = 0)
 {
     union {
         double fDouble;
         int32_t fBits[2];
     } tmp;
     int fractionalBits = 6 + shift;
     double magic = (1LL << (52 - (fractionalBits))) * 1.5;

     tmp.fDouble = SkScalarToDouble(x) + magic;
 #ifdef SK_CPU_BENDIAN
     return tmp.fBits[1];
 #else
     return tmp.fBits[0];
 #endif
 }
Now let’s take a look at what these two functions return for the number -0.499. For this number, round_down_to_int returns 0 (which always passes the clipping check) and SkScalarRoundToFDot6 returns -32, which corresponds to -0.5, so we actually end up with a number that is smaller than the one we started with.
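The discrepancy can be checked with the C sketch below. round_down_to_int follows the function shown above, except that ceil() is written out so the sketch needs no math library. round_to_fdot6 is a hypothetical, portable re-implementation of the magic-number trick for shift = 0, using memcpy instead of the union; it assumes IEEE-754 doubles and the default round-to-nearest mode.

```c
#include <stdint.h>
#include <string.h>

/* Same result as (int)ceil(x - 0.5) for values within int range. */
int round_down_to_int(float x) {
    double xx = x;
    xx -= 0.5;
    int t = (int)xx;              /* truncates toward zero            */
    if ((double)t < xx) t += 1;   /* positives with a fraction round up */
    return t;
}

/* Adding 1.5 * 2**46 leaves exactly 6 fractional bits in the double, so the
   low 32 bits of its bit pattern hold x * 64 rounded to nearest. Masking the
   low bits of the integer bit pattern works on either endianness. */
int32_t round_to_fdot6(float x) {
    const double magic = (double)(1LL << (52 - 6)) * 1.5;
    double d = (double)x + magic;
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    int64_t low = (int64_t)(bits & 0xFFFFFFFFu);
    if (low >= 0x80000000LL) low -= 0x100000000LL;  /* reinterpret as signed */
    return (int32_t)low;
}
```

round_down_to_int(-0.499f) returns 0, which passes the clip check, while round_to_fdot6(-0.499f) returns -32, that is -0.5 in SkFDot6.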
That’s not the only problem, though, because there’s another place where a precision error occurs in SkEdge::setLine.
Precision error when multiplying fractions
SkEdge::setLine calls SkFixedMul which is defined as:
 static inline SkFixed SkFixedMul(SkFixed a, SkFixed b) {
     return (SkFixed)((int64_t)a * b >> 16);
 }
This function is for multiplying two SkFixed numbers. An issue comes up when using this function to multiply negative numbers. Let’s look at a small example. Let’s assume a = -1/(2**16) and b = 1/(2**16). If we multiply these two numbers on paper, the result is -1/(2**32). However, due to the way SkFixedMul works, specifically because the right shift is used to convert the result back to SkFixed format, the result we actually end up with is 0xFFFFFFFF, which is the SkFixed representation of -1/(2**16). Thus, we end up with a result with a magnitude much larger than expected.
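In raw SkFixed units, the example multiplies -1 by 1: the 64-bit product is -1, and shifting it right by 16 leaves -1 instead of a value closer to zero. The sketch below contrasts that with truncating division. fixed_mul_trunc is a hypothetical comparison, not Skia code, and the sketch assumes that >> on negative values is an arithmetic shift, as on mainstream compilers (C leaves it implementation-defined).

```c
#include <stdint.h>

typedef int32_t Fixed16;   /* 16.16 fixed point, like SkFixed */

/* Same arithmetic as the SkFixedMul shown above: the arithmetic right
   shift rounds the product toward negative infinity. */
Fixed16 fixed_mul_floor(Fixed16 a, Fixed16 b) {
    return (Fixed16)(((int64_t)a * b) >> 16);
}

/* Hypothetical comparison: division truncates toward zero, so a tiny
   negative product becomes 0 instead of -1/2**16. */
Fixed16 fixed_mul_trunc(Fixed16 a, Fixed16 b) {
    return (Fixed16)(((int64_t)a * b) / 65536);
}
```

The two versions agree whenever the product is exact or positive; they differ only for negative products with a fractional part, where the shift rounds away from zero.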
As the result of this multiplication is used by SkEdge::setLine to adjust the x coordinate of the initial line point here, we can use the issue in SkFixedMul to introduce an additional error of up to 1/64 of a pixel, pushing the coordinate further outside of the drawing area bounds.
By combining the previous two issues, it was possible to get the x coordinate of a line sufficiently small (smaller than -0.5), so that, when a fractional representation was rounded to an integer here, Skia attempted to draw at coordinates with x = -1, which is clearly outside the image bounds. This then led to an out-of-bounds write as can be seen in the original bug report. This bug could be exploited in Firefox by drawing an SVG image with coordinates as described in the previous section.
Floating point precision error when converting splines to line segments
When drawing paths, Skia is going to convert all non-linear curves (conic shapes, quadratic and cubic splines) to line segments. Perhaps unsurprisingly, these conversions suffer from precision errors.
The conversion of splines into line segments happens in several places, but the most susceptible to floating-point precision errors are hair_quad (used for drawing quadratic curves) and hair_cubic (used for drawing cubic curves). Both of these functions are called from hair_path, which we already mentioned above. Because larger precision errors (unsurprisingly) occur when dealing with cubic splines, we’ll only consider the cubic case here.
When approximating the spline, first the cubic coefficients are computed in SkCubicCoeff. The most interesting part is:
 fA = P3 + three * (P1 - P2) - P0;
 fB = three * (P2 - times_2(P1) + P0);
 fC = three * (P1 - P0);
 fD = P0;
Where P0, P1, P2 and P3 are the input points and fA, fB, fC and fD are the output coefficients. The line segment points are then computed in hair_cubic using the following code:
 const Sk2s dt(SK_Scalar1 / lines);
 Sk2s t(0);

 Sk2s A = coeff.fA;
 Sk2s B = coeff.fB;
 Sk2s C = coeff.fC;
 Sk2s D = coeff.fD;
 for (int i = 1; i < lines; ++i) {
     t = t + dt;
     Sk2s p = ((A * t + B) * t + C) * t + D;
     // p is stored as output point i
 }
Where p is the output point and lines is the number of line segments we are using to approximate the curve. Depending on the length of the spline, a cubic spline can be approximated with up to 512 lines.
It is obvious that the arithmetic here is not going to be precise. As identical computations happen for x and y coordinates, let’s just consider the x coordinate in the rest of the post.
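The x-coordinate part of this conversion can be sketched in C with plain floats instead of Sk2s. This is an illustration rather than Skia's code: CubicCoeff, cubic_coeff and cubic_to_lines are hypothetical names, and the sketch assumes the first and last output points are copied from p0 and p3, which is consistent with indices 0 and 512 in the 1000-pixel listing.

```c
/* Hypothetical mirror of SkCubicCoeff's fA..fD for one coordinate. */
typedef struct { float a, b, c, d; } CubicCoeff;

/* Power-basis coefficients of the cubic Bezier curve p0..p3. */
CubicCoeff cubic_coeff(float p0, float p1, float p2, float p3) {
    const float three = 3.0f;
    CubicCoeff k;
    k.a = p3 + three * (p1 - p2) - p0;
    k.b = three * (p2 - 2.0f * p1 + p0);
    k.c = three * (p1 - p0);
    k.d = p0;
    return k;
}

/* Writes lines + 1 coordinates to out: the endpoints are copied, the
   interior points are evaluated like the hair_cubic loop above. */
void cubic_to_lines(float p0, float p1, float p2, float p3,
                    int lines, float *out) {
    CubicCoeff k = cubic_coeff(p0, p1, p2, p3);
    const float dt = 1.0f / (float)lines;
    float t = 0.0f;
    out[0] = p0;
    for (int i = 1; i < lines; ++i) {
        t = t + dt;                                      /* t accumulates  */
        out[i] = ((k.a * t + k.b) * t + k.c) * t + k.d;  /* Horner's rule  */
    }
    out[lines] = p3;
}
```

Calling cubic_to_lines with the four points of the 1000-pixel example and lines = 512 can be used to look for the same kind of overshoot near the end of the curve. The exact digits may differ from the listing, though: the printed inputs are rounded, and some compilers fuse the multiply-adds in Horner's rule into FMA instructions, which round differently.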
Let’s assume the width of the drawing area is 1000 pixels. Because hair_path is used for drawing paths with antialiasing turned on, it needs to make sure that all points of the path are between 1 and 999, which is done in the initial, path-level clip check. Let’s consider the following coordinates that all pass this check:
 p0 = 1.501923
 p1 = 998.468811
 p2 = 998.998779
 p3 = 999.000000
For these points, the coefficients are as follows:
 a = 995.908203
 b = -2989.310547
 c = 2990.900879
 d = 1.501923
If you do the same computation in larger precision, you’re going to notice that the numbers here aren’t quite correct. Now let’s see what happens if we approximate the spline with 512 line segments. This results in 513 x coordinates:
 0: 1.501923
 1: 7.332130
 2: 13.139574
 3: 18.924301
 4: 24.686356
 5: 30.425781
 ...
 500: 998.986389
 501: 998.989563
 502: 998.992126
 503: 998.994141
 504: 998.995972
 505: 998.997314
 506: 998.998291
 507: 998.999084
 508: 998.999695
 509: 998.999878
 510: 999.000000
 511: 999.000244
 512: 999.000000
We can see that the x coordinate keeps growing and at point 511 clearly goes outside of the “safe” area and grows larger than 999.
As it happens, this isn’t sufficient to trigger an out-of-bounds write, because, due to how drawing antialiased lines works in Skia, we need to go at least 1/64 of a pixel outside of the clip area for it to become a security issue. However, an interesting thing about the precision errors in this case is that the larger the drawing area, the larger the error that can happen.
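One way to see why larger surfaces allow larger errors is that the spacing between adjacent floats (one unit in the last place, or ulp) grows with magnitude. The hypothetical helper ulp_of below, which assumes IEEE-754 single precision and a positive, finite input, measures that spacing. Near 999 it is 1/16384 of a pixel, so the 1/64-pixel threshold is 256 ulps away; near 32766 it is 1/512, and the threshold is only 8 ulps away.

```c
#include <stdint.h>
#include <string.h>

/* Distance from a positive, finite float x to the next representable
   float, computed by incrementing its bit pattern. */
float ulp_of(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits += 1;
    float next;
    memcpy(&next, &bits, sizeof next);
    return next - x;
}
```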
So let’s instead consider a drawing area of 32767 pixels (maximum canvas size in Chrome). The initial clipping check then checks that all path points are in the interval [1, 32766]. Now let’s consider the following points:
 p0 = 1.7490234375
 p1 = 32765.9902343750
 p2 = 32766.000000
 p3 = 32766.000000
The corresponding coefficients:
 a = 32764.222656
 b = -98292.687500
 c = 98292.726562
 d = 1.749023
And the corresponding line approximation:
 0: 1.74902343
 1: 193.352295
 2: 384.207123
 3: 574.314941
 4: 763.677246
 5: 952.295532
 …
 505: 32765.925781
 506: 32765.957031
 507: 32765.976562
 508: 32765.992188
 509: 32766.003906
 510: 32766.003906
 511: 32766.015625
 512: 32766.000000
You can see that we went out-of-bounds significantly more at index 511.
Fortunately for Skia and unfortunately for aspiring attackers, this bug can’t be used to trigger memory corruption, at least not in the up-to-date version of Skia. The reason is SkDrawTiler. Whenever Skia draws using SkBitmapDevice (as opposed to using a GPU device) and the drawing area is larger than 8191 pixels in any dimension, instead of drawing the whole image at once, Skia is going to split it into tiles of size (at most) 8191x8191 pixels. This change was made in March, not for security reasons, but to be able to support larger drawing surfaces. However, it still effectively prevented us from exploiting this issue and will also prevent exploiting other cases where a surface larger than 8191 pixels is required to reach a precision error of sufficient magnitude.
Still, this bug was exploitable before March and we think it nicely demonstrates the concept of precision errors.
Integer precision error when converting splines to line segments
There is another place where splines are approximated as line segments when drawing (in this case: filling) paths that was also affected by a precision error, in this case an exploitable one. Interestingly, here the precision error wasn’t in floating-point but rather in fixed-point arithmetic.
The error happens in SkQuadraticEdge::setQuadraticWithoutUpdate and SkCubicEdge::setCubicWithoutUpdate. For simplicity, we are again going to concentrate just on the cubic spline version and, again, only on the x coordinate.
In SkCubicEdge::setCubicWithoutUpdate, the curve coordinates are first converted to SkFDot6 type (integer with 6 bits used for fraction). After that, parameters corresponding to the first, second and third derivative of the curve at the initial point are going to be computed:
 SkFixed B = SkFDot6UpShift(3 * (x1 - x0), upShift);
 SkFixed C = SkFDot6UpShift(3 * (x0 - x1 - x1 + x2), upShift);
 SkFixed D = SkFDot6UpShift(x3 + 3 * (x1 - x2) - x0, upShift);

 fCx     = SkFDot6ToFixed(x0);
 fCDx    = B + (C >> shift) + (D >> 2*shift);    // biased by shift
 fCDDx   = 2*C + (3*D >> (shift - 1));           // biased by 2*shift
 fCDDDx  = 3*D >> (shift - 1);                   // biased by 2*shift
Where x0, x1, x2 and x3 are the x coordinates of the 4 points that define the cubic spline, and shift and upShift depend on the length of the curve (this corresponds to the number of linear segments the curve is going to be approximated with). For simplicity, we can assume shift = upShift = 6 (maximum possible values).
Now let’s see what happens for some very simple input values:
 x0 = -30
 x1 = -31
 x2 = -31
 x3 = -31
Note that x0, x1, x2 and x3 are of the type SkFDot6 so value -30 corresponds to -0.46875 and -31 to -0.484375. These are close to -0.5 but not quite and are thus perfectly safe when rounded. Now let’s examine the values of the computed parameters:
B = -192
C = 192
D = -64
fCx = -30720
fCDx = -190
fCDDx = 378
fCDDDx = -6
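The values above can be reproduced with a short C++ sketch of the x-coordinate formulas from SkCubicEdge::setCubicWithoutUpdate. This is not Skia's code. CubicXParams, computeCubicX and the helper functions are hypothetical names, and the y coordinate is ignored. The sketch also assumes that `>>` on a negative int is an arithmetic (sign-propagating) shift, which C++20 guarantees and mainstream compilers already implement.

```cpp
#include <cassert>
#include <cstdint>

// Raw storage for the fixed-point types described in the text:
// SkFDot6 has 6 fraction bits, SkFixed has 16 fraction bits.
using FDot6 = int32_t;
using Fixed = int32_t;

// Hypothetical container for the x-coordinate parameters (not a Skia type).
struct CubicXParams {
    Fixed B, C, D;              // scaled derivative terms at the initial point
    Fixed cx, cdx, cddx, cdddx; // mirror fCx, fCDx, fCDDx, fCDDDx
};

// Left shifts written as multiplications so negative inputs are well defined
// in every C++ standard.
inline Fixed upShiftFDot6(FDot6 v, int up) { return v * (1 << up); }
inline Fixed fdot6ToFixed(FDot6 v) { return v * (1 << 10); }

// Recomputes the quoted formulas for the x coordinate only.
CubicXParams computeCubicX(FDot6 x0, FDot6 x1, FDot6 x2, FDot6 x3,
                           int shift, int up) {
    assert(shift >= 1 && up >= 0);  // (shift - 1) must be a valid shift count
    CubicXParams p{};
    p.B = upShiftFDot6(3 * (x1 - x0), up);
    p.C = upShiftFDot6(3 * (x0 - x1 - x1 + x2), up);
    p.D = upShiftFDot6(x3 + 3 * (x1 - x2) - x0, up);
    p.cx = fdot6ToFixed(x0);
    // >> on a negative int rounds toward negative infinity (arithmetic shift).
    p.cdx = p.B + (p.C >> shift) + (p.D >> 2 * shift);
    p.cddx = 2 * p.C + (3 * p.D >> (shift - 1));
    p.cdddx = 3 * p.D >> (shift - 1);
    return p;
}
```

Calling computeCubicX(-30, -31, -31, -31, 6, 6) gives B = -192, C = 192, D = -64, cx = -30720, cdx = -190, cddx = 378 and cdddx = -6, matching the listing.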
Do you see where the issue is? Hint: it’s in the formula for fCDx.
When computing fCDx (the first derivative of the curve), the value of D is right-shifted by 12. However, D is too small to be shifted that far precisely. An arithmetic right shift rounds toward negative infinity, so since D is negative, the right shift
D >> 2*shift
is going to result in -1, which is larger in magnitude than the intended result. (Since D is of type SkFixed, its actual value is -0.0009765625, and the shift, when interpreted as division by 4096, would result in -2.384185e-07.) Because of this, the whole fCDx ends up as a larger negative value than it should (-190 vs. -189.015625).
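The rounding direction is the whole problem here. The following minimal sketch uses hypothetical constant and function names and works in raw SkFixed units. It compares the shift that is actually performed with the exact division the formula is meant to approximate, and adds truncating integer division for contrast. It assumes that right shifts of negative values are arithmetic.

```cpp
#include <cstdint>

// D from the example as a raw SkFixed value (-64 / 65536 = -0.0009765625).
constexpr int32_t kD = -64;
constexpr int kShift = 6;

// What the code computes: an arithmetic right shift by 2*shift = 12 bits,
// which rounds toward negative infinity and therefore yields -1.
constexpr int32_t shiftedD() { return kD >> (2 * kShift); }

// What the formula intends: exact division by 4096 (still in raw SkFixed units).
constexpr double exactD() { return kD / 4096.0; }

// For contrast: C++ integer division truncates toward zero and would yield 0.
constexpr int32_t truncatedD() { return kD / 4096; }

// fCDx = B + (C >> shift) + (D >> 2*shift) with B = -192 and C = 192.
constexpr int32_t computedCDx() { return -192 + (192 >> kShift) + shiftedD(); }
constexpr double exactCDx() { return -192.0 + 192.0 / 64.0 + exactD(); }
```

Here computedCDx() evaluates to -190 and exactCDx() to -189.015625. The one-unit difference comes entirely from the D term.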
Afterwards, the value of fCDx gets used when calculating the x value of line segments. This happens in SkCubicEdge::updateCubic on this line:
newx    = oldx + (fCDx >> dshift);
The x values, when approximating the spline with 64 line segments (maximum for this algorithm), are going to be (expressed as index, integer SkFixed value and the corresponding floating point value):
index  raw      interpretation
0:     -30720   -0.46875
1:     -30768   -0.469482
2:     -30815   -0.470200
3:     -30860   -0.470886
4:     -30904   -0.471558
5:     -30947   -0.472214
...
31:    -31683   -0.483444
32:    -31700   -0.483704
33:    -31716   -0.483948
34:    -31732   -0.484192
35:    -31747   -0.484421
36:    -31762   -0.484650
37:    -31776   -0.484863
38:    -31790   -0.485077
...
60:    -32005   -0.488358
61:    -32013   -0.488480
62:    -32021   -0.488602
63:    -32029   -0.488724
64:    -32037   -0.488846
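The table can be regenerated by iterating the forward-difference step from the parameters listed earlier (fCx = -30720, fCDx = -190, fCDDx = 378, fCDDDx = -6). The sketch below is a simplified, hypothetical model. Only the `newx` line is quoted in the text. The derivative updates and the shift amounts dshift = 2 and ddshift = 6 are assumptions, chosen because they reproduce the table. The sketch runs 64 plain forward-difference steps and does not model any special handling Skia may apply to the last segment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified model of x-coordinate forward differencing, modeled on
// SkCubicEdge::updateCubic: the quoted `newx = oldx + (fCDx >> dshift)` step
// plus an assumed update of the derivative terms. y is ignored.
std::vector<int32_t> forwardDifferenceX(int32_t x, int32_t cdx, int32_t cddx,
                                        int32_t cdddx, int dshift, int ddshift,
                                        int steps) {
    assert(dshift >= 0 && ddshift >= 0 && steps >= 0);
    std::vector<int32_t> xs;
    xs.reserve(static_cast<std::size_t>(steps) + 1);
    xs.push_back(x);                 // index 0: the starting point fCx
    for (int i = 0; i < steps; ++i) {
        x += cdx >> dshift;          // next line-segment end point
        cdx += cddx >> ddshift;      // advance the first difference
        cddx += cdddx;               // advance the second difference
        xs.push_back(x);
    }
    return xs;
}
```

forwardDifferenceX(-30720, -190, 378, -6, 2, 6, 64) returns 65 values. The entries at indices 1, 5, 31 and 35 are -30768, -30947, -31683 and -31747, the same raw values as in the table.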
You can see that for the point at index 35, the x value (-0.484421) ends up smaller than the smallest input coordinate (-0.484375), and the trend continues for the later points. On its own, this value would still get rounded to 0, but there is another problem.
The x values computed in SkCubicEdge::updateCubic are passed to SkEdge::updateLine, where they are converted from SkFixed type to SkFDot6 on the following lines:
x0 >>= 10;
x1 >>= 10;
Another right shift! When, for example, the SkFixed value -31747 gets shifted, we end up with an SkFDot6 value of -32, which represents -0.5.
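The effect of this second shift can be checked with a couple of constexpr helpers. The names are hypothetical, and an arithmetic right shift is assumed. fixedToFDot6 mirrors the quoted `>>= 10` conversion, and fdot6ToDouble interprets the result as a real number.

```cpp
#include <cstdint>

// SkFixed (16 fraction bits) to SkFDot6 (6 fraction bits): drop 10 bits with
// an arithmetic right shift, which rounds toward negative infinity.
constexpr int32_t fixedToFDot6(int32_t fixed) { return fixed >> 10; }

// Interpret a raw SkFDot6 value as a real number.
constexpr double fdot6ToDouble(int32_t v) { return v / 64.0; }
```

fixedToFDot6(-31732) is -31, but fixedToFDot6(-31747) is -32, which fdot6ToDouble turns into -0.5. Index 35 is therefore the first point in the table whose converted value falls below the smallest control point.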
At this point we can use the same trick described above in the “Precision error when multiplying fractions” section to go smaller than -0.5 and break out of the image bounds. In other words, we can make Skia draw to x = -1 when drawing a path.
But, what can we do with it?
In general, given that Skia allocates image pixels as a single allocation that is organized row by row (as most other software would allocate bitmaps), there are several cases of what can happen with precision issues. If we assume a width x height image and that we are only able to go one pixel out of bounds:
  1. Drawing to y = -1 or y = height immediately leads to a heap out-of-bounds write
  2. Drawing to x = -1 with y = 0 immediately leads to a heap underflow of 1 pixel
  3. Drawing to x = width with y = height - 1 immediately leads to a heap overflow of 1 pixel
  4. Drawing to x = -1 with y > 0 leads to a pixel “spilling” to the previous image row
  5. Drawing to x = width with y < height - 1 leads to a pixel “spilling” to the next image row

What we have here is scenario 4. Unfortunately, we can’t draw to x = -1 with y = 0, because the precision error needs to accumulate over growing values of y.
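The cases in the list follow from simple row-major address arithmetic. The minimal sketch below uses hypothetical names and assumes that a pixel index is computed as y * width + x with no check on x. Under that assumption, a write at (x = -1, y = row) lands on the same pixel as (x = width - 1, y = row - 1), and only the first three cases leave the allocation altogether.

```cpp
#include <cstdint>

// Linear pixel index touched by a write to (x, y) in a width-wide bitmap that
// is stored as one row-by-row allocation, assuming x is not bounds-checked.
constexpr int64_t linearIndex(int64_t x, int64_t y, int64_t width) {
    return y * width + x;
}

// True if the write falls outside the whole allocation (heap under/overflow);
// false means it stays inside, possibly "spilling" into a neighboring row.
constexpr bool outsideAllocation(int64_t x, int64_t y, int64_t width,
                                 int64_t height) {
    return linearIndex(x, y, width) < 0 ||
           linearIndex(x, y, width) >= width * height;
}
```

For a 100 x 100 image, (-1, 0) and (100, 99) are outside the allocation (cases 2 and 3). By contrast, (-1, 5) stays inside and aliases (99, 4), which is case 4.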
Let’s take a look at the following example SVG image:
<svg width="100" height="100" xmlns="">
<style>body { margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px}</style>
<path d="M -0.46875 -0.484375 C -0.484375 -0.484375, -0.484375 -0.484375, -0.484375 100 L 1 100 L 1 -0.484375" fill="red" shape-rendering="crispEdges" />
</svg>
If we render this in an unpatched version of Firefox, we get the result shown in the following image. Notice how the SVG only contains coordinates on the left side of the screen, but some of the red pixels get drawn on the right. This is because, due to the way images are allocated, drawing to x = -1 and y = row is equivalent to drawing to x = width - 1 and y = row - 1.

Opening an SVG image that triggers a Skia precision issue in Firefox. If you look closely you’ll notice some red pixels on the right side of the image. How did those get there? :)
Note that we used Mozilla Firefox and not Google Chrome because, due to SVG drawing internals (specifically: in Firefox, Skia seems to draw the entire image at once, while Chrome uses additional tiling), it is easier to demonstrate the issue in Firefox. However, both Chrome and Firefox were equally affected by this issue.
But, other than drawing a funny image, is there a real security impact to this issue? Here, SkARGB32_Shader_Blitter comes to the rescue (SkARGB32_Shader_Blitter is used whenever shader effects are applied to a color in Skia). What is specific about SkARGB32_Shader_Blitter is that it allocates a temporary buffer of the same size as a single image row. When SkARGB32_Shader_Blitter::blitH is used to draw an entire image row, if we can make it draw from x = -1 to x = width - 1 (alternatively, from x = 0 to x = width), it will need to write width + 1 pixels into a buffer that can only hold width pixels, leading to a buffer overflow, as can be seen in the ASan log in the bug report.
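The size arithmetic behind this overflow is easy to express in code. The class below is a hypothetical stand-in, not Skia's SkARGB32_Shader_Blitter. It only models a scratch buffer sized for one row and shows that the inclusive span from x = -1 to x = width - 1 contains width + 1 pixels. Unlike the vulnerable code, its fill method checks the span length first and refuses an oversized span.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a blitter-owned scratch buffer holding one image row.
class RowScratch {
public:
    explicit RowScratch(std::size_t width) : buffer_(width) {}

    // Number of pixels in the inclusive span [left, right].
    static int64_t spanLength(int64_t left, int64_t right) {
        assert(left <= right);
        return right - left + 1;
    }

    // True if the span fits in a buffer that holds exactly one row.
    bool fits(int64_t left, int64_t right) const {
        return spanLength(left, right) <= static_cast<int64_t>(buffer_.size());
    }

    // Checked fill: rejects an oversized span instead of writing past the end.
    bool fill(int64_t left, int64_t right, uint32_t color) {
        if (!fits(left, right)) return false;
        const int64_t n = spanLength(left, right);
        for (int64_t i = 0; i < n; ++i) buffer_[static_cast<std::size_t>(i)] = color;
        return true;
    }

private:
    std::vector<uint32_t> buffer_;
};
```

With RowScratch r(100), the span [-1, 99] has 101 pixels and is rejected, while [0, 99] fits exactly.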
Note how the PoCs for Chrome and Firefox contain SVG images with a linearGradient element - the linear gradient is used specifically to select SkARGB32_Shader_Blitter instead of drawing pixels to the image directly, which would only result in pixels spilling to the previous row.
Another peculiarity of this issue is that it can only be reached when drawing (more specifically, filling) paths with antialiasing turned off. As it is not currently possible to draw paths to an HTML canvas element with antialiasing off (there is an imageSmoothingEnabled property, but it only applies to drawing images, not paths), an SVG image with shape-rendering="crispEdges" must be used to trigger the issue.
All precision issues we reported in Skia were fixed by increasing kConservativeRoundBias. While the current bias value is large enough to cover the maximum precision errors we know about, we should not dismiss the possibility of other places where precision issues can occur.
Conclusion
While precision issues such as those described in this blog post won’t be present in most software products, where they are present they can have quite serious consequences. To prevent them from occurring:
  • Don’t use floating-point arithmetic in cases where the result is security-sensitive. If you absolutely have to, then you need to make sure that the maximum possible precision error cannot be larger than some safety margin. Potentially, interval arithmetic could be used to determine the maximum precision error in some cases. Alternatively, perform security checks on the result rather than on the input.

  • With integer arithmetic, be wary of any operations that can reduce the precision of the result, such as divisions and right shifts.
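The advice to check results rather than inputs can be sketched as follows. The helper names are hypothetical, and the round-to-nearest convention for 26.6 fixed-point values is an assumption made for illustration. The check is applied to the column produced at the end of the fixed-point pipeline. A value such as raw -33 (-0.515625), which rounds to column -1, is therefore rejected even if the control points it was derived from all looked safe.

```cpp
#include <cstdint>

// Hypothetical round-to-nearest for a 26.6 fixed-point (SkFDot6-style) value.
constexpr int32_t roundFDot6(int32_t v) { return (v + 32) >> 6; }

// Validate the column actually produced by the arithmetic, right before it is
// used to address pixels, instead of trusting in-bounds inputs to yield an
// in-bounds result.
constexpr bool columnInBounds(int32_t fdot6X, int32_t width) {
    const int32_t col = roundFDot6(fdot6X);
    return col >= 0 && col < width;
}
```

For a 100-pixel-wide image, raw -32 (-0.5) still rounds to column 0 and passes, while raw -33 and raw 6399 (about 99.98, which rounds to column 100) are rejected.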

When it comes to finding such issues, unfortunately, there doesn’t seem to be a great way to do it. When we started looking at Skia, initially we wanted to try using symbolic execution on the drawing algorithms to find input values that would lead to drawing out-of-bounds, as, on the surface, it seemed this is a problem symbolic execution would be well suited for. However, in practice, there were too many issues: most tools don’t support floating point symbolic variables and, even when running against just the integer parts of the simplest line drawing algorithm, we were unsuccessful in completing the run in a reasonable time (we were using KLEE with STP and Z3 backends).
In the end, what we ended up doing was a combination of the more old-school methods: manual source review, fuzzing (especially with values close to image boundaries) and, in some cases, when we already identified potentially problematic areas of code, even bruteforcing the range of all possible values.
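A brute-force search of the kind mentioned above can be very small when the input space is small. The sketch below is a hypothetical harness written for illustration. For every tuple of SkFDot6 control points in a range, it re-implements the x-coordinate fixed-point arithmetic described in this section. It fixes shift = upShift = 6, runs 64 steps, and uses the assumed forward-difference shifts dshift = 2 and ddshift = 6. It then flags inputs where a generated point, after the >> 10 conversion, lies below the smallest control point. Since a Bézier curve always stays within the convex hull of its control points, any such point can only come from precision loss.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical check: does any generated x (converted to SkFDot6) fall below
// the smallest control point? The shift values and step count are fixed
// assumptions, not Skia's curve-length-dependent values.
bool escapesBelowHull(int32_t x0, int32_t x1, int32_t x2, int32_t x3) {
    const int shift = 6, up = 6, dshift = 2, ddshift = 6, steps = 64;
    const int32_t B = 3 * (x1 - x0) * (1 << up);
    const int32_t C = 3 * (x0 - x1 - x1 + x2) * (1 << up);
    const int32_t D = (x3 + 3 * (x1 - x2) - x0) * (1 << up);
    int32_t x = x0 * (1 << 10);                     // SkFDot6 -> SkFixed
    int32_t cdx = B + (C >> shift) + (D >> 2 * shift);
    int32_t cddx = 2 * C + (3 * D >> (shift - 1));
    const int32_t cdddx = 3 * D >> (shift - 1);
    const int32_t lowest = std::min({x0, x1, x2, x3});
    for (int i = 0; i < steps; ++i) {
        x += cdx >> dshift;
        cdx += cddx >> ddshift;
        cddx += cdddx;
        if ((x >> 10) < lowest) return true;        // SkFixed -> SkFDot6
    }
    return false;
}

// Exhaustively counts control-point tuples in [lo, hi]^4 that get flagged.
int64_t bruteForce(int32_t lo, int32_t hi) {
    assert(lo <= hi);
    int64_t hits = 0;
    for (int32_t a = lo; a <= hi; ++a)
        for (int32_t b = lo; b <= hi; ++b)
            for (int32_t c = lo; c <= hi; ++c)
                for (int32_t d = lo; d <= hi; ++d)
                    if (escapesBelowHull(a, b, c, d)) ++hits;
    return hits;
}
```

escapesBelowHull(-30, -31, -31, -31) flags the example from this section, while four equal control points are not flagged. In real Skia the shift amounts depend on the curve length, so a faithful harness would have to compute them as well.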
Do you know of other instances where precision errors resulted in security issues? Let us know about them in the comments.
Categories: Security

Detecting Kernel Memory Disclosure – Whitepaper

Google Project Zero - Thu, 06/21/2018 - 12:28
Posted by Mateusz Jurczyk, Project Zero
Since early 2017, we have been working on Bochspwn Reloaded – a piece of dynamic binary instrumentation built on top of the Bochs IA-32 software emulator, designed to identify memory disclosure vulnerabilities in operating system kernels. Over the course of the project, we successfully used it to discover and report over 70 previously unknown security issues in Windows, and more than 10 bugs in Linux. We discussed the general design of the tool at REcon Montreal and Black Hat USA in June and July last year, and followed up with the description of the latest implemented features and their results at INFILTRATE in April 2018.
As we learned during this study, the problem of leaking uninitialized kernel memory to user space is not caused merely by simple programming errors. Instead, it is deeply rooted in the nature of the C programming language, and has been around since the very early days of privilege separation in operating systems. In an attempt to systematically outline the background of the bug class and the current state of the art, we wrote a comprehensive paper on this subject. It aims to provide an exhaustive guide to kernel infoleaks, their genesis, related prior work, means of detection and future avenues of research. While a significant portion of the document is dedicated to Bochspwn Reloaded, it also covers other methods of infoleak detection, non-memory data sinks and alternative applications of full-system instrumentation, including the evaluation of some of the ideas based on the developed prototypes and experiments performed as part of this work.
Without further ado, enjoy the read:
Detecting Kernel Memory Disclosure with x86 Emulation and Taint Tracking (PDF, 1.54 MB)