AES Side-Channel Attacks: Cache Timing and Power Analysis Vulnerabilities

Introduction

While AES is mathematically secure, real-world implementations leak information through physical side channels: timing, power consumption, and electromagnetic radiation. Understanding these attacks is essential for secure implementation.

AES Algorithm Review

Structure

AES-128 processes 128-bit blocks using 10 rounds:

def aes_encrypt(plaintext, key):
    """
    AES-128 encryption
    """
    state = plaintext
    round_keys = key_expansion(key)  # Generate 11 round keys

    # Initial round
    state = add_round_key(state, round_keys[0])

    # Main rounds (9 rounds)
    for round_num in range(1, 10):
        state = sub_bytes(state)        # S-box substitution
        state = shift_rows(state)       # Row permutation
        state = mix_columns(state)      # Column mixing
        state = add_round_key(state, round_keys[round_num])

    # Final round (no MixColumns)
    state = sub_bytes(state)
    state = shift_rows(state)
    state = add_round_key(state, round_keys[10])

    return state

S-Box Lookup

The S-box is the only non-linear component:

# Rijndael S-box (256 bytes)
SBOX = [
    0x63, 0x7c, 0x77, 0x7b, 0xf2, 0x6b, 0x6f, 0xc5,
    # ... (256 total values)
]

def sub_bytes(state):
    """Apply S-box to each byte"""
    return bytes([SBOX[b] for b in state])

Vulnerability: Table lookups leak information through cache timing!

Cache-Timing Attacks

CPU Cache Basics

Modern CPUs use multi-level caches:

L1: 32 KB, ~4 cycles
L2: 256 KB, ~12 cycles
L3: 8 MB, ~40 cycles
RAM: GB, ~200 cycles

Cache line: 64 bytes loaded together

Vulnerability in Table-Based AES

Standard implementation uses pre-computed tables:

# T-tables for fast AES (per-round lookup tables)
T0 = [
    0xc66363a5, 0xf87c7c84, 0xee777799, 0xf67b7b8d,
    # ... 256 entries of 32-bit values
]

def aes_round_fast(state, round_key):
    """
    Fast AES using T-tables
    VULNERABLE to cache timing!
    """
    result = [0] * 16

    for i in range(4):
        result[4*i:4*i+4] = (
            T0[state[4*i]] ^
            T1[state[4*i+1]] ^
            T2[state[4*i+2]] ^
            T3[state[4*i+3]] ^
            round_key[4*i:4*i+4]
        )

    return bytes(result)

Attack Mechanism

Observation: Table access time depends on cache state.

def measure_access_time(table, index):
    """Measure time to access table[index]"""
    start = time.perf_counter_ns()
    value = table[index]  # Access
    end = time.perf_counter_ns()
    return end - start

# Cache hit: ~4 ns
# Cache miss: ~200 ns (50x slower!)

Attack: Observe encryption timing for different plaintexts.

Bernstein’s Attack (2005)

First practical remote cache-timing attack on AES:

def cache_timing_attack(encryption_oracle):
    """
    Recover AES key by timing S-box accesses
    Works even over network with ~15ms precision!
    """
    key_candidate_scores = [[0] * 256 for _ in range(16)]

    for trial in range(10000):
        # Choose plaintext to isolate key byte
        plaintext = os.urandom(16)

        # Measure encryption time
        start = time.perf_counter_ns()
        ciphertext = encryption_oracle(plaintext)
        elapsed = time.perf_counter_ns() - start

        # Correlation: timing variations depend on S-box access pattern
        # which depends on plaintext XOR key
        for byte_pos in range(16):
            for key_guess in range(256):
                # Predict S-box index
                sbox_index = plaintext[byte_pos] ^ key_guess

                # Cache line accessed (64-byte aligned)
                cache_line = sbox_index // (64 // 4)  # 16 S-box entries per line

                # Score based on timing
                if elapsed > THRESHOLD:  # Cache miss likely
                    key_candidate_scores[byte_pos][key_guess] += 1

    # Most likely key bytes
    recovered_key = bytes([
        max(range(256), key=lambda k: scores[k])
        for scores in key_candidate_scores
    ])

    return recovered_key

Success rate: ~90% with 2^20 samples Requirements: Same CPU, shared cache

Prime+Probe Attack

More powerful technique:

def prime_cache():
    """Fill cache with attacker's data"""
    dummy_array = bytearray(32 * 1024)  # Fill L1 cache
    for i in range(len(dummy_array)):
        dummy_array[i] = i & 0xFF

def probe_cache():
    """Measure which cache lines were evicted"""
    timing_results = []

    for line in range(CACHE_LINES):
        start = time.perf_counter_ns()
        value = dummy_array[line * CACHE_LINE_SIZE]
        elapsed = time.perf_counter_ns() - start

        timing_results.append(elapsed)

    return timing_results

def prime_probe_attack():
    """
    1. Prime: Fill cache with known data
    2. Wait: Victim encrypts (evicts some cache lines)
    3. Probe: Measure which lines were evicted
    4. Deduce: Which S-box entries were accessed
    """
    prime_cache()

    # Victim encryption happens here
    trigger_encryption()

    # Measure evictions
    timings = probe_cache()

    # Analyze pattern
    accessed_lines = [i for i, t in enumerate(timings)
                      if t > MISS_THRESHOLD]

    # accessed_lines reveals S-box indices → key information
    return analyze_pattern(accessed_lines)

Differential Power Analysis (DPA)

Power Consumption Basics

CMOS circuits consume power proportional to bit transitions:

P ∝ Hamming Weight of data being processed

Hamming Weight: Number of 1-bits in binary representation.

def hamming_weight(byte_value):
    """Count number of 1-bits"""
    return bin(byte_value).count('1')

# Examples:
# 0x00 (00000000) → HW = 0
# 0xFF (11111111) → HW = 8
# 0x55 (01010101) → HW = 4

Attack Setup

Equipment:

Digital oscilloscope (1 GHz+)
Current probe on VDD pin
~1000 power traces per key byte

def capture_power_trace(plaintext, device):
    """
    Capture power consumption during AES encryption
    Returns: Array of power samples (typically 1M samples)
    """
    # Trigger encryption
    device.set_plaintext(plaintext)
    device.start_encryption()

    # Sample power at high frequency (e.g., 1 GHz)
    power_trace = oscilloscope.capture(samples=1_000_000)

    return power_trace

Correlation Power Analysis (CPA)

Most effective DPA variant:

def correlation_power_analysis(power_traces, plaintexts):
    """
    Recover AES key using correlation between power and Hamming weight
    """
    num_traces = len(power_traces)
    num_samples = len(power_traces[0])
    key_candidates = range(256)

    recovered_key = []

    for byte_position in range(16):
        max_correlation = -1
        best_key_guess = 0

        for key_guess in key_candidates:
            # Hypothetical power consumption
            hypothetical = []

            for plaintext in plaintexts:
                # Predict S-box output
                sbox_input = plaintext[byte_position] ^ key_guess
                sbox_output = SBOX[sbox_input]

                # Predict power (Hamming weight model)
                power_prediction = hamming_weight(sbox_output)
                hypothetical.append(power_prediction)

            # Correlation between prediction and actual power
            correlations = []
            for sample_point in range(num_samples):
                actual_power = [trace[sample_point] for trace in power_traces]
                corr = pearson_correlation(hypothetical, actual_power)
                correlations.append(abs(corr))

            max_corr = max(correlations)

            if max_corr > max_correlation:
                max_correlation = max_corr
                best_key_guess = key_guess

        recovered_key.append(best_key_guess)
        print(f"Byte {byte_position}: 0x{best_key_guess:02x} (corr={max_correlation:.4f})")

    return bytes(recovered_key)

def pearson_correlation(X, Y):
    """Pearson correlation coefficient"""
    import numpy as np
    return np.corrcoef(X, Y)[0, 1]

Success rate: >99% with 1000 traces Requirements: Physical access, ~$5000 equipment

Template Attacks

Most powerful (offline profiling):

def template_attack():
    """
    Phase 1: Profiling (attacker owns identical device)
    - For each key byte k, collect power traces
    - Build statistical model (mean, covariance)

    Phase 2: Attack (on target device)
    - Collect single trace
    - Maximum likelihood estimation
    """
    # Profiling phase
    templates = {}
    for key_value in range(256):
        traces = capture_traces_with_key(key_value, num_traces=1000)
        templates[key_value] = {
            'mean': np.mean(traces, axis=0),
            'cov': np.cov(traces.T)
        }

    # Attack phase
    target_trace = capture_single_trace(target_device)

    # Find most likely key
    likelihoods = []
    for key_value in range(256):
        likelihood = multivariate_gaussian_pdf(
            target_trace,
            templates[key_value]['mean'],
            templates[key_value]['cov']
        )
        likelihoods.append(likelihood)

    recovered_key = max(range(256), key=lambda k: likelihoods[k])
    return recovered_key

Success rate: Single trace often sufficient Requirements: Profiling device, advanced statistics

Constant-Time Implementations

Bitslicing

Process 64 encryptions in parallel using bitwise operations:

def aes_bitslice_sbox(input_bits):
    """
    Constant-time S-box using boolean circuit
    No table lookups → No cache timing leaks
    """
    # S-box as 8-input, 8-output boolean circuit
    # ~200 AND/OR/XOR gates total

    # Example for 1 output bit (simplified):
    U = [input_bits[i] for i in range(8)]

    # Multiplicative inverse in GF(2^8)
    # Implemented as boolean circuit
    inv = galois_inverse(U)  # ~150 gates

    # Affine transformation
    output = affine_transform(inv)  # ~50 gates

    return output  # All operations constant-time

Performance: ~50% slower than table-based Security: Immune to cache timing

AES-NI Instructions

Modern CPUs provide constant-time AES hardware:

// Intel AES-NI intrinsics
#include <wmmintrin.h>

__m128i aes_encrypt_ni(__m128i plaintext, __m128i* round_keys) {
    __m128i state = plaintext;

    state = _mm_xor_si128(state, round_keys[0]);

    for (int i = 1; i < 10; i++) {
        state = _mm_aesenc_si128(state, round_keys[i]);
    }

    state = _mm_aesenclast_si128(state, round_keys[10]);

    return state;
}

Performance: 7 cycles/byte (10x faster than software) Security: Hardened against side channels

Power Analysis Mitigations

Masking: Randomize intermediate values

def masked_sbox(input_byte, mask):
    """
    Boolean masking to defeat DPA
    Invariant: input_byte = masked_input ^ mask
    """
    # Mask S-box input
    masked_input = input_byte ^ mask

    # Masked S-box lookup (pre-computed)
    masked_output = MASKED_SBOX[mask][masked_input]

    # Output is also masked
    # Caller must remove mask later
    return masked_output

Shuffling: Randomize operation order

def shuffled_aes(state, round_key):
    """Randomize byte processing order each round"""
    indices = list(range(16))
    random.shuffle(indices)

    result = bytearray(16)
    for i in indices:
        result[i] = SBOX[state[i] ^ round_key[i]]

    return bytes(result)

Real-World Attacks

OpenSSL CVE-2006-2940

Vulnerable code (pre-2006):

// VULNERABLE: Table-based AES
static const u32 Te0[256] = { /* ... */ };

void AES_encrypt(const u8 *in, u8 *out, const AES_KEY *key) {
    // Table lookups leak via cache timing
    s0 = Te0[in[0] ^ key->rd_key[0]];  // Cache-dependent timing!
    // ...
}

Impact: Remote key recovery from HTTPS servers Fix: Constant-time implementation or AES-NI

Smartphone Secure Elements

Many chips vulnerable to power analysis:

2018: Infineon TPM vulnerability (CVE-2017-15361)
2021: Various secure elements extractable via DPA

Requirements: Physical proximity, specialized equipment

Detection and Prevention

Timing Audit

def audit_timing_leaks(function, inputs):
    """
    Statistical test for timing variations
    """
    import scipy.stats

    timings_by_input = {}

    for inp in inputs:
        start = time.perf_counter_ns()
        function(inp)
        elapsed = time.perf_counter_ns() - start

        timings_by_input.setdefault(inp, []).append(elapsed)

    # ANOVA test: Do inputs affect timing?
    groups = list(timings_by_input.values())
    f_stat, p_value = scipy.stats.f_oneway(*groups)

    if p_value < 0.05:
        print("WARNING: Timing leak detected!")
        return False
    else:
        print("OK: No significant timing variation")
        return True

Best Practices

Use AES-NI when available
Bitslicing for software implementations
Avoid table lookups on secret-dependent indices
Test for constant time using statistical methods
Physical security for high-value targets

Conclusion

Side-channel attacks demonstrate that cryptographic security requires more than mathematical strength. Implementation matters critically.

Modern systems must use:

Hardware AES (AES-NI) for constant-time operation
Bitsliced implementations when hardware unavailable
Power analysis mitigations (masking, shuffling) for embedded devices
Regular auditing for timing leaks

As attackers refine side-channel techniques, defensive implementation becomes increasingly critical for real-world security.

References

Bernstein, D.J. (2005). “Cache-timing attacks on AES”
Kocher, P., et al. (1999). “Differential Power Analysis”
Chari, S., et al. (1999). “Towards Sound Approaches to Counteract Power-Analysis Attacks”
Osvik, D.A., et al. (2006). “Cache Attacks and Countermeasures: The Case of AES”