CPU Cache的优化:解决伪共享问题

伪共享问题(false sharing)

对于解释伪共享问题,就需要了解一下缓存行的相关概念。缓存行是主存复制到高速缓存的最小单位,一般情况下缓存行的大小为32~128字节(通常为64字节)。

在多线程程序执行的过程中,有可能将2个或多个需要频繁修改的变量存储在同一个缓存行当中。这样以来,会频繁的造成缓存头失效的问题。如下图所示:

从编码的角度,为了解决上面的问题,可以使用额外字段来填充缓存行数据。从而达到不同变量之间占用不用的缓存行,增加缓存的命中率。优化后如下图所示:

简单事例

下面看一个例子,看一看填充缓存行前后的运行效果对比。

public class FalseSharing implements Runnable {
	
//	public static final int NUM_THREAD = Runtime.getRuntime().availableProcessors(); // CUP核数
	public static final int NUM_THREAD = 2; // CPU核数
	public static final long INTERATIONS = 500L * 1000L * 1000L;
	private final int arrayIndex;

	private static VolatileLong[] longs = new VolatileLong[NUM_THREAD];

	static {
		for (int i = 0; i < longs.length; i++) {
			longs[i] = new VolatileLong();
		}
	}

	public FalseSharing(final int arrayIndex) {
		this.arrayIndex = arrayIndex;
	}

	public static void main(String[] args) throws InterruptedException {
		System.out.println(Runtime.getRuntime().availableProcessors());
		final long start = System.currentTimeMillis();
		runTest();
		System.out.println("duration = " + (System.currentTimeMillis() - start));
	}

	private static void runTest() throws InterruptedException {
		Thread[] threads = new Thread[NUM_THREAD];
		for (int i = 0; i < threads.length; i++) {
			threads[i] = new Thread(new FalseSharing(i));
		}
		for (Thread thread : threads) {
			thread.start();
		}
		for (Thread thread : threads) {
			thread.join();
		}
	}

	@Override
	public void run() {
		long i = INTERATIONS + 1;
		while (0 != --i) {
			longs[arrayIndex].value = i;
		}
	}

	public static final class VolatileLong {
		public volatile long value = 0L;
		public long p1, p2, p3, p4, p5, p6, p7; // 用来填充缓存行
	}

}

笔者在自己的PC上面测试,加填充代码和不加填充代码的前后效果如下:

# 有填充代码
duration = 5849
# 无填充代码
duration = 19024

从结果看来,有填充代码的比无填充代码的效率提高了3倍多。下面是笔者测试的,线程数与耗时的测试结果:

Java8中的支持

Java8会对CPU 伪共享问题做一些优化,这里面使用到了 @sun.misc.Contended 。它可以使用在class 和 field 上面,官方也不推荐使用。我们在写业务代码时,使用了没有什么效果。其内容如下:

@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.FIELD, ElementType.TYPE})
public @interface Contended {
    String value() default "";
}

ConcurrentHashMap

在Java8中的ConcurrentHashMap 中我们可以看到如下:

/**
 * A padded cell for distributing counts.  Adapted from LongAdder
 * and Striped64.  See their internal docs for explanation.
 */
@sun.misc.Contended static final class CounterCell {
    volatile long value;
    CounterCell(long x) { value = x; }
}

上面CounterCell 是用于分段锁的大小统计,由此可以看出JDK 底层对@Contended 做了CPU Cache 优化。

Thread

// The following three initially uninitialized fields are exclusively
// managed by class java.util.concurrent.ThreadLocalRandom. These
// fields are used to build the high-performance PRNGs in the
// concurrent code, and we can not risk accidental false sharing.
// Hence, the fields are isolated with @Contended.

/** The current seed for a ThreadLocalRandom */
@sun.misc.Contended("tlr")
long threadLocalRandomSeed;

/** Probe hash value; nonzero if threadLocalRandomSeed initialized */
@sun.misc.Contended("tlr")
int threadLocalRandomProbe;

/** Secondary seed isolated from public ThreadLocalRandom sequence */
@sun.misc.Contended("tlr")
int threadLocalRandomSecondarySeed;