Friday, July 3, 2009

Fake NUMA nodes in Linux

While NUMA systems are becoming commonplace, we often do not have access to one when writing new code, learning the NUMA architecture, conducting experiments or debugging existing code. For such cases, the Linux kernel provides a very neat feature called 'fake NUMA nodes'. One can create fake NUMA nodes on a non-NUMA machine simply by passing a command-line parameter to the kernel. Below are the steps for x86 systems:

  1. The following config options need to be turned on: CONFIG_NUMA=y, CONFIG_NUMA_EMULATION=y
  2. Build the kernel with the above config options set
  3. The kernel command line can use any one of the following, depending on your requirement:
  • numa=fake=4 : Split the entire memory into 4 equal nodes
  • numa=fake=8*1024 : Split the memory into 8 nodes of 1024MB (i.e. 1GB) each (note: the number is interpreted as MB) [If the system has more memory, the last node is assigned the remaining memory]
  • numa=fake=2*512,2*1024 : Split the memory into 2 nodes of 512MB each and 2 more nodes of 1GB each (and so on)
Note: On ppc, the required nodes are specified as a cumulative, comma-separated list. For example, to create 4 nodes of 2GB each, the parameter would be: "numa=fake=2G,4G,6G,8G"
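
To make the semantics of these parameters concrete, here is a small Python sketch that works out the node sizes a given value would describe. It is only an illustration of the rules listed above, not the kernel's actual parser: the function names split_fake_nodes and cumulative_to_sizes are made up for this post, the total memory is passed in by hand, and real kernels also apply alignment constraints that are ignored here.

```python
def split_fake_nodes(spec, total_mb):
    """Node sizes in MB for an x86-style numa=fake=<spec> value (simplified)."""
    if "*" not in spec and "," not in spec:
        # Plain count, e.g. "4": split all memory into equal nodes.
        count = int(spec)
        base = total_mb // count
        sizes = [base] * count
        sizes[-1] += total_mb - base * count  # any remainder goes to the last node
        return sizes

    sizes = []
    for part in spec.split(","):
        if "*" in part:
            # "<count>*<size>", e.g. "2*512" means two nodes of 512MB.
            count, size = part.split("*")
            sizes.extend([int(size)] * int(count))
        else:
            sizes.append(int(part))

    leftover = total_mb - sum(sizes)
    if leftover < 0:
        # Assumption for this sketch: refuse rather than guess what gets trimmed.
        raise ValueError("spec asks for more memory than the system has")
    sizes[-1] += leftover  # the last node absorbs the remaining memory
    return sizes


def cumulative_to_sizes(spec):
    """Node sizes in MB for a ppc-style cumulative list such as "2G,4G,6G,8G"."""
    units = {"M": 1, "G": 1024}
    sizes = []
    previous = 0
    for part in spec.split(","):
        boundary = int(part[:-1]) * units[part[-1].upper()]
        sizes.append(boundary - previous)  # each entry is where the node ends
        previous = boundary
    return sizes
```

For instance, split_fake_nodes("2*512,2*1024", 3072) gives two 512MB nodes followed by two 1024MB nodes, and cumulative_to_sizes("2G,4G,6G,8G") gives four nodes of 2048MB each.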

You can play around with more options :) The userspace NUMA utilities such as numactl and numastat will then show the NUMA environment that has been set up. The cpumap and per-node meminfo can be read from the sysfs directories /sys/devices/system/node/node<0|1|2..>.
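
If you would rather look at the sysfs data from a script, something along these lines should work. It is a minimal sketch: read_nodes and parse_mem_total_kb are names invented for this example, and it assumes each node directory contains a cpumap file and a meminfo file with a line of the form "Node 0 MemTotal: <n> kB".

```python
import glob
import os
import re


def parse_mem_total_kb(meminfo_text):
    """Return the MemTotal value (in kB) from per-node meminfo text, or None."""
    match = re.search(r"MemTotal:\s+(\d+)\s+kB", meminfo_text)
    return int(match.group(1)) if match else None


def read_nodes(base="/sys/devices/system/node"):
    """Map node id -> {"cpumap": str, "mem_total_kb": int or None}."""
    nodes = {}
    for path in glob.glob(os.path.join(base, "node[0-9]*")):
        node_id = int(os.path.basename(path)[len("node"):])
        with open(os.path.join(path, "cpumap")) as f:
            cpumap = f.read().strip()
        with open(os.path.join(path, "meminfo")) as f:
            mem_total_kb = parse_mem_total_kb(f.read())
        nodes[node_id] = {"cpumap": cpumap, "mem_total_kb": mem_total_kb}
    return nodes
```

Calling read_nodes() on a kernel booted with numa=fake=4 should return four entries, one per fake node; on a machine without the node directories it simply returns an empty dict.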

Fake NUMA has one flaw, however: the mapping of CPUs to nodes. Some nodes will show up as having no CPUs (see the cpumap file in the node directory under the sysfs path mentioned above). Semantically, a CPU must belong to exactly one NUMA node. Inside the kernel, however, the CPU is mapped to all the fake nodes.
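
To spot the nodes that sysfs reports without CPUs, the cpumap value can be decoded by hand. The sketch below assumes the usual cpumap format, a hexadecimal bitmask written as comma-separated 32-bit groups with the most significant group first; cpus_in_mask and cpuless_nodes are hypothetical helper names for this post.

```python
def cpus_in_mask(cpumap):
    """Decode a cpumap string such as "00000000,0000000f" into CPU numbers."""
    # Dropping the commas leaves one long hex number; bit N set means CPU N.
    mask = int(cpumap.strip().replace(",", ""), 16)
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


def cpuless_nodes(cpumaps):
    """Given {node_id: cpumap string}, return the ids of nodes with no CPUs."""
    return sorted(node for node, cpumap in cpumaps.items()
                  if not cpus_in_mask(cpumap))
```

For example, cpuless_nodes({0: "f", 1: "0"}) reports node 1 as the node with no CPUs, while cpus_in_mask("f") lists CPUs 0 to 3 for node 0.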

Fake NUMA nodes can be created even on a real NUMA system. In this case, the fake nodes are aligned within a real node, and the distance between two fake nodes that lie in different real nodes is maintained. I could cover the internal implementation details in a separate post. Have fun playing around with NUMA!
