Setting up an xcpu (really, xcpu2) cluster

This writeup is from Ron, and it is being done in real time as I set up a small cluster in my house. This section describes setting up an xcpu cluster, with the xcpu2 variant of xcpu. We're using xcpu2 because of its support for user namespaces. All that really means is that when you run a program from your working directory, all the files are visible to your programs on the cluster node -- without you, or the sysadmin, or anyone, having to set up mount points. Xcpu2 is so much more convenient that I've moved to it exclusively.

This is an ultralight version. All you need is a kernel and initrd. In practice we've built initrds as small as 1.5 Mbytes. We can make them smaller, we believe. Lguest guests boot in < 2 seconds with this setup. You no longer need to download more than one thing -- i.e., you download onesis, and, when you follow these instructions, xcpu2 is downloaded and built automatically. So here we go!

You need to provide the kernel. We may in future provide a sample kernel, but it's hard to get a universal kernel nowadays in Linux. So these instructions assume you have a kernel which you will use.

Download and install oneSIS

First, you are going to need a standard oneSIS distro. Because this is systems software and you need to generate an initrd with device files in it, you might as well do this step as root.

sudo su

svn co https://onesis.svn.sourceforge.net/svnroot/onesis/trunk

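mv trunk onesis (to get a useful name automatically, give svn the directory name as a second argument to the checkout: svn co https://onesis.svn.sourceforge.net/svnroot/onesis/trunk onesis)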

cd onesis

make xcpu-tarball

You don't need the next step, but it avoids picky error messages:

make ramfs-tarball

make install

This gets basic oneSIS functionality. You are now ready to build an xcpu2 cluster node. We are using 2.6.25.1 for our kernels.

Note that we MUST have 9p and 9pnet kernel modules to use xcpu2.
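One way to confirm this before building the initrd is to look for the two module files under the modules directory of the kernel you plan to boot. The following is only a sketch and is not part of oneSIS or xcpu: the function name missing_9p_modules is made up here, and it assumes the modules were built as loadable .ko files (a kernel with 9p compiled in would be reported as missing even though it would work).

```shell
# Hypothetical helper: print the names of the 9p modules that cannot be
# found under a kernel modules directory such as /lib/modules/<version>.
missing_9p_modules() {
    moddir=$1
    for m in 9p 9pnet; do
        # Module files may carry a compression suffix (.ko.gz, .ko.xz),
        # so match on the "<name>.ko" prefix.
        if [ -z "$(find "$moddir" -name "$m.ko*" 2>/dev/null | head -n 1)" ]; then
            echo "$m"
        fi
    done
}
```

Running missing_9p_modules /lib/modules/`uname -r` prints nothing when both module files are found, and prints 9p and/or 9pnet otherwise.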

At this point you have a kernel and the tools to build an initrd; once the initrd is built you should be able to PXE-boot the pair. To set up the actual initrd, pick a kernel version:

export u=`uname -r`
./mk-initramfs-oneSIS -f initrd-$u.img $u -nn -rr \
-o ../overlays/xcpu-64 \
-w e1000 \
-w forcedeth \
-w ext3

What's going on here? This will create an initrd with e1000 and forcedeth drivers, and the ext3 file system.
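If your nodes need different drivers, the -w list is the part to change. As a sketch only, the hypothetical function below (initrd_cmd is not a oneSIS tool) prints the same command line for a given kernel version and module list, so you can inspect it before running it. It assumes, as the example above suggests, that each -w names one module to include; the other flags are copied unchanged from the command above.

```shell
# Hypothetical helper: print (do not run) the mk-initramfs-oneSIS command
# for a kernel version followed by any number of module names.
initrd_cmd() {
    kver=$1
    shift
    cmd="./mk-initramfs-oneSIS -f initrd-$kver.img $kver -nn -rr -o ../overlays/xcpu-64"
    for mod in "$@"; do
        cmd="$cmd -w $mod"
    done
    echo "$cmd"
}

# Example: the same module list as above, for kernel 2.6.25.1.
initrd_cmd 2.6.25.1 e1000 forcedeth ext3
```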

Setting up PXE boot

You'll need to put this initrd AND your kernel somewhere handy. Here's how we do it at Sandia with PXE.

Here is a listing of how one of our systems looks, starting at /tftpboot.

In this example, our xcpu kernel is vmlinuz-2.6.23-rc8, our initrd is initrd-2.6.23-rc8.img, and they are located in /tftpboot/x86_64.

Your kernel will almost certainly be different. When you see 2.6.23-rc8, just think "my kernel version".

We need to further configure PXE. That configuration is in /tftpboot/pxelinux.cfg. The file that matters (for us) is default, which is a symlink:
/tftpboot/pxelinux.cfg/default -> x86_64/2.6.23-rc8
The symlink points to a file that looks like this:
[rminnich@prism-admin oneSIS]$ cat /tftpboot/pxelinux.cfg/x86_64/2.6.23-rc8
default initrd
label initrd
kernel /x86_64/vmlinuz-2.6.23-rc8
append initrd=/x86_64/initrd-2.6.23-rc8.img console=tty0 console=ttyS1,57600 earlyprintk=ttyS1,57600
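The file and the symlink can be created by hand, but here is a rough sketch of scripting it for a new kernel version. The function name write_pxe_config and its arguments are made up for this example; the file contents, including the console settings, are copied from the listing above and may need changing for your hardware.

```shell
# Hypothetical helper: write pxelinux.cfg/<arch>/<kver> under a tftpboot
# root and point pxelinux.cfg/default at it with a relative symlink.
write_pxe_config() {
    root=$1
    arch=$2
    kver=$3
    cfgdir="$root/pxelinux.cfg"
    mkdir -p "$cfgdir/$arch" || return 1
    cat > "$cfgdir/$arch/$kver" <<EOF
default initrd
label initrd
kernel /$arch/vmlinuz-$kver
append initrd=/$arch/initrd-$kver.img console=tty0 console=ttyS1,57600 earlyprintk=ttyS1,57600
EOF
    # Replace any existing default so the link always names the new file.
    rm -f "$cfgdir/default"
    ln -s "$arch/$kver" "$cfgdir/default"
}
```

For the layout described here you would call write_pxe_config /tftpboot x86_64 2.6.23-rc8, substituting your own kernel version.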

At this point, you should reboot the node(s).

That's it. The nodes PXE-boot the kernel with the initrd. Once a node is up, you can just xrx to it directly (as root).

Once the nodes are rebooted, you can test

The command to run remote commands on the node is xrx. You run it this way:
xrx node date

So, if you have a node named n8, you can:
xrx n8 date

Suppose you have nodes named n8, n9, n10, and n11. You can:
xrx 'n[8-11]' date
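To make the range notation concrete, here is a small sketch that expands a single prefix[first-last] pattern into the node names it stands for. This is an illustration only and not xrx's own parser: the function name expand_nodes is invented here, and it handles just one numeric range per pattern.

```shell
# Hypothetical helper: expand one "prefix[first-last]" pattern into node
# names, one per line. Anything without a range is printed unchanged.
expand_nodes() {
    spec=$1
    case $spec in
        *\[*-*\]*)
            prefix=${spec%%\[*}      # text before the opening bracket
            range=${spec#*\[}        # drop everything up to the bracket
            range=${range%%\]*}      # drop the closing bracket and after
            first=${range%%-*}
            last=${range#*-}
            i=$first
            while [ "$i" -le "$last" ]; do
                echo "$prefix$i"
                i=$((i + 1))
            done
            ;;
        *)
            echo "$spec"
            ;;
    esac
}

expand_nodes 'n[8-11]'
```

A list like this is handy when one node in a range misbehaves: you can loop over the names and run xrx against each node separately to see which one fails.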

Supporting users other than root

xcpufs does not use /etc/passwd and /etc/group at all. To have xcpufs know about user names and groups, you have to tell it. The commands to do this are xgroupset and xuserset.

At this point we only have the root group. To add more groups, run the xgroupset command; in this case we add the group somegroup with gid 567:
xgroupset node somegroup 567

To add more users, e.g. add rminnich with uid 525 and group somegroup, you need to add their information and transfer the public key to xcpufs:
xuserset node rminnich 525 somegroup ~rminnich/.ssh/id_rsa.pub

Note that we added 'somegroup' for rminnich in that earlier xgroupset command.
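Since xcpufs has to be told about every account, it can help to derive the two commands from the account as it exists on the head node. The sketch below does that with the standard id utility and prints the commands instead of running them. The function name xcpu_user_cmds is made up for this example, and it assumes you want the user's primary group as the default group.

```shell
# Hypothetical helper: print the xgroupset and xuserset commands for an
# existing local user, using the argument order shown above.
xcpu_user_cmds() {
    node=$1
    user=$2
    keyfile=$3
    uid=$(id -u "$user") || return 1
    gid=$(id -g "$user") || return 1
    group=$(id -gn "$user") || return 1
    # The group has to exist in xcpufs before a user can be placed in it.
    echo "xgroupset $node $group $gid"
    echo "xuserset $node $user $uid $group $keyfile"
}
```

For example, xcpu_user_cmds n8 rminnich ~rminnich/.ssh/id_rsa.pub prints the two commands for that user; review them, then run them as root.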

Layout of typical tftpboot trees.


/tftpboot
/tftpboot/pxelinux.cfg.catalyst64
/tftpboot/pxelinux.cfg
/tftpboot/pxelinux.cfg/default.rhel5
/tftpboot/pxelinux.cfg/default.old
/tftpboot/pxelinux.cfg/x86_64
/tftpboot/pxelinux.cfg/x86_64/2.6.18mlw5-no-initrd
/tftpboot/pxelinux.cfg/x86_64/2.6.18-mlw5.1
/tftpboot/pxelinux.cfg/x86_64/2.6.18mlw5-9d
/tftpboot/pxelinux.cfg/x86_64/2.6.18-mlw5-new
/tftpboot/pxelinux.cfg/x86_64/2.6.18mlw5
/tftpboot/pxelinux.cfg/x86_64/2.6.18mlw5-w-initrd
/tftpboot/pxelinux.cfg/x86_64/2.6.18-mlw5
/tftpboot/pxelinux.cfg/x86_64/2.6.23-rc8-shuttle
/tftpboot/pxelinux.cfg/x86_64/2.6.23-rc8
/tftpboot/pxelinux.cfg/default
/tftpboot/pxelinux.cfg/2.6.18mlw5
/tftpboot/pxelinux.cfg/default.rhel4
/tftpboot/x86_64
/tftpboot/x86_64/vmlinuz-2.6.18-mlw.2
/tftpboot/x86_64/vmlinuz-2.6.18-mlw.3
/tftpboot/x86_64/vmlinuz-2.6.23-rc8
/tftpboot/x86_64/vmlinuz-2.6.18mlw5.1
/tftpboot/x86_64/bzImage
/tftpboot/x86_64/vmlinuz-2.6.18mlw5
/tftpboot/x86_64/vmlinuz-2.6.18-mlw
/tftpboot/x86_64/vmlinuz-2.6.18-mlw5-new
/tftpboot/x86_64/vmlinuz-2.6.18-mlw.1
/tftpboot/x86_64/vmlinuz-2.6.18-mlw5
/tftpboot/x86_64/bzImage5
/tftpboot/x86_64/initrd-2.6.18mlw5.img
/tftpboot/x86_64/initrd-2.6.18mlw5-9d.img
/tftpboot/x86_64/vmlinuz-2.6.18mlw5-9d
/tftpboot/x86_64/initrd-2.6.23-rc8.img
/tftpboot/bzImage
/tftpboot/pxelinux.0

From a note from Lucho:

Ok, the authentication is finally working well enough to be committable to the svn branch :) I worked on it for so long that there are probably millions of bugs. In order to use it, you need to create the admin keys in /etc/xcpu/admin_key*. xcpufs needs the public key (admin_key.pub); the program that adds new users (xuserset) needs the private one (admin_key). After you start xcpufs, use xuserset in the utils directory to add a new user to xcpufs, running it as:

./xuserset localhost lucho 500 admin ~/.ssh/id_rsa.pub

The parameters are:

./xuserset node-list uname uid default-group file-with-public-key

Then you can use xrx as before.

There are many things that need to be added; the most important one is to have xcpufs automatically add all the users that have public keys in their .ssh.

If you are brave, you can try using it. I'll probably fix some more bugs while running my tests for the class project, but I won't fix more than necessary. I'll probably fix the rest in July after all the conferences are done.

How our overlay directory looks


overlays/xcpu-64:
etc
id_rsa.pub
lib
lib64
setup
usr
var

overlays/xcpu-64/etc:
init.d
mtab
xcpu

overlays/xcpu-64/etc/init.d:
rcS
rcS~

overlays/xcpu-64/etc/xcpu:
admin_key
admin_key.pub

overlays/xcpu-64/lib:

overlays/xcpu-64/lib64:
ld-2.3.4.so
ld-linux-x86-64.so.2
ld-lsb-x86-64.so.3
libc-2.3.4.so
libcap.so
libcap.so.1
libcap.so.1.10
libcidn-2.3.4.so
libcom_err.so.2
libcom_err.so.2.1
libcrypt-2.3.4.so
libcrypto.so.0.9.6b
libcrypto.so.0.9.7a
libcrypto.so.2
libcrypto.so.4
libcrypt.so.1
libc.so.6
libdl-2.3.4.so
libdl.so.2
libm-2.3.4.so
libm.so.6
libresolv-2.3.4.so
libresolv.so.2
tls

overlays/xcpu-64/lib64/tls:
libc-2.3.4.so
libc.so.6
libdb-4.2.so
libm-2.3.4.so
libm.so.6
libpthread-2.3.4.so
libpthread.so.0
librt-2.3.4.so
librt.so.1
libthread_db-1.0.so
libthread_db.so.1

overlays/xcpu-64/usr:
bin
lib
lib64
sbin

overlays/xcpu-64/usr/bin:
xgroupset
xk
xps
xrx
xstat
xuserset

overlays/xcpu-64/usr/lib:

overlays/xcpu-64/usr/lib64:
libcom_err.a
libcom_err.so
libcom_err.so.3
libcom_err.so.3.0.0
libelf-0.125.so
libelf-0.97.1.so
libelf.a
libelf.so
libelf.so.1
libgssapi.a
libgssapi_krb5.a
libgssapi_krb5.so
libgssapi_krb5.so.2
libgssapi_krb5.so.2.2
libgssapi.la
libgssapi.so
libgssapi.so.1
libgssapi.so.1.0.0
libk5crypto.a
libk5crypto.so
libk5crypto.so.3
libk5crypto.so.3.0
libkrb5.a
libkrb5.so
libkrb5.so.3
libkrb5.so.3.2
libz.a
libz.so
libz.so.1
libz.so.1.2.1.2
libz.so.1.2.3
libzvt.a
libzvt.so
libzvt.so.2
libzvt.so.2.2.10

overlays/xcpu-64/usr/sbin:
xcpufs

overlays/xcpu-64/var:
lib
lock
run

overlays/xcpu-64/var/lib:
nfs

overlays/xcpu-64/var/lib/nfs:
rmtab
rpc_pipefs

overlays/xcpu-64/var/lib/nfs/rpc_pipefs:

overlays/xcpu-64/var/lock:
subsys

overlays/xcpu-64/var/lock/subsys:

overlays/xcpu-64/var/run:
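Given this layout, a quick sanity check before building the initrd can catch the most common mistakes. The sketch below checks three things taken from this page: the init script etc/init.d/rcS has its execute bits set, usr/sbin/xcpufs is present and executable, and the admin public key that xcpufs needs is in etc/xcpu. The function name check_overlay is invented for this example, and the list of files is a starting point, not a complete one.

```shell
# Hypothetical helper: report obvious problems in an overlay tree.
# Returns 0 if all checks pass and 1 otherwise.
check_overlay() {
    overlay=$1
    status=0
    for f in etc/init.d/rcS usr/sbin/xcpufs; do
        if [ ! -x "$overlay/$f" ]; then
            echo "not executable or missing: $f"
            status=1
        fi
    done
    if [ ! -f "$overlay/etc/xcpu/admin_key.pub" ]; then
        echo "missing: etc/xcpu/admin_key.pub"
        status=1
    fi
    return $status
}
```

Run it as check_overlay ../overlays/xcpu-64 (or wherever your overlay lives) before invoking mk-initramfs-oneSIS.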

Problems and fixes

PXE problems

"I am having a devil of a time PXE-booting the nodes. I have done this many times, but this time I am stumped. The master machine is a RHEL5 installation. There are no netboot tools for this version, so I got some from Fedora 6. However I can't get the damn thing to boot."

"The nodes read the first file that is offered via tftp, apparently correctly, but then they get stuck trying to get the pxelinux.cfg/default (or whatever) file read. I even tried getting the nodes to boot gPXE first, in case the builtin PXE is bad, but then they read the new gPXE and get stuck trying to read the pxelinux.0 file."

Fix: "I've seen that. The pxelinux bootloader that ships with RHEL5 is horribly broken. Go to the syslinux page and download the latest one, then stick that pxelinux.0 file in your tftpboot directory. It should make things work a lot faster."

Response: "Thanks, Josh, it worked!"

Problems at startup

"It seems to boot just fine, and ends with setting up the network. I can ping the node, but the rest of the stuff was not executed. I guess now the issue is to refine the scripts that it needs to run, check the initrd, etc......."

Comment: "Hook up a serial console to one node and see if you can do a ps etc. and see what's running."

"A good test, before you even build the image, is this:
cd overlays/xcpu-64
chroot . /usr/sbin/xcpufs
And make sure that works."

"You have the script in etc/init.d/rcS, and it has x bits set, right?"

The node comes up, but no xcpufs -- it just returns

"There is progress! The node seems to boot all right, and I can interact with it over the serial console. I can run any number of usual things, like ps, ls, cat /proc/cpuinfo, etc. However... xcpufs is NOT running. If I try to just run it on the command line it simply returns to the command prompt. The files all seem to be there. No error messages (also no /var/log/messages)."

Response: "It is detaching and running in the background, and probably immediately dying.
If you do
xcpufs -d -D 255
what happens?"

"Exactly the same... Dies and no messages or anything."

Response: "put strace in your overlay tree and strace -f xcpufs, send output to me."

"Ok, more progress: Trying the 'chroot . /usr/sbin/xcpufs' line pointed to missing libraries (actually just the soft links: libc.so.6, libdl.so.2, libz.so.1, libcrypto.so.6).

After adding these to the overlays tree, xcpufs started properly under the chroot.
Similarly - after rebuilding the initrd image - the node booted up, and this time it did execute xcpufs (and it stayed running)."
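The missing-library hunt can be partly automated with ldd. The sketch below is an assumption-laden helper and not an xcpu tool: ldd_paths and missing_in_overlay are names invented here. It takes ldd output, pulls out the resolved library paths, and prints the ones that do not exist at the same path inside the overlay. It assumes the head node and the overlay use the same library layout, which may not hold if the overlay was populated from a different distro.

```shell
# Hypothetical helper: read `ldd` output on stdin, print resolved paths.
ldd_paths() {
    awk '{
        if ($2 == "=>") {
            if ($3 ~ /^\//) print $3     # "libc.so.6 => /lib64/libc.so.6 (0x...)"
        } else if ($1 ~ /^\//) {
            print $1                     # "/lib64/ld-linux-x86-64.so.2 (0x...)"
        }
    }'
}

# Hypothetical helper: read library paths on stdin and print those that
# are absent under the given overlay directory.
missing_in_overlay() {
    overlay=$1
    while read -r path; do
        [ -e "$overlay$path" ] || echo "$path"
    done
}
```

From the overlay directory, something like ldd usr/sbin/xcpufs | ldd_paths | missing_in_overlay . lists what still needs to be copied or linked. The chroot test remains the final word, since it exercises the overlay's own loader.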

Error: unknown user


xrx n0 date
returns: Error: unknown user
I am so far doing everything as root.

Response: "now you are hitting the security feature of xcpufs. I mean this seriously. xcpufs has a very good security model.

When xcpufs starts up, it has no built-in users. You need a working xuserset and xgroupset to tell xcpufs what users are allowed.

Note one cool thing: you can run without a root user! This feature is not yet used much but it rocks.

try this:
xgroupset node root 0
then
xuserset node root 0 root ~root/.ssh/id_rsa.pub
Or, can you do this test first: cd to your overlay directory again and run
chroot . /bin/xuserset
This will ensure you are missing no other libraries."

"Should have thought of this myself... Same problem as with xcpufs, i.e. missing the softlinks for libm.so.6 and libelf.so.1.
I am redoing the initrd and rebooting the node yet again..."

No library?


"Ok, so far so good. It booted and then I got a problem with libraries again, but that was when trying 'xrx n0 date' - it wanted librt.so.1.
I guess the basic set of libraries are still required. Was xrx not supposed to drag libraries along with it if necessary?"

Response: "Try
xrx -s n0 date
although the newer one does this automagically."

dhcp did not offer a hostname


"When the node boots, oneSIS "complains" that dhcp did not offer a hostname. Is this a problem?"

Response: "Not a problem, but you could get rid of the warning by adding:
use-host-decl-names on;
to your dhcpd.conf file (assuming you have a bunch of host blocks and are not just handing out from a range of IPs)."
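For reference, a host block in ISC dhcpd.conf has the shape printed by the sketch below. With use-host-decl-names on, the name in the host declaration is what the server hands out as the hostname. The function name dhcp_host_block is made up, and the node name, MAC address and IP address in the example call are placeholders, not values from any real cluster.

```shell
# Hypothetical helper: print one dhcpd.conf host block.
dhcp_host_block() {
    name=$1
    mac=$2
    ip=$3
    printf 'host %s {\n    hardware ethernet %s;\n    fixed-address %s;\n}\n' \
        "$name" "$mac" "$ip"
}

# Example with placeholder values.
echo 'use-host-decl-names on;'
dhcp_host_block n8 00:11:22:33:44:55 192.168.1.8
```

Review the output before pasting it into dhcpd.conf, and restart dhcpd afterwards so the change takes effect.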

No route to host

"[root@prism-admin ~]# xrx -p -s 'pn[1-84]' date
Error: No route to host"

This is a bug in xcpufs which we are working on.