02
Nov 17

fork()ing and fstat()ing in JRuby using FFI on Linux

Sometimes, $DAYJOB can get kind of technical. For reasons I won’t go into here because NDA, the following axioms are true for this puzzle:

  • we have to work in JRuby
  • we are in a plugin within a larger framework providing a service
  • we have to restart the entire service
  • we don’t have a programmatic way to do so
  • we don’t want to rely on external artifacts and cron

Now, this isn’t the initial framing set of axioms you understand; this is what we’re facing into after a few weeks of trying everything else first.

So; obvious solution, system('/etc/init.d/ourService restart').
Except that JRuby doesn’t do system(). Or fork(), exec(), daemon(), or indeed any kind of process duplication I could find. Oh-kay, so we can write to a file, have a cronjob watch for the file and restart the service and delete the file if it finds it. Except that for Reasons (again, NDA), that’s not possible because we can’t rely on having access to cron on all platforms.

Okay. Can we cheat?

Well, yes… allegedly. We can use the Foreign Function Interface to bind to libc and access the functions behind JRuby’s back.

require 'ffi'

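module Exec
   extend FFI::Library

   # Bind against the C library explicitly, as the LibC module further down does
   ffi_lib FFI::Library::LIBC

   attach_function :my_exec, :execl, [:string, :string, :varargs], :int
   attach_function :fork, [], :int
end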

vim1 = '/usr/bin/vim'
vim2 = 'vim'
if Exec.fork == 0
   Exec.my_exec vim1, vim2, :pointer, nil
end

Process.waitall

Of course, I’m intending to kill the thing that fires this off, so a little more care is needed. For a start, it’s not vim I’m playing with. So…

module LibC
   extend FFI::Library

   ffi_lib FFI::Library::LIBC

   # Timespec struct datatype
   class Timespec < FFI::Struct
      layout :tv_sec, :time_t,
             :tv_nsec, :long
   end

   # stat struct datatype
   # (see /usr/include/sys/stat.h and /usr/include/bits/stat.h)
   class Stat < FFI::Struct
      layout :st_dev, :dev_t,
             :st_ino, :ino_t,
             :st_nlink, :nlink_t,
             :st_mode, :mode_t,
             :st_uid, :uid_t,
             :st_gid, :gid_t,
             :__pad0, :int,
             :st_rdev, :dev_t,
             :st_size, :off_t,
             :st_blksize, :long,
             :st_blocks, :long,
             :st_atimespec, LibC::Timespec,
             :st_mtimespec, LibC::Timespec,
             :st_ctimespec, LibC::Timespec,
             :__unused0, :long,
             :__unused1, :long,
             :__unused2, :long,
             :__unused3, :long,
             :__unused4, :long
   end

   # Filetype mask
   S_IFMT = 0o170000

   # File types.
   S_IFIFO = 0o010000
   S_IFCHR = 0o020000
   S_IFDIR = 0o040000
   S_IFBLK = 0o060000
   S_IFREG = 0o100000
   S_IFLNK = 0o120000
   S_IFSOCK = 0o140000

   attach_function :getpid, [], :pid_t
   attach_function :setsid, [], :pid_t
   attach_function :fork, [], :int
   attach_function :execl, [:string, :string, :string, :varargs], :int
   attach_function :chdir, [:string], :int
   attach_function :close, [:int], :int
   attach_function :fstat, :__fxstat, [:int, :int, :pointer], :int
end

So that’s bound a bunch of libc functions for use in JRuby. But why __fxstat() instead of fstat()? Interesting detail: the stat() family of functions isn’t exported from libc itself, at least not on most modern Linux platforms. They live in a small static library (libc_nonshared.a on CentOS). There’s usually a linker directive that makes that transparent, but here we’re acting behind the scenes, so we don’t get that nicety and have to call the underlying __xstat() family of functions directly.
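The S_IFMT mask and the S_IF* constants in that module are plain bit patterns inside st_mode, so the decoding can be tried out without FFI at all. What follows is a minimal sketch in ordinary Ruby; the FileType module and its type_of method are names made up for this illustration and are not part of the plugin.

```ruby
# Illustrative only: FileType and type_of are hypothetical names.
module FileType
  # Mask that isolates the file type bits of st_mode
  S_IFMT = 0o170000

  # The same octal file type values as in the LibC module
  NAMES = {
    0o010000 => :fifo,
    0o020000 => :char_device,
    0o040000 => :directory,
    0o060000 => :block_device,
    0o100000 => :regular,
    0o120000 => :symlink,
    0o140000 => :socket
  }.freeze

  # Strip the permission bits and look up what is left
  def self.type_of(mode)
    NAMES.fetch(mode & S_IFMT, :unknown)
  end
end

puts FileType.type_of(0o140755)             # socket type bits plus rwxr-xr-x
puts FileType.type_of(File.stat('/').mode)  # the root directory
```

The permission bits live in the low twelve bits of the mode, which is why they have to be masked off before comparing against a file type constant.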

I need to close some network ports (or the restart goes badly because the child process inherits the ports’ file descriptors and someone didn’t set them to close on exec()). A small helper function is useful here:

# Helper function to check if a file descriptor is a socket or not
def socket?(fd)
   # data structure to hold the stat_t data
   stat = LibC::Stat.new

   # JRuby's IO object types can't seem to get a grip on fds inherited from
   # another process correctly in a forked child process, so we have
   # to FFI out to libc.
   rc = LibC.fstat(0, fd, stat.pointer)
   if rc == -1
      errno = FFI::LastError.error
      false
   else
      # Now we do some bit twiddling. In Octal, no less.
      filetype = stat[:st_mode] & LibC::S_IFMT

      if filetype == LibC::S_IFSOCK
         true
      else
         false
      end
   end
rescue => e
   false
end
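The inherited-descriptor problem described above comes down to the close-on-exec flag. As a hedged aside, this is how the flag looks from plain Ruby when you do own the socket object; the Unix-domain socket pair is only a stand-in for the service's real network sockets, and I have not checked how faithfully JRuby honours the flag.

```ruby
require 'socket'

# A connected pair of Unix-domain sockets stands in for the real
# listening sockets, which belong to the framework and are not shown here.
a, b = UNIXSocket.pair

a.close_on_exec = false  # this descriptor would leak into an exec()ed program
b.close_on_exec = true   # this one is closed automatically on exec()

# Anything without the flag set is what the restart script would inherit
leaky = [a, b].reject(&:close_on_exec?)
puts "descriptors that would survive exec(): #{leaky.map(&:fileno).inspect}"
```

In the situation in this post the sockets were opened elsewhere and the flag was never set, which is why they have to be found and closed by hand.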

And now the actual restart function itself:

def restart
   pid = LibC.getpid
   rc = LibC.chdir('/')
   if rc == -1
      errno = FFI::LastError.error
      return errno
   end

   # close any open network sockets so the restart doesn't hang
   fds = Dir.entries("/proc/#{pid}/fd")
   fds.each do |fd|
      # skip . and .. which we pick up because of the /proc approach to
      # getting the list of file descriptors
      next if fd.to_i.zero?

      # skip any non-network socket file descriptors as they're not going to
      # cause us any issues and leaving them lets us log a little longer.
      next unless socket?(fd.to_i)

      # JRuby's IO objects can't get a handle on these fd's for some reason,
      # possibly because we're in a child process. So we use libc's close()
      rc = LibC.close(fd.to_i)
      next if rc.zero?
      errno = FFI::LastError.error
      return errno
   end

   # We're now ready to fork and restart the service
   rc = LibC.fork
   if rc == -1
      # If fork() failed we're probably in a world of hurt
      errno = FFI::LastError.error
      return errno
   elsif rc.zero?
      # We are now the daemon. We can't hang about (thanks to 
      # JRuby's un-thread-safe nature) so we immediately swap out our 
      # process image with that of the service restart script. 
      # This marks the end of execution of this thread and there is no return.
      LibC.execl '/etc/init.d/ourService', 'ourService', 'restart', :pointer, nil
   end
rescue => e
# Handle errors here (removed for clarity)
end
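One detail of the /proc walk is worth a note: testing fd.to_i.zero? skips descriptor 0 (stdin) along with . and .., because all three convert to zero. A minimal sketch of an alternative filter, assuming a Linux /proc filesystem; open_fds is a hypothetical helper name and not something the plugin defines.

```ruby
# Illustrative only: open_fds is a hypothetical helper.
# Lists the open file descriptors of a process as sorted integers,
# keeping fd 0 and dropping the "." and ".." directory entries.
def open_fds(pid = Process.pid)
  Dir.entries("/proc/#{pid}/fd")
     .grep(/\A\d+\z/)   # only purely numeric names are descriptors
     .map(&:to_i)
     .sort
end

# Open a pipe so there are two descriptors we know must be listed
reader, writer = IO.pipe
fds = open_fds
puts fds.inspect
```

The listing can also contain the descriptor that was used to read the directory itself, which is already closed again by the time the list is returned, so a failed close() or fstat() on one entry is not necessarily a sign of trouble.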

An interesting problem to solve, this one. And by “interesting” I mean “similar to learning how to pull teeth while only able to access the mouth via the nose”. But in case it’s of use to someone…


27
Jul 11

Home server build, part one – specifications

So between my laptop and herself’s, we have a fair amount of valuable (to us) data – MSc essays and coursework, book manuscripts, half a gigabyte of open source projects, PhD programming work, wedding photos and video, and about 19 gigabytes of other photos and video, eight gigabytes of target shooting documents and images, half a gigabyte of academic papers… well, you get the idea. So when my laptop hard drive started to hiccup and its SMART report started complaining of bad blocks and imminent failure within 24 hours… well, it prompted some concern 🙂 Most of the important data was backed up on my server (which is not just off-site, but out of the country) using rsnapshot, but there’s nothing like an incipient disaster to make you review your disaster recovery protocols 😀

Besides, I had been planning for some time to offload the bulk data storage (video files and so forth) to a NAS, and while buying an off-the-shelf NAS box is certainly an option:

  • building your own can give you more capability for less outlay;
  • building your own allows you more functionality than just NAS storage – in this case, I had a few other tasks in mind for this box;
  • building your own is something every sysadmin should do for CPD if nothing else. 🙂

So, what’s the specification? The list of tasks is fairly straightforward to start with:

  • NAS storage
  • Central print server (and scanner at some point)
  • Backups of both laptops and off-site storage of those backups
  • Central downloading server for bittorrents and so on
  • Media server

None of these need much in the way of CPU oomph, though we would need multiple cores as the storage will be some form of software RAID array (and having multiple cores would give more performance than a faster single-core, at least for a given amount of outlay). Not a huge amount of RAM is needed either. But we do need a number of disk interfaces and gigabit Ethernet (the central router for the network has since been upgraded from the standard ISP’s Zyxel to a gigabit Ethernet Netgear router), and if this is all on the motherboard, so much the better.

And obviously, outlay’s an issue as well. Buying an off-the-shelf NAS box (a Synology DS411+) and stocking it with disks (4x Hitachi 2TB Deskstar drives) would cost approximately €960 (priced on scan.co.uk) so the goal was to get in below that threshold.

First off, the CPU. I’m going with the AMD Athlon II X4 645. It’s a quad core processor but quite cheap. It’s a socket AM3 processor, which leads us to the selection of the motherboard, and I’ve chosen a Gigabyte GA-880GA-UD3H, which is the cheapest Gigabyte AM3 board which had six SATA ports and gigabit Ethernet. Add in a fairly cheap 4GB of RAM (and pause to remember the time back in ’97 just after graduation when we watched with some degree of awe when the TCD sysadmins showed off a whole gigabyte of RAM, which cost about three months’ disposable income…) and a fairly standard Arctic Cooling Freezer 7 for the CPU and that’s the guts of the thing.

Now for storage. The idea here is a degree of future-proofing, and a lot of I-never-want-to-lose-any-data-or-have-much-downtime 😀 So, the OS is going on two 320GB Seagate Barracuda hard drives in a RAID 1 array. The main data storage will go on four 2TB Hitachi CoolSpin 5K3000 drives, which were chosen after this particularly excellent article from Backblaze about their 135TB storage pods. And to try to keep stable power lines, an 850W Coolermaster modular PSU. And since this rig will have to last for a while and I hate slicing my hands open on cheap cases, a Lian Li PC-8NB case to mount all of this in, together with an Icy Box IB-554SK SATA RAID frame because when the hard drives fail (and all drives will), I don’t want to have to disassemble the entire box to fix the problem (though, yes, if the OS drives fail I might have to, but you could always add another Icy Box or similar frame – I was watching the outlay myself). And then just to top it off, a DVD RW drive to make installation a bit easier.

For those who’ve been asking on Hacker News, yes, the PSU here is over-specified. That’s a deliberate choice, because I’d rather come nowhere near the limits (or even the 50% mark if possible) of the PSU for two reasons – stability of voltage lines and cooling. And yes, the CPU could be an Atom, but I’ve chosen instead to go with something a bit more conventional and old and unfashionable and debugged because this is an infrastructure box and I’m willing to pay a few euros more upfront to avoid spending a week of my time trying to fix it in a year’s time, or having to buy more new hardware because it turned out we needed it to do something more than it can do right now and an Atom couldn’t cut it. And yes, 320GB hard drives. Actually, I changed my mind and went with the 500GB ones after writing this post; they turned out to be 6 pence cheaper than the 320GB model, which I had thought was the cheapest available hard drive not made by McFlakey Inc. They’re not an example of overspecification, they’re an example of consumer electronics pricing…

So the full list:

Item                          Description                                       Price
Icy Box IB-554SK              Icy Box IB-554SK                                  £83.87
Corsair DDR3 XMS Classic      4GB (2x2GB) Corsair DDR3 XMS3 Classic, PC3-10666  £25.68
Arctic Cooling Freezer 7 Pro  Arctic Cooling Freezer 7 Pro v2                   £18.74
850W Coolermaster Silent Pro  850W Coolermaster Silent Pro M                    £109.02
Lian Li PC-8NB                Lian Li PC-8NB                                    £83.98
Gigabyte GA-880GA-UD3H        Gigabyte GA-880GA-UD3H                            £79.98
AMD Athlon II X4 645          AMD Athlon II X4 645                              £77.51
Pioneer DVR-S19LBK            Pioneer DVR-S19LBK 24x DVD±R                      £17.92
Seagate Barracuda 320GB       320GB Seagate ST3320413AS Barracuda               £57.31
Hitachi CoolSpin 5K3000 2TB   2TB Hitachi 0F12117 CoolSpin 5K3000               £209.95
Total                                                                           £788.62 (~€895)

So that’s under the target price, so the order went in on Scan this weekend and is due for delivery in a day or so…

(continued in part two)


20
Jul 10

My interviews at Google

So I’ve now completed the interview process twice with Google (once in 2007 and once in 2010), and while I’m not sure advice from someone not hired after two run-throughs is all that useful, I figured that the more information out there for those undergoing pre-Google-Interview stress, the better, so here’s how it went.

In both cases, I was contacted out of the blue by a Google recruiter. The first time I had been considering looking for a new role and pursued it immediately; the second time I hadn’t been and put off the recruitment process for several months, during which the same recruiter contacted me again twice to follow up. If nothing else, that’s a nice ego boost, but a more cynical mind might be considering the shotgun approach to a narrow recruiting filter and commissions 😀

First, a quick data point: I was applying for an SRE(SA) position on both occasions – Site Reliability Engineer (System Administration), because in most of my roles to date, I’ve been doing both sysadmin and development work and I’ve never seemed to drift towards one pigeonhole or another. SRE(SA) seemed optimal – interesting sysadmin work on large-scale systems and quite a bit of tool-writing to boot. This was decided on between myself and the recruiter, based on the self-assessment form you are given to fill out. I would love to know how they get around illusory superiority and the Dunning-Kruger effect with those forms, especially given the weird bias they’d have in the dataset from having so many of the best in their fields working there.

Both times, the process proceeded in the same way: