Archive for April, 2006

I am a member of the Phoenix Linux Users Group (PLUG), and at our last meeting Google gave a presentation on how Linux is used at Google. Vince and Pat explained what Google uses Linux for and many of the challenges they have faced in pushing the Linux envelope.

While many Google engineers use Linux, only a few have deep experience with it. When asked, though, the presenters said that there were “not many” Windows machines in use among the engineers.

Managing the Network

The set-up of their internal network is similar in concept to a college computer lab. While most college computer labs are in one location, or at most a few locations close to each other on the same campus, Google has employees all over the world, and they are all on the same network. Engineers who live on different continents work together on projects over the Google Enterprise Network.

All of the computers on the network are installed automatically, and all of their resources are remotely mounted. Network file systems such as AFS, CIFS and NFS are managed by administrators. The machines get a ‘custom’ Debian install using Red Hat's Kickstart. Updates are automatic, and every computer must ‘call home’ daily with a status report that goes into a central repository that tracks out-of-date machines. This lets the admins quickly look up which machines need updating, with exact information on what each one needs.
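The talk did not describe how the central repository decides that a machine is out of date. As a rough illustration only, the Python sketch below assumes each daily report carries a hostname, a timestamp and a map of installed package versions, and flags machines whose report is stale or whose packages lag behind a desired version list. The `StatusReport` type, the `out_of_date` function, the package names and the one-day threshold are all hypothetical, and version strings are simplified to dotted integers.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StatusReport:
    hostname: str
    reported_at: datetime
    packages: dict  # package name -> installed version string


def parse_version(v: str) -> tuple:
    # Simplification (assumption): purely numeric dotted versions such as "2.6.16".
    return tuple(int(part) for part in v.split("."))


def out_of_date(reports, desired, now, max_age=timedelta(days=1)):
    """Return {hostname: [reasons]} for every machine that needs attention."""
    flagged = {}
    for r in reports:
        reasons = []
        # A machine that has not called home recently is suspect in itself.
        if now - r.reported_at > max_age:
            reasons.append("no status report in the last day")
        # Compare each required package against what the machine reported.
        for pkg, want in desired.items():
            have = r.packages.get(pkg)
            if have is None:
                reasons.append(f"{pkg}: missing (want {want})")
            elif parse_version(have) < parse_version(want):
                reasons.append(f"{pkg}: {have} -> {want}")
        if reasons:
            flagged[r.hostname] = reasons
    return flagged


# Hypothetical example data.
now = datetime(2006, 4, 20, 12, 0)
reports = [
    StatusReport("ws-a", now - timedelta(hours=3), {"kernel": "2.6.16", "openafs": "1.4.0"}),
    StatusReport("ws-b", now - timedelta(days=2), {"kernel": "2.6.15"}),
]
desired = {"kernel": "2.6.16", "openafs": "1.4.0"}
stale = out_of_date(reports, desired, now)
for host, reasons in stale.items():
    print(host, reasons)
```

Here only `ws-b` is flagged, with every reason listed, which matches the idea of admins seeing exactly what each machine needs.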

To manage the update process, they have a ‘Test’ network that most Google employees do not have access to. Vince explained that, with all of the ultra-modern equipment they use, they make sure to always run the latest version of the kernel. Bugs still turn up, but testing updates before rolling them out to the whole network saves a great deal of work going forward.


It seems that NFSv3 is not very secure because it uses the Sun RPC layer with AUTH_SYS authentication, where the client simply sends the user's UID and list of groups along with each request. With NFSv4 they were able to add Kerberos security, but expiring client Kerberos tickets produced a kernel ‘oops’: they use the rpc.gssd daemon to close and re-open all of the kernel pipes, and that is what triggers the ‘oops’ from the kernel.
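To see why AUTH_SYS is weak, it helps to look at what the client actually sends. The sketch below builds the AUTH_SYS credential layout defined in RFC 5531 (a stamp, a machine name, a UID, a GID and at most 16 supplementary group IDs, all XDR-encoded). Nothing in it is secret or signed; the server simply trusts the numbers the client claims. The helper function names and the example values are our own, and this illustrates the wire format rather than anything shown in the talk.

```python
import struct

AUTH_SYS = 1  # RPC auth flavor number for AUTH_SYS (RFC 5531)


def xdr_uint(n: int) -> bytes:
    # XDR unsigned int: 4 bytes, big-endian.
    return struct.pack(">I", n)


def xdr_opaque(data: bytes) -> bytes:
    # XDR variable-length opaque/string: length, bytes, zero-padded to 4 bytes.
    pad = (4 - len(data) % 4) % 4
    return xdr_uint(len(data)) + data + b"\x00" * pad


def authsys_parms(stamp: int, machinename: bytes, uid: int, gid: int, gids: list) -> bytes:
    # Mirrors: struct authsys_parms { unsigned int stamp; string machinename<255>;
    #          unsigned int uid; unsigned int gid; unsigned int gids<16>; };
    if len(machinename) > 255:
        raise ValueError("machinename is limited to 255 bytes")
    if len(gids) > 16:
        raise ValueError("AUTH_SYS carries at most 16 supplementary groups")
    body = xdr_uint(stamp) + xdr_opaque(machinename) + xdr_uint(uid) + xdr_uint(gid)
    body += xdr_uint(len(gids)) + b"".join(xdr_uint(g) for g in gids)
    return body


def opaque_auth(flavor: int, body: bytes) -> bytes:
    # The credential field of an RPC call: flavor plus opaque body<400>.
    if len(body) > 400:
        raise ValueError("auth body is limited to 400 bytes")
    return xdr_uint(flavor) + xdr_opaque(body)


# Any client can claim any identity, e.g. UID 1000 with a few extra groups.
cred = opaque_auth(AUTH_SYS, authsys_parms(0, b"ws-a", 1000, 1000, [1000, 24, 25]))
print(cred.hex())
```

Because the server has no way to verify these fields, a client with root access can put any UID it likes in them, which is the gap Kerberos (RPCSEC_GSS) closes in NFSv4.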

Refreshing Kerberos tickets took a while to figure out. Modifying pam_krb5 to refresh tickets automatically let applications know that the home directory could go away, reminding them to back up. Even with that done, though, Kerberos tickets are still stored in a hard-coded location under /tmp, where they are vulnerable to physical attack.
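The presentation did not show the pam_krb5 changes themselves. As a hedged sketch of the decision any automatic refresher has to make, the function below assumes it knows a ticket's end time and its renew-until time: renew while the ticket is still valid and renewable, otherwise the user has to obtain a new ticket. The `Action` enum, the `refresh_action` function and the 30-minute margin are assumptions for illustration.

```python
from datetime import datetime, timedelta
from enum import Enum


class Action(Enum):
    NOTHING = "nothing"  # ticket is comfortably valid
    RENEW = "renew"      # e.g. what `kinit -R` does for a renewable TGT
    REAUTH = "reauth"    # cannot renew; the user must obtain a new ticket


def refresh_action(now, end_time, renew_till=None, margin=timedelta(minutes=30)):
    """Decide what an automatic refresher should do with one ticket."""
    if now + margin < end_time:
        return Action.NOTHING
    # Renewal only works while the ticket is still valid and before renew-till;
    # an already-expired ticket cannot be renewed.
    if renew_till is not None and now < end_time and now < renew_till:
        return Action.RENEW
    return Action.REAUTH


now = datetime(2006, 4, 20, 12, 0)
week = now + timedelta(days=7)
print(refresh_action(now, now + timedelta(hours=2), week))
print(refresh_action(now, now + timedelta(minutes=10), week))
print(refresh_action(now, now - timedelta(minutes=1), week))
```

The margin matters: renewing a little before expiry avoids the window in which a ticket lapses and mounted home directories suddenly become inaccessible.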

Users logging on, as well as newly created users, have to wait longer because the program has to read the entire database. Package installation is slow, but most of their software is installed remotely. When Vince brought that one up, we all looked at him, said “join the club” and laughed.

It seems that they came up with what they described as an ‘ugly hack’ for POSIX. It makes glibc use the local cache (nscd), but it is buggy and cannot help with the initial hit. They have two options for giving the user access to local devices:
1. Red Hat's pam_console, which gives the user access but does not support more than one user on a machine.
2. Debian groups, which add users to many groups and get them onto the NFS quickly.
Neither of them is as secure as Google would like.
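Why a local cache “cannot help with the initial hit” follows from how any lookup cache works. The toy time-to-live cache below is a stand-in for nscd, not its actual implementation: the first lookup for a name always goes to the slow backend, and only repeat lookups within the time-to-live are served locally. The `LookupCache` class, the `slow_directory_lookup` backend and the 600-second TTL are hypothetical.

```python
import time


class LookupCache:
    """Toy name-service cache with a fixed time-to-live (a stand-in for nscd)."""

    def __init__(self, backend, ttl_seconds=600.0, clock=time.monotonic):
        self._backend = backend  # slow lookup, e.g. a network directory query
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expiry time)
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]  # served locally, no backend round trip
        # First lookup (or expired entry): nothing cached, so pay the full cost.
        self.misses += 1
        value = self._backend(key)
        self._entries[key] = (value, now + self._ttl)
        return value


directory = {"alice": 1001, "bob": 1002}
calls = []


def slow_directory_lookup(name):
    calls.append(name)
    return directory.get(name)


fake_now = [0.0]
cache = LookupCache(slow_directory_lookup, ttl_seconds=600, clock=lambda: fake_now[0])
cache.get("alice")  # miss: the initial hit always reaches the backend
cache.get("alice")  # hit: answered from the cache
fake_now[0] = 601.0
cache.get("alice")  # entry expired: back to the backend
print(cache.misses, calls)
```

A longer TTL reduces repeat misses but keeps stale account data around longer; no TTL setting helps the very first lookup, which is exactly the limitation described above.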

As you can see, because of the unique shape and purpose of Google’s network, they have pushed and pulled on parts of Linux in ways that were never dreamed of. Hopefully they will figure it out and tell us how they did it along the way.


