
Grid Weaknesses

Actual and theoretical weaknesses, discovered and undiscovered. General classes of weaknesses:

  1. Security holes in Globus and other standard tools
  2. Improper use of certificate server(s) or other trust networks
  3. Security holes in the systems running grid software (that is, vanilla OS weaknesses)
  4. Vulnerabilities in grid-based applications
  5. Upgrade avoidance and improper/incomplete upgrades

More detail:

  • Globus holes: Globus has been updated frequently, often because of potential security problems. Known exploits have been few or none, perhaps because relatively few sites use Globus (compared to, say, sendmail or IIS). A major factor in favor of Globus is its basis in Java, which is less susceptible to the buffer overflows and similar exploits that have plagued applications written in C/C++. With over a quarter of a million lines of code in GT3.3.0, there are sure to be undiscovered weaknesses, though not all will be exploitable remotely, and even fewer will provide root-level access to the underlying OS.
  • Rebuilding Globus can take hours, even on a fast system, and essentially no regression tests are available to ensure that a rebuild will work (or that it in fact remedies a particular problem).
  • Next-gen Globus 4.0 should be better, because Globus itself will shrink in favor of building on Tomcat and other Web services components (WebSphere and others).
  • Certificate servers, Kerberos tickets, ssh keys, and other necessary elements of the grid security infrastructure are subject to general exploits (such as Kerberos ticket hijacking and problems with ssh upgrades), as well as improper configurations. Because there is no central certificate server for places wanting to use grid services (after all, sites should use their own, to form their own VOs), everyone needs to build or buy their own. But many sites deploy certificate servers improperly, or run other potentially weak services on the certificate server.
  • OS weaknesses were reportedly the basis for the TeraGrid's publicized break-ins in the spring. These are essentially outside the scope of Globus and the grid, and can be avoided mainly by following sound security practices: hardening, monitoring, firewalling, applying patches, etc.
  • Application vulnerabilities. Since essentially any application in any language can be "grid service"-enabled, these applications could themselves be vulnerable, even leading to exploits of the underlying grid software or OS. This problem is similar to letting users employ CGI, PHP, Web services, etc. on a public server: whom do you trust? GS, like WS, tries to keep applications in an unprivileged sandbox, but the boundaries are often not too rigid. Furthermore, intentional weaknesses (such as back doors) might go unnoticed. This problem will only get worse as the software stabilizes and more people start sharing the same applications on the same Globus implementations. Such common applications are likely to be low-hanging fruit for would-be intruders, much as in the early days of httpd.
  • Fear of upgrades. Getting Globus running is a major hassle, and it is often tied in hard-to-determine ways to underlying software and versions (JDK, ant, and the many components of Globus itself). Its use of system libraries such as OpenSSL might be unknown or opaque. Previously, many sites used Red Hat Linux, but the end of Red Hat's support for versions 7.3 through 9.0 has left those systems harder to update. Plus, a grid server farm might have any number of systems, with the expected opportunities for forgetting, postponing, or avoiding a critical fix. The good news is that it is possible to maintain tight systems security, with up-to-date patches, appropriate monitoring, etc. The bad news is that many sites have been unwilling to devote the needed energy and resources and, in the absence of a real Globus test suite, are concerned about breaking running applications through upgrades, even carefully done ones. This is a perpetual problem, and the TeraGrid sites learned the hard way that the extra labor of staying on top of patches and making best-possible efforts at change management is a requirement.
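
One way to reduce the "forgotten or postponed fix" problem on a server farm is to keep a simple inventory of what each host runs and compare it against the minimum versions a site has decided it requires. The following is a minimal Python sketch of that idea, not a Globus tool: the host names, component names, and version numbers are all hypothetical, and it assumes purely numeric dotted versions (a string such as an OpenSSL letter-suffixed release would need extra parsing).

```python
# Hypothetical sketch: flag hosts in a grid server farm that are behind
# on site-required component versions. All names and versions are made up.
from typing import Dict, List, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Turn a numeric dotted version such as '3.2.1' into the tuple (3, 2, 1)."""
    return tuple(int(part) for part in text.split("."))


def find_stale_hosts(
    inventory: Dict[str, Dict[str, str]],
    required: Dict[str, str],
) -> Dict[str, List[str]]:
    """Return, per host, the components that are missing or older than required."""
    stale: Dict[str, List[str]] = {}
    for host, installed in inventory.items():
        behind = []
        for component, minimum in required.items():
            current = installed.get(component)
            # A component with no recorded version is treated as needing attention.
            if current is None or parse_version(current) < parse_version(minimum):
                behind.append(component)
        if behind:
            stale[host] = sorted(behind)
    return stale


if __name__ == "__main__":
    # Assumed site policy and inventory, for illustration only.
    required = {"globus": "3.2.1", "jdk": "1.4.2"}
    inventory = {
        "node01": {"globus": "3.2.1", "jdk": "1.4.2"},
        "node02": {"globus": "3.2.0", "jdk": "1.4.2"},
        "node03": {"globus": "3.2.1"},
    }
    for host, components in sorted(find_stale_hosts(inventory, required).items()):
        print(host, "needs:", ", ".join(components))
```

A report like this does not make an upgrade any safer, but it at least makes a skipped host visible instead of leaving it to memory.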
