annis's blog

Jan 4 2018: Hardware Vulnerability Patching Schedule

Over the last few days, rumors of serious CPU-level security vulnerabilities have been appearing in the tech news. Last night the embargo on the details was broken, and it's quite a mess. The BCG will need to patch all machines in the department, probably more than once, to address the problems.

Over the next week, please:

  1. Log out if you will be away from a machine for more than a few hours (this includes remote logins). This lets us patch machines when we see that no one is logged into them.
  2. Avoid long-running compute jobs, whether run directly or through Condor. This will minimize the work lost when we reboot a machine.
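
If you are curious whether a machine currently looks free, the check can be sketched roughly as below. This is only an illustration, not the tool the BCG uses: it assumes a Linux or macOS host where the standard `who` command is available, and the function names (`parse_who`, `machine_is_free`, `current_who_output`) are made up for this example.

```python
import subprocess


def parse_who(who_output: str) -> set:
    """Return the set of user names found in `who`-style output.

    Each non-empty line of `who` output starts with a login name,
    so the first whitespace-separated field is taken as the user.
    """
    users = set()
    for line in who_output.splitlines():
        fields = line.split()
        if fields:
            users.add(fields[0])
    return users


def machine_is_free(who_output: str) -> bool:
    """True when the given `who` output lists no login sessions."""
    return not parse_who(who_output)


def current_who_output() -> str:
    """Run `who` on the local machine and return its text output."""
    result = subprocess.run(["who"], capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    users = parse_who(current_who_output())
    if users:
        print("Still logged in:", ", ".join(sorted(users)))
    else:
        print("No one is logged in; this machine looks free.")
```

Note that this only sees login sessions. A job left running in the background after you log out would not show up here, which is why the second request above matters too.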

One of the two vulnerabilities cannot be fully solved short of replacing the CPU. The patches for it are workarounds that try to minimize the risk of the vulnerability. These patches do degrade performance somewhat: by nearly 20% for certain kinds of database tasks, and by a more modest 3-5% for purely computational work. What the hit will be for average, daily workloads is not yet clear.

Patches are already available for all three of our platforms: Windows, macOS, and Linux. We have already begun to apply patches on free machines. As firmware updates become available, BCG staff will need to visit people's desktop machines and spend time with laptops.
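
For Linux users who want to see what their own machine reports, here is a minimal sketch. It assumes a kernel new enough to expose mitigation status as small text files under `/sys/devices/system/cpu/vulnerabilities`; older or unpatched kernels do not have that directory, in which case the function returns an empty result. The function name `read_mitigation_status` is made up for this example, and this is not part of the BCG's patching procedure.

```python
from pathlib import Path

# Directory where newer Linux kernels report CPU vulnerability status.
# Assumption: it is absent on kernels that predate this reporting.
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_mitigation_status(vuln_dir=VULN_DIR) -> dict:
    """Map each vulnerability file name to the one-line status it contains.

    Returns an empty dict when the directory does not exist.
    """
    vuln_dir = Path(vuln_dir)
    if not vuln_dir.is_dir():
        return {}
    status = {}
    for entry in sorted(vuln_dir.iterdir()):
        if entry.is_file():
            status[entry.name] = entry.read_text().strip()
    return status


if __name__ == "__main__":
    report = read_mitigation_status()
    if not report:
        print("This kernel does not report vulnerability status.")
    for name, state in report.items():
        print(f"{name}: {state}")
```

An empty report does not by itself mean the machine is unpatched or patched; it only means the kernel is not reporting status this way.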

Wed, June 21, 2017: Linux machine security issue and reboots

To address an extremely dangerous and widespread vulnerability, we will be patching all the Linux machines in the next few days.

This patch requires a reboot.

Condor users will see a slightly higher rate of job restarts as we patch and reboot the condor-only hosts. We will try to schedule patch times with people for their desktop machines, but we don't want to spend more than a few days on getting every computer patched.

Wed, April 26, 2017, 11:01am: SSH from off campus and VPNs failing [Resolved]

After the Med School switched to a new firewall on April 10th, we noticed that idle ssh sessions were being killed after a very short time. This morning the SMPH Network Group made a change to lengthen that timeout, but we are now seeing new ssh sessions from off campus and from VPNs fail to complete negotiation. Within the Biostat network, ssh is fine.

The SMPH Network Group is aware of this and is debugging the issue now. We'll update this status page when we know more.

Mon, Feb 6, 2017, 1:00pm: File Server Outage [Resolved 10:17pm]

Access to the main file server is intermittent. We're working on it now.

10:17pm Update: File system services to user home directories and project directories are now working again.

We will need to schedule an outage (perhaps more than one) to deal with additional clean-up and adjustments. The system might be a bit slower than usual for a few days.

Most machines have recovered on their own from the outage, but a few may be so confused they need a reboot. We'll be checking hosts, but please let us know if you run into any stuck machines.