July 18 2018: Directory Difficulties

Today one of our file servers had a drive fail. The normal failover process worked, but unfortunately took long enough to confuse a bunch of machines.

The usual symptom is that directories (home directories, project directories) will not attach. Often your login session will just hang.

Unfortunately, this takes a hard reboot to fix reliably. We are checking a bunch of the open login and compute servers (Wednesday afternoon), and rebooting as we can.

Jan 4 2018: Hardware Vulnerability Patching Schedule

In the last few days rumors of terrible, CPU-level security vulnerabilities have been appearing in the tech news. Last night the embargo on details was broken, and it's quite a mess. The BCG will need to patch all machines in the department, probably more than once, to address the problems.

Over the next week, please:

  1. Log out if you are away from a machine more than a few hours (including remotely). This lets us patch machines when we see no one is logged into them.
  2. Avoid long-running compute jobs, either directly or through condor. This will minimize the work lost when we reboot a machine.

One of the two vulnerabilities cannot be fully solved short of replacing the CPU. The patches for that one are work-arounds that try to minimize the risk of the vulnerability. These patches do degrade performance somewhat: nearly 20% for certain kinds of database tasks, and a more modest 3-5% for purely computational work. What the hit will be for average, daily workloads is not yet clear.

Patches are already available for all three of our platforms: Windows, macOS, and Linux. We have already begun to apply patches on free machines. As firmware updates become available, BCG staff will need to visit people's desktop machines and spend time with laptops.
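
For Linux users who want to see what their own machine reports after patching, here is a minimal sketch. It assumes a kernel that exposes mitigation status under /sys/devices/system/cpu/vulnerabilities; older or not-yet-patched kernels may not have this directory at all, in which case the script simply says so. The function names are ours, for illustration only, and this is not the tool the BCG uses to track patching.

```python
from pathlib import Path

# Standard Linux sysfs location; only present on kernels that expose it.
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_mitigation_status(vuln_dir=VULN_DIR):
    """Return {vulnerability name: kernel status line} for each file in vuln_dir.

    Returns an empty dict when the directory does not exist.
    """
    vuln_dir = Path(vuln_dir)
    if not vuln_dir.is_dir():
        return {}
    status = {}
    for entry in sorted(vuln_dir.iterdir()):
        if entry.is_file():
            try:
                status[entry.name] = entry.read_text().strip()
            except OSError:
                status[entry.name] = "unreadable"
    return status


def needs_attention(status):
    """List entries whose status line starts with 'Vulnerable' (illustrative helper)."""
    return [name for name, line in status.items() if line.startswith("Vulnerable")]


if __name__ == "__main__":
    report = read_mitigation_status()
    if not report:
        print("This kernel does not report mitigation status.")
    for name, line in report.items():
        print(f"{name}: {line}")
    for name in needs_attention(report):
        print(f"Still reported as vulnerable: {name}")
```

If anything is still reported as vulnerable after your machine has been patched and rebooted, please contact the BCG rather than trying to fix it yourself.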

Power Outage at CSC - 11/29/17

On 11/29/17, between approximately 4:00AM and 6:00AM, all department desktops located in the CSC were impacted by a brief power outage due to maintenance by hospital staff. This means all desktops were restarted, and any unsaved work would have been lost.

We have addressed the majority of desktops and issues caused by this outage. However, if you see any abnormal behavior, please contact the BCG.

Network Outage - 10/13/17

At approximately 11:30AM on 10/13, we were made aware of network-related issues affecting the BMI department file and compute servers. The issue was the result of failed network updates made by UW DoIT, and has since been resolved (by ~1:00PM).

This affected all Linux users, and network shared drives for Mac and Windows users. If you continue to have issues after 1:00PM, please try restarting your desktop. If this does not resolve it, please contact BCG support.

Power Outage at CSC Saturday Sept 30 3AM-5AM

There is going to be a power outage 3AM - 5AM in the CSC which will affect all BMI offices located there. It's asking for trouble to let a computer be rudely cut off from power without a chance to politely shut down, so before you leave for the weekend, please shut down your desktop computers.

The CSC server room is on backup generator power, so the servers should be unaffected.

Wed, July 19, 2017: UW Campus Network Outage [RESOLVED]

Starting on July 18th at 8PM, the entire UW campus has been experiencing widespread network outages that are affecting many different buildings and departments, including the SMPH. Any desktop or server connected to the Biostatistics and Medical Informatics network may have internet or connection issues preventing any outside connection. There are reports that campus WiFi currently works in most areas.

Wed, June 21, 2017: Linux machine security issue and reboots

To address an extremely dangerous and widespread vulnerability, we will be patching all the Linux machines in the next few days.

This patch requires a reboot.

Condor users will see a slightly higher rate of job restarts as we patch and reboot the condor-only hosts. We will try to schedule patch times with people for their desktop machines, but we don't want to spend more than a few days on getting every computer patched.