BCG Status Updates

Wed April 26, 2017, 11:01am - SSH from off campus and VPNs failing [RESOLVED]

After the Med School switched to a new firewall on April 10th, we noticed that idle ssh sessions were being killed after very short times. This morning, the SMPH Network Group made a change to address that timeout, but now we are seeing incomplete ssh negotiation for new ssh sessions from off campus and from VPNs. Within the Biostat network, ssh is fine.

The SMPH Network Group is aware of this and is debugging the issue now. We'll update this status page when we know more.

May 2 2017: Resolved. Additional information on how to work around this issue is available.

Mon, Feb 6, 2017, 1:00pm: File Server Outage [Resolved 10:17pm]

Access to the main file server is intermittent. We're working on it now.

10:17pm Update: File system services to user home directories and project directories are now working again.

There will need to be a scheduled outage (perhaps more than one) to deal with additional clean-up and adjustments. The system might be a bit slower than usual for a few days.

Most machines have recovered on their own from the outage, but a few may be so confused they need a reboot. We'll be checking hosts, but please let us know if you run into any stuck machines.

File Server Outage Scheduled for 9-11pm, Jan 16th

There will be an outage for hardware maintenance on both the user home directory file server (/ua/, /z/Proj/) and the computational file server (/z/Comp/). The outage will last from 9pm to 11pm on Monday, January 16th.

You should not run Condor jobs during this outage.

File Server Outage Scheduled for Tue, Sep. 13, 10:00pm

The main home directory file server will be down for two hours Tuesday evening to install urgent software updates.

Some project directories are on the same file server, and we recommend against scheduling Condor jobs through the outage window.

Update, Sep 14: all upgrades were successfully installed.

June 29 2016: New version of R available, 3.3.1.

The most recent version of R, 3.3.1, is available with the command R331.

It will become the default version of R (when you just type R) on July 18th.

Critical File Server Upgrade, Wed, March 16, 9:30pm-12:00

As many of you have doubtless noticed, the file server holding your home directories has been misbehaving in recent weeks. To address that, we'd like to take down the file server for hardware maintenance the evening of Wednesday, March 16th, from 9:30pm until midnight.

During the outage your home directory will be unavailable and you'll not be able to log in. Your email, as a DoIT service, will be unaffected by the outage.

We also recommend that you not have any Condor jobs scheduled during that time.

Jan 15 2016: Python library updates and Git

The Python deep learning support library Theano has been added to Python 2 and 3.

Since the last time we checked, several additional scientific packages have been ported to Python 3 (matplotlib, in particular). So, for the moment, Python 2 and 3 have parallel scientific libraries. Details.

We have recently installed an authenticated, web-managed Git repository tool, available to all Biostat users. Details.

Jan 11 2016: Project and Compute file server down [Updated 6:44pm]

Status: (6:44pm) the compute file server is back up.

Due to a lengthy power outage in one of our server rooms, a large number of compute machines and one main file server have been offline since Sunday.

The compute machines are mostly back now.

It seems there was a power surge when the power came back on, which has caused problems for the project/compute file server. As of 9:30am, we're working to move the drives to a different chassis and bring that service back online. I'll update this page when that service is back.

Note that replication, which provides the backup copies of that server, will not be happening for at least a few days while we replace the damaged hardware for the server.

The main file server which hosts people's home directories and many project directories has been unaffected by this outage. The problems only apply to high-compute storage space.

As of this evening, we have the compute file server back up and serving files. As noted above, there will be no replication for a few days, until we can either fix or replace a motherboard on the replication host.

Throughput (network speed) is going to be a bit slow until some time tomorrow (Jan 12).