
Version 12 (modified by mnoethe, 7 years ago)

Trouble Shooting - Computers

Information about the Computers

You can find information about the Computers on La Palma on the computing page.

KVM Software

It can be helpful to look directly at a computer's screen output. To do that, install the RARITAN KVM software.

Restart a computer after a power cut

You can switch on the computers and the KVM switch from the power switches listed on the internal links page.

Startup Procedure

  1. bring up GATE (LDAP, DNS, DHCP, Gateway (Masquerading))
    1. It might be necessary to restart the VPN service:
      systemctl status openvpn@fact.service
      systemctl restart openvpn@fact.service
      
  2. bring up RAID
  3. bring up NEWDAQ (NFS Home, Raid)
  4. bring up NEWDATA
  5. bring up AUX
  6. make sure that the needed mountpoints are there
  7. make sure that the needed screen sessions are running (for details, see Trouble Shooting Software)
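
The VPN check from step 1 can be sketched as a small shell function that restarts the service only when systemd does not report it as active. This is a minimal sketch, not an established procedure: the function name ensure_vpn_running is hypothetical, and like the commands in step 1 it assumes it is run on gate with sufficient privileges.

```shell
#!/bin/sh
# Unit name as used in step 1 of the startup procedure.
VPN_UNIT="openvpn@fact.service"

# Hypothetical helper: restart the VPN unit only if it is not active.
ensure_vpn_running() {
    if systemctl is-active --quiet "$VPN_UNIT"; then
        echo "$VPN_UNIT is active"
    else
        echo "$VPN_UNIT is not active, restarting"
        systemctl restart "$VPN_UNIT"
    fi
}

# Usage (on gate): ensure_vpn_running
```

If the restart does not help, systemctl status openvpn@fact.service shows the most recent log lines of the service.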

Required mountpoints:

  • newdata: /newdaq and /home from newdaq
  • gate: /users from newdaq (home of other machines)
  • aux: /home from newdaq

If a mountpoint is missing, run sudo mount -a on the corresponding machine.
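
The mountpoint check can be sketched as follows. This is only an illustration: the function name missing_mounts is hypothetical, and it relies on the mountpoint utility (part of util-linux) being installed on the machine.

```shell
#!/bin/sh
# Hypothetical helper: print every given path that is not currently a mountpoint.
missing_mounts() {
    for path in "$@"; do
        if ! mountpoint -q "$path" 2>/dev/null; then
            echo "$path"
        fi
    done
}

# Usage on newdata (paths taken from the list above):
#   missing=$(missing_mounts /newdaq /home)
#   if [ -n "$missing" ]; then echo "missing: $missing"; sudo mount -a; fi
```

On gate the path to check would be /users, and on aux it would be /home.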

Shutdown Procedure

  1. shut down aux, gui
  2. shut down daq
  3. shut down data
  4. shut down newdaq
  5. shut down gate

Restarting a hanging PC

Symptom

  • the PC can't be reached via ssh, or shows a similar problem
  • be aware that when all computers (except for gate) seem to hang, it is normally newdaq that hangs; the others only try to mount the home from the RAID of newdaq, so they hang too
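
To see which machines still answer, a quick ssh probe from a working computer can help. The following is a rough sketch under assumptions: the function names are hypothetical, the host names in the usage line are assumed to resolve on the internal network, and the 15 second limit is an arbitrary choice. The outer timeout is there because a login can also hang after the connection is made, for example while waiting for a home directory that cannot be mounted.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the host completes an ssh login in time.
# BatchMode avoids waiting at a password prompt; timeout bounds the whole attempt.
probe_host() {
    timeout 15 ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true >/dev/null 2>&1
}

# Print one status line per host given as argument.
report_hosts() {
    for host in "$@"; do
        if probe_host "$host"; then
            echo "$host: reachable"
        else
            echo "$host: NOT reachable"
        fi
    done
}

# Usage (host names are assumptions): report_hosts newdaq newdata aux
```

If every machine except gate is reported as not reachable, that matches the newdaq case described above.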

Solution

If it's not too late at night, try to call an expert before you power cycle the computers.

If you have to restart more than one PC, be sure to follow the Shutdown and Startup procedures above.

You can switch on the computers from 10.0.100.234 (see http://fact-project.org/internal.html)

or you can power cycle the hanging computer from any other computer on the FACT internal network:

  • go to /usr/local/bin
  • execute one of the following scripts: aux_off, gui_off, gate_off, daq_off, data_off
  • wait a few minutes
  • execute the matching one of the following scripts: aux_ON, gui_ON, gate_ON, daq_ON, data_ON
  • rebooting takes a few minutes for aux, gui and gate, and about 10 minutes each for daq and data
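
The power cycle steps above can be wrapped in one function. This is a sketch only: power_cycle, BIN_DIR and WAIT_SECONDS are hypothetical names, and the 180 second pause is an assumed reading of "a few minutes". The script names themselves are the ones listed above.

```shell
#!/bin/sh
# Directory holding the <name>_off and <name>_ON scripts (from the list above).
BIN_DIR="${BIN_DIR:-/usr/local/bin}"
# "A few minutes" is not specified further; 180 s is an assumed default.
WAIT_SECONDS="${WAIT_SECONDS:-180}"

# Hypothetical helper: switch one computer off, wait, and switch it on again.
power_cycle() {
    name="$1"
    case "$name" in
        aux|gui|gate|daq|data) ;;
        *) echo "unknown computer: $name" >&2; return 1 ;;
    esac
    "$BIN_DIR/${name}_off" || return 1
    sleep "$WAIT_SECONDS"
    "$BIN_DIR/${name}_ON"
}

# Usage: power_cycle aux
```

The function stops without switching the computer back on if the off script fails, so that a failure is noticed before the wait.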

or power cycle the hanging computer manually from the FACT-container.

What to do if daq is dead?

If daq dies, the qla and a few other scripts have to be moved to newdaq. Some notes can be found here: DaqDead
