Version 3 (modified by Daniela Dorner, 10 years ago)

added link to trouble shooting software

Trouble Shooting - Computers

Information about the Computers

Information about the computers on La Palma is available on the computing page.

Startup Procedure

  1. bring up GATE (LDAP, DNS, DHCP, Gateway (Masquerading))
  2. bring up NEWDAQ (NFS Home, Raid)
  3. bring up DATA
  4. bring up DAQ
  5. bring up the other machines
  6. make sure that the needed mountpoints are there
  7. make sure that the needed screen sessions are running (for details, see TroubleShootingSoftware)

mountpoints:

  • daq, data: /newdaq from newdaq
  • data, gate: /daq from daq
  • gate: /users from newdaq (home of other machines)
  • data,daq,aux,gui: /home from newdaq

If a mountpoint is missing, run sudo mount -a on the corresponding machine.
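The mountpoint list above can be turned into a quick check. The following is a minimal sketch, not an existing FACT tool: the table EXPECTED_MOUNTS is copied from the list above, the function name missing_mounts is hypothetical, and the input is assumed to be text in the format of /proc/mounts on a Linux machine.

```python
# Expected mountpoints per machine, copied from the list above.
# Machine names are written in lower case, as in that list.
EXPECTED_MOUNTS = {
    "daq": ["/newdaq", "/home"],
    "data": ["/newdaq", "/daq", "/home"],
    "gate": ["/daq", "/users"],
    "aux": ["/home"],
    "gui": ["/home"],
}


def missing_mounts(machine, mounts_text):
    """Return the expected mountpoints of `machine` that do not appear in
    `mounts_text` (assumed to be in /proc/mounts format)."""
    mounted = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            mounted.add(fields[1])  # the second field is the mountpoint
    return [m for m in EXPECTED_MOUNTS[machine] if m not in mounted]
```

On the machine itself, the text could come from open("/proc/mounts").read(). A non-empty result names the mountpoints that sudo mount -a should bring back.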

Shutdown Procedure

  1. shut down aux, gui
  2. shut down daq
  3. shut down data
  4. shut down newdaq
  5. shut down gate
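The shutdown list is the startup list read backwards: machines that provide services (gate, newdaq) come up first and go down last. A small sketch of that relationship, where "others" stands for the machines the startup list calls "the other machines" (aux and gui in the shutdown list) and the function name is hypothetical:

```python
# Startup order taken from the Startup Procedure above.
# "others" is a placeholder for the remaining machines (aux, gui).
STARTUP_ORDER = ["gate", "newdaq", "data", "daq", "others"]


def shutdown_order(startup_order):
    """Return the shutdown order, i.e. the startup order reversed."""
    return list(reversed(startup_order))
```

Deriving one list from the other keeps the two procedures consistent if a machine is ever added.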

Restart a computer after a power cut

You can switch on the computers from 10.0.100.234 (see http://fact-project.org/internal.html).

Restarting a hanging PC

Symptom

  • the PC cannot be reached via ssh, or shows similar behaviour
  • be aware that when all computers except gate seem to hang, it is normally newdaq that hangs; the others only hang because they are trying to mount their home directory from the RAID on newdaq
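The rule of thumb in the symptoms above can be written down as a short sketch. This is an illustration under assumptions, not an existing script: the function names are hypothetical, ssh is assumed to listen on the default port 22, and the machine names are assumed to resolve on the internal network.

```python
import socket


def ssh_port_open(host, timeout=3.0):
    """Return True if a TCP connection to port 22 on `host` succeeds."""
    try:
        with socket.create_connection((host, 22), timeout=timeout):
            return True
    except OSError:  # covers timeouts, refused connections, DNS failures
        return False


def suspect(reachable):
    """Given {machine: reachable?}, return the machines to look at first.

    If gate answers but every other machine is unreachable, newdaq is the
    usual culprit, because the others hang on its NFS home.
    """
    others = [m for m in reachable if m != "gate"]
    if reachable.get("gate", False) and others and not any(reachable[m] for m in others):
        return ["newdaq"]
    return [m for m, ok in reachable.items() if not ok]
```

The dictionary passed to suspect could be built with ssh_port_open for each machine. An open port does not prove that a login works, so the result is only a hint about where to start.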

Solution

If it is not too late at night, try to call an expert before you power cycle the computers.

If you have to restart more than one PC, be sure to follow the Shutdown and Startup procedures above.

You can switch on the computers from 10.0.100.234 (see http://fact-project.org/internal.html)

or you can power cycle the hanging computer from any other computer on the FACT internal network:

  • go to /usr/local/bin
  • execute one of the following scripts: aux_off, gui_off, gate_off, daq_off, data_off
  • wait a few minutes
  • execute the matching script from the following: aux_ON, gui_ON, gate_ON, daq_ON, data_ON
  • rebooting takes a few minutes for aux, gui and gate, and about 10 minutes each for daq and data

or power cycle the hanging computer manually from the FACT-container.
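The script-based power cycle described above can be sketched as follows. This is a hedged illustration, not a tested procedure: the helper names are hypothetical, the scripts are assumed to take no arguments, and the 180-second pause is only a stand-in for "a few minutes". The default is a dry run that prints the commands instead of running them.

```python
import subprocess
import time

SCRIPT_DIR = "/usr/local/bin"
# Machines for which the text above lists *_off and *_ON scripts.
SCRIPTED_HOSTS = ("aux", "gui", "gate", "daq", "data")


def power_cycle_commands(host):
    """Return the [off, on] script paths for `host`."""
    if host not in SCRIPTED_HOSTS:
        raise ValueError(f"no power scripts listed for {host!r}")
    return [f"{SCRIPT_DIR}/{host}_off", f"{SCRIPT_DIR}/{host}_ON"]


def power_cycle(host, wait_seconds=180, dry_run=True):
    """Switch `host` off, wait, then switch it on again."""
    off_cmd, on_cmd = power_cycle_commands(host)
    if dry_run:
        # Only show what would happen; nothing is executed.
        print("would run:", off_cmd)
        print("would wait:", wait_seconds, "s")
        print("would run:", on_cmd)
        return
    subprocess.run([off_cmd], check=True)
    time.sleep(wait_seconds)  # assumed value for "a few minutes"
    subprocess.run([on_cmd], check=True)
```

Calling power_cycle("daq") prints the two script paths and the pause. Passing dry_run=False would actually run the scripts, so that should only be done from a machine on the FACT internal network and after checking the advice above about calling an expert.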
