
Version 1 (modified by ftemme, 11 years ago)

--

Troubleshooting

TroubleShootingAutomaticFailureHandling

TroubleShootingBias

  • bias disconnection
  • Overcurrent status
  • Status notReferenced

TroubleShootingFads

  • crate reset
  • start up connection problem
  • in-run fad loss
  • drs underflow problem
  • startup - no proper connection problem
  • fadctrl hangs in state configuring when taking an external lp run

TroubleShootingFtus

TroubleShootingDrivectrl

TroubleShootingComputers

TroubleShootingArduino

TroubleShootingHardware

General Remarks

Stopping Programs

  • never use ctrl+c to stop a program that is not hanging
    • when a restart of the program is really necessary, use .q instead
    • restarting a program is in most cases not a solution and only increases the risk of triggering more problems, so avoid restarting programs as long as possible

### bias disconnection

Symptom

  • biasctrl is in state DISCONNECTED

Solution

  • Do _RECONNECT_
  • Send a command like _REQUEST_STATUS_
  • Sometimes the bias crate disconnects again; in that case do _RECONNECT_ again

Not Helping

  • do not close or kill biasctrl

### OverCurrentStatus

Symptom

  • when biasctrl ramps the voltage, it goes into state OVERCURRENT

Solution

  • First try biasctrl/RESET_OVER_CURRENT_STATUS (maybe a few times)
  • if that doesn't help, try to ramp the voltage down
    • biasctrl/SET_GLOBAL_DAC 0
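The escalation above (reset a few times, then ramp down) can be sketched in Python. This is only an illustration, not part of the FACT software: `send_command` and `get_state` are hypothetical callbacks standing in for however commands are sent and states are read, and the limit of three attempts is an assumed reading of "a few times".

```python
def clear_overcurrent(send_command, get_state, max_attempts=3):
    """Try RESET_OVER_CURRENT_STATUS a few times, then ramp down.

    send_command and get_state are hypothetical callbacks (assumptions):
    send_command(text) issues a command string, get_state() returns the
    current biasctrl state name. Returns the list of commands issued.
    """
    issued = []
    for _ in range(max_attempts):
        if get_state() != "OVERCURRENT":
            # The state has cleared, nothing more to do.
            return issued
        send_command("biasctrl/RESET_OVER_CURRENT_STATUS")
        issued.append("biasctrl/RESET_OVER_CURRENT_STATUS")
    if get_state() == "OVERCURRENT":
        # Resetting did not help: ramp the voltage down.
        send_command("biasctrl/SET_GLOBAL_DAC 0")
        issued.append("biasctrl/SET_GLOBAL_DAC 0")
    return issued
```

Note that the sketch never closes or kills biasctrl, in line with the "Not Helping" list below.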

Not Helping

  • do not close or kill biasctrl

### Status notReferenced

Symptom

  • biasctrl is in state notReferenced

Solution

  • start the Ramping again
    • biasctrl/START

Not Helping

  • do not close or kill biasctrl

---

<A NAME="tsfads"/> FADs


### start up - connection problem

Symptom

  • problem occurs usually during start up
  • after FAD_CONTROL/START or pushing FAD -> START button in the GUI, not all 40 FAD LEDs are green
  • fadctrl is in state Connecting (instead of Connected)

Solution

  • stop dimscripts
    • dimctrl --stop from a bash
  • do a crate reset:

<A NAME="tscratereset"/>

There exists a script which will do the Crate Reset automatically:

Crate Reset manually:

  • disable all FTUs (in the FTU tab of the GUI)
  • disconnect the 10 FADs in the crate, by clicking the LEDs
  • ftmctrl/RESET_CRATE x (x is the number of the corresponding crate)
  • enable all FTUs (in the FTU tab of the GUI)
  • reconnect the 10 FADs one by one, waiting 3 seconds in between (each FAD needs this time to boot)
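The manual procedure above can be written down as an ordered checklist. The following Python sketch only generates the list of steps and does not talk to any hardware. The numbering is an assumption: crates are taken to be 0..3, with crate c holding boards 10*c to 10*c + 9, so check the actual numbering in the GUI before relying on it.

```python
FADS_PER_CRATE = 10   # from the text: 10 FADs per crate
NUM_CRATES = 4        # 40 FADs in total


def crate_reset_steps(crate, boot_wait_s=3):
    """Return the manual crate reset as an ordered list of steps.

    Assumption: crates are numbered 0..3 and crate c holds the boards
    10*c .. 10*c + 9.
    """
    if not 0 <= crate < NUM_CRATES:
        raise ValueError(f"crate must be in 0..{NUM_CRATES - 1}")
    first = crate * FADS_PER_CRATE
    boards = list(range(first, first + FADS_PER_CRATE))
    steps = ["disable all FTUs (FTU tab of the GUI)"]
    steps += [f"disconnect FAD {b}" for b in boards]
    steps.append(f"ftmctrl/RESET_CRATE {crate}")
    steps.append("enable all FTUs (FTU tab of the GUI)")
    for b in boards:
        # Each FAD needs a few seconds to boot before the next one.
        steps.append(f"reconnect FAD {b}")
        steps.append(f"wait {boot_wait_s} s")
    return steps
```

Printing the result, for example with `print("\n".join(crate_reset_steps(2)))`, gives a checklist that can be ticked off step by step.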

Not Helping

  • disconnect / reconnect to the FAD.
  • waiting
  • reset other / all crates (might just create another of these bugs in another FAD)
  • stopping or killing fadctrl
  • power cycling the camera (might just create another of these bugs in another FAD)

<A NAME="tsfadloss"/>

### In-run-FAD-loss

Symptom

  • during a run:
    • 4 adjacent *strange* patches in the events tab of the GUI
    • orange warnings in the fadctrl console (the event builder notices that one FAD stopped sending data)
    • trigger rate drops to 0
  • when the run has ended and a new one is started
    • MCP hangs in state Configuring3
    • fadctrl hangs in state Configuring2

Solution

  • find out which FAD board is disconnected
    • the one board in the FAD tab which has the strange behaviour
    • the one without a tick on its LED in the GUI (only visible when a new run is started and the system hangs in ConfiguringN)
  • stop dimscript (by dimctrl --stop from a bash)
    • only possible when dimctrl was started as a server instance (dimctrl --server)
    • otherwise you have to kill dimctrl via ctrl + c
  • reset MCP (MCP/RESET)
  • Disconnect the problematic FAD (by clicking on the corresponding LED in the GUI)
  • wait 3 to 5 sec
  • Reconnect the problematic FAD (again clicking)

Not Helping

  • waiting
  • reset any crates (might just create another start up connection problem)
  • stopping or killing fadctrl
  • power cycling the camera (might just create another start up connection problem)

<A NAME="tsdrsunderflow"/>

### DRS underflow problem

Symptom

  • happens usually during start up
  • during the first data taking one single patch in the events tab looks different (e.g. all dark blue)
  • after the 3 calibration runs are taken by the system, the calibration constants behave differently:
    • the magenta line (normally around +1000mV) has quite long error bars and is curved over the whole canvas

Solution

  • stop dimscript
    • dimctrl --stop from a bash
  • Reset crate (see <A HREF="#tscratereset">above</A>)
    • this has rarely helped, but it takes much less time than a power cycle
  • if the problem appears again: power cycle the camera (see <A HREF="#tshardware">below</A>)

Not Helping

  • all the rest

### startup - no proper connection problem

Symptom

  • happens usually during start up
  • several FADs are in state "Waiting" (orange in the GUI)

Solution

  • disable the corresponding FADs by clicking on them in the GUI
  • wait a short time
  • enable the corresponding FADs by clicking again on them (wait 3 seconds between two clicks)

Not Helping

  • killing / quitting fadctrl

### fadctrl hangs in state configuring when taking an external lp run

Symptom

Solution

Not Helping

  • all the rest

---

<A NAME="tsftus"/> FTUs / ftmctrl


### ftm in state ERROR:

Symptom

  • ftmctrl in state ERROR
  • one FTU is not green in the FTU tab of the GUI

Solution

  • stop dimscript
    • dimctrl --stop from a bash
  • switch off the trigger and try *Ping* (FTU-tab)
    • an FTU can erroneously be marked as *in error* after the GUI has been restarted; a *Ping* resolves that
  • if this doesn't help do a Crate Reset (see <A HREF="#tscratereset">above</A>)

Not Helping

  • stopping fadctrl or ftmctrl
  • Reset other or all Crates
  • quitting or killing any program
  • power cycling the camera

### ftmctrl in state ClockCondError:

Symptom

Solution

  • FTM_CONTROL/RESET_CONFIGURATION
  • or the MCP Reset button
  • make sure that the clock conditioner is locked before data taking

Not Helping

---

<A NAME="tsdrivectrl"/> DriveCtrl / Cosy


### Cosy in state Error

Symptom

  • drivectrl and Cosy in state ERROR
  • drivectrl and Cosy in state ARMED (when just the Tracking stopped accidentally)

Solution

  • DRIVECTRL/RESUME

The RESUME command will perform the following steps (so only RESUME is necessary to solve the Drive Error):

Not Helping

  • all other

### drivectrl in state 3

Symptom

  • drivectrl is in status 3 (LOCKED)

Solution

  • drivectrl goes into this status when the sun is rising, so if this "problem" occurs in the morning it is probably not a problem at all; it is how the telescope is supposed to behave at sunrise
  • if this problem occurs during startup, you have to unlock the telescope:
    • DRIVECTRL/UNLOCK

Not Helping

  • DRIVECTRL/STOP
  • killing / quitting drivectrl or cosy

### Manual parking of the telescope

Symptom

  • drivectrl and Cosy don't accept / react to commands, for example when an error with the IndraDrives occurs

Solution

  • if it's not too late in the night, try to call an expert before you park manually
  • Park the telescope manually:
  1. Carry out all parts of the Shutdown procedure that are possible in the current situation (see shutdown here)
    • be sure the bias voltage is OFF!
    • wear a helmet when going to the telescope
  2. there are two bars near the door for moving the telescope in the azimuth and zenith directions:
    • long bar: zenith
    • short bar: azimuth
  3. Use these bars to turn the telescope manually at the spots provided for this:
    • azimuth: below the telescope
    • zenith: right of the mirrors
    • take care not to turn the telescope in a way that damages the cables
  4. Turn the telescope into the Parking Position (pointing north, towards the old container)

---

<A NAME="tscomputers"/> Computers


### Information about the Computers

Information about the computers on La Palma can be found on the Computing page

### Restarting a hanging PC

Symptom

  • the PC can't be reached per ssh, or something similar
  • be aware that when all computers (except for gate) seem to hang, it is normally data that hangs; the others only try to mount the home directory from data, so they hang too

Solution

if it's not too late in the night, try to call an expert before you power cycle the computers

When you have to restart more than one PC, be sure to follow the Start-up procedure described on the Computing page

Power cycle the hanging computer from any other computer on the FACT internal network:

  • go in /usr/local/bin
  • execute one of the following scripts: aux_off, gui_off, gate_off, daq_off, data_off
  • wait a few minutes
  • execute one of the following scripts: aux_ON, gui_ON, gate_ON, daq_ON, data_ON
  • Rebooting takes a few minutes for aux, gui and gate, and about 10 min for daq and data

or power cycle the hanging computer manually from the FACT-container
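The script names for the remote power cycle follow a simple pattern, `<host>_off` and `<host>_ON`. A small Python sketch, under the assumption that the scripts live in /usr/local/bin exactly as listed above, can build the matching pair for one computer so that the off and on scripts are never mixed up between machines. The function name is made up for this illustration.

```python
SCRIPT_DIR = "/usr/local/bin"
HOSTS = ("aux", "gui", "gate", "daq", "data")


def power_cycle_scripts(host):
    """Return the (off, on) script paths for one computer.

    Names follow the pattern in the text: <host>_off and <host>_ON.
    """
    if host not in HOSTS:
        raise ValueError(f"unknown host {host!r}; expected one of {HOSTS}")
    return (f"{SCRIPT_DIR}/{host}_off", f"{SCRIPT_DIR}/{host}_ON")
```

The function only returns the paths; running the scripts and waiting a few minutes in between remains a manual step.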

---

<A NAME="tsarduino"/> Arduino Reset


A document describing how to reset an Arduino can be found here: https://www.fact-project.org/logbook/showthread.php?tid=1500

---

<A NAME="tshardware"/> Hardware


### Power Cycling the camera

Symptom

  • A Crate Reset didn't solve an FTU or FAD problem

Solution

  • stop all scripts: dimctrl --stop from a bash
  • set bias to 0: BIAS_CONTROL/SET_ZERO_VOLTAGE
  • Stop Trigger:
    • FTM_CONTROL/STOP_TRIGGER
    • or Stop Trigger in the left part of the GUI
  • FTM_CONTROL/DISCONNECT
  • FEEDBACK/STOP
  • if fadctrl is in state WritingData
    • FAD_CONTROL/CLOSE_OPEN_FILES
    • fadctrl now should be in state connected
  • FAD_CONTROL/STOP
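The order of the commands above matters, and one step depends on the fadctrl state. As a sketch, the sequence can be captured in a small Python function that returns the commands in order. It does not send anything; `fadctrl_state` is simply the state name as read from the GUI or console, and the function name is made up for this illustration.

```python
def pre_power_cycle_commands(fadctrl_state):
    """Ordered commands to issue before power cycling the camera.

    fadctrl_state is the current fadctrl state name as a string;
    how it is obtained is left open here.
    """
    commands = [
        "dimctrl --stop",                  # run from a bash, stops all scripts
        "BIAS_CONTROL/SET_ZERO_VOLTAGE",   # set bias to 0
        "FTM_CONTROL/STOP_TRIGGER",        # or the Stop Trigger button in the GUI
        "FTM_CONTROL/DISCONNECT",
        "FEEDBACK/STOP",
    ]
    if fadctrl_state == "WritingData":
        # Open files must be closed before fadctrl is stopped.
        commands.append("FAD_CONTROL/CLOSE_OPEN_FILES")
    commands.append("FAD_CONTROL/STOP")
    return commands
```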

Now power cycle the camera

  1. Switch off the agilent camera in the FACT Container
    • open the page 10.0.100.100 from a computer in the internal network of La Palma (can be your computer if you use vpn)
    • push 'Camera Off' to stop the camera (now only 'Bias Power' and '24VDC Interlock Power' should be 'yes')
  2. wait for about 15 min
  3. switch the agilent on
    • open the page 10.0.100.100 from a computer in the internal network of La Palma
    • push 'Camera On' to start the camera (now all 5 points should be 'yes')
  4. make sure that the clock conditioner is locked
  5. follow the Start-Up procedure to get the system running again

To check whether the clock conditioner is locked, look in the GUI at the tab

  • 'Trigger': the LED next to the pulldown 'DRS sampling frequency'
  • 'FAD': hover the mouse over the LED next to 'Reference Clock' and check whether all 40 values are roughly 1/1024

(some more information in a post by Patrick: https://www.fact-project.org/logbook/showthread.php?tid=1102&pid=6070#pid6070 )
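The 'Reference Clock' check amounts to comparing 40 numbers against 1/1024. A minimal Python sketch of that comparison is shown below. The 5 % relative tolerance is an assumed interpretation of "roughly", and the values are assumed to have been copied by hand from the GUI tooltip; neither comes from the FACT software.

```python
import math

EXPECTED = 1 / 1024   # from the text
NUM_FADS = 40


def reference_clocks_ok(values, rel_tol=0.05):
    """True if there are 40 values and each is within rel_tol of 1/1024.

    rel_tol = 5 % is an assumed reading of "roughly".
    """
    if len(values) != NUM_FADS:
        return False
    return all(math.isclose(v, EXPECTED, rel_tol=rel_tol) for v in values)
```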

### Bias

In case you need to switch off the bias crate, please do BIAS_CONTROL/DISCONNECT first.
