Troubleshooting
TroubleShootingAutomaticFailureHandling
- bias disconnection
- Overcurrent status
- Status notReferenced
- crate reset
- start up connection problem
- in-run fad loss
- drs underflow problem
- startup - no proper connection problem
- fadctrl hangs in state configuring when taking an external lp run
- ftmctrl in state ERROR
- ftmctrl in state ClockCondError
General Remarks
Stopping Programs
- never stop a program that is not hanging with ctrl+c
- when a restart of the program is really necessary, use .q instead
- restarting a program is in most cases not a solution and only increases the risk of triggering more problems, so avoid restarting programs as long as possible
### bias disconnection
Symptom
- biasctrl is in state DISCONNECTED
Solution
- Do _RECONNECT_
- Send a command like _REQUEST_STATUS_
- Sometimes the bias crate disconnects again; in that case do _RECONNECT_ again
Not Helping
- do not close or kill biasctrl
### Overcurrent status
Symptom
- while biasctrl ramps the voltage, it goes into state OVERCURRENT
Solution
- First try biasctrl/RESET_OVER_CURRENT_STATUS (maybe a few times)
- if that doesn't help, try to ramp the voltage down
- biasctrl/SET_GLOBAL_DAC 0
Not Helping
- do not close or kill biasctrl
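The overcurrent recovery above is a retry-then-fallback pattern, which can be sketched as follows. This is only an illustration: `send` and `get_state` are hypothetical callables standing in for however commands are sent and the biasctrl state is read, and `max_resets=3` is an assumed reading of "a few times". Only the two command strings come from this page.

```python
from typing import Callable


def recover_from_overcurrent(
    send: Callable[[str], None],
    get_state: Callable[[], str],
    max_resets: int = 3,  # assumption: "a few times" taken as 3
) -> bool:
    """Return True if a reset cleared OVERCURRENT, False if the voltage was ramped down."""
    for _ in range(max_resets):
        send("biasctrl/RESET_OVER_CURRENT_STATUS")
        if get_state() != "OVERCURRENT":
            return True
    # The resets did not help: ramp the voltage down instead.
    send("biasctrl/SET_GLOBAL_DAC 0")
    return False
```

Note that the sketch never restarts biasctrl, in line with the "Not Helping" entry.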
### Status notReferenced
Symptom
- biasctrl is in state notReferenced
Solution
- start the Ramping again
- biasctrl/START
Not Helping
- do not close or kill biasctrl
---
<A NAME="tsfads"/> FADs
### start up - connection problem
Symptom
- problem occurs usually during start up
- after FAD_CONTROL/START or pushing FAD -> START button in the GUI, not all 40 FAD LEDs are green
- fadctrl is in state Connecting (instead of Connected)
Solution
- stop dimscripts
- dimctrl --stop from a bash
- do a crate reset:
<A NAME="tscratereset"/>
There exists a script which does the Crate Reset automatically:
- _.x ScriptsForDimCtrl/ResetCrate.dim C=n_
- n = number of the crate you want to reset
Crate Reset manually:
- disable all FTUs (in the FTU tab of the GUI)
- disconnect the 10 FADs in the crate, by clicking the LEDs
- ftmctrl/RESET_CRATE x (x = number of the corresponding crate)
- enable all FTUs (in the FTU tab of the GUI)
- reconnect the 10 FADs, one by one with 3 seconds waiting in between (the FAD needs this time to boot)
Not Helping
- disconnect / reconnect to the FAD.
- waiting
- reset other / all crates (might just create another of these bugs in another FAD)
- stopping or killing fadctrl
- power cycling the camera (might just create another of these bugs in another FAD)
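The order of the manual Crate Reset steps matters, so it may help to see it written out as a plain checklist generator. This is a sketch under assumptions: it assumes crates are numbered from 0 and that crate c holds FAD boards 10c to 10c+9, neither of which is stated on this page. The function name is hypothetical and it only produces text; it does not talk to any hardware.

```python
from typing import List


def crate_reset_plan(crate: int, fads_per_crate: int = 10, boot_wait_s: int = 3) -> List[str]:
    """Return the manual Crate Reset steps, in order, as human-readable strings."""
    first = crate * fads_per_crate  # assumption: boards are numbered crate by crate
    boards = range(first, first + fads_per_crate)

    plan = ["disable all FTUs (FTU tab of the GUI)"]
    plan += [f"disconnect FAD {b}" for b in boards]
    plan.append(f"ftmctrl/RESET_CRATE {crate}")
    plan.append("enable all FTUs (FTU tab of the GUI)")
    for b in boards:
        plan.append(f"reconnect FAD {b}")
        plan.append(f"wait {boot_wait_s} s")  # the FAD needs this time to boot
    return plan


if __name__ == "__main__":
    for step in crate_reset_plan(2):
        print(step)
```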
<A NAME="tsfadloss"/>
### In-run-FAD-loss
Symptom
- during a run:
- 4 adjacent *strange* patches in the events tab of the GUI
- orange warnings in fadctrl-console (the eventbuilder notices that one FAD stopped sending data)
- trigger rate drops to 0
- when the run ended and a new one is started
- MCP hangs in state Configuring3
- fadctrl hangs in state Configuring2
Solution
- find out which FAD board is disconnected
- the one board in the FAD tab which has the strange behaviour
- the one without a tick on its LED in the GUI (this is only visible when a new run is started and the system hangs in ConfiguringN)
- stop dimscript (by dimctrl --stop from a bash)
- only possible when dimctrl was started as a server instance (dimctrl --server)
- otherwise you have to kill dimctrl via ctrl + c
- reset MCP (MCP/RESET)
- Disconnect the problematic FAD (clicking on the corresponding LED in the GUI)
- wait 3 to 5 sec
- Reconnect the problematic FAD (again clicking)
Not Helping
- waiting
- reset any crates (might just create another start up connection problem)
- stopping or killing fadctrl
- power cycling the camera (might just create another start up connection problem)
<A NAME="tsdrsunderflow"/>
### DRS underflow problem
Symptom
- happens usually during start up
- during first data taking one single patch in the events tab appears different (e.g. all dark blue)
- after the 3 calibration runs are taken by the system, the calibration constants behave differently:
- the magenta line (normally around +1000mV) has pretty long error bars and is curved over the whole canvas
Solution
- stop dimscript
- dimctrl --stop from a bash
- Reset crate (see <A HREF="#tscratereset">above</A>)
- this often doesn't help, but it takes much less time than the power cycle
- if the problem appears again: power cycle the camera (see <A HREF="#tshardware">below</A>)
Not Helping
- all the rest
### startup - no proper connection problem
Symptom
- happens usually during start up
- several FADs are in state "Waiting" (orange in the GUI)
Solution
- disable the corresponding FADs by clicking on them in the GUI
- wait a short time
- enable the corresponding FADs by clicking again on them (wait 3 seconds between two clicks)
Not Helping
- killing / quitting fadctrl
### fadctrl hangs in state configuring when taking an external lp run
Symptom
- fadctrl hangs in state configuring when it starts the external lp run during datataking (point 17 of the normal datataking procedure, datatakingdetails#normaldata; this is the 9th run)
- all FADs are connected (if not all FADs are connected, it is a FAD connection loss, see above)
Solution
- normally the shifter just forgot to open the lid, so
- stop script
- open the lid (see preparation#startup Start Up procedure, point 9)
Not Helping
- all the rest
---
<A NAME="tsftus"/> FTUs / ftmctrl
### ftmctrl in state ERROR
Symptom
- ftmctrl in state ERROR
- one FTU is not green in the FTU tab of the GUI
Solution
- stop dimscript
- dimctrl --stop from a bash
- switch off the trigger and try *Ping* (FTU-tab)
- an FTU can erroneously be marked as *in error* after the GUI has been restarted; a *Ping* resolves that
- if this doesn't help do a Crate Reset (see <A HREF="#tscratereset">above</A>)
Not Helping
- stopping fadctrl or ftmctrl
- Reset other or all Crates
- quitting or killing any program
- power cycling the camera
### ftmctrl in state ClockCondError
Symptom
- ftmctrl in state ClockCondError
- clock conditioner is not locked
Solution
- FTM_CONTROL/RESET_CONFIGURATION
- or the MCP Reset button
- make sure that the clock conditioner is locked before data taking
Not Helping
---
<A NAME="tsdrivectrl"/> DriveCtrl / Cosy
### Cosy in state Error
Symptom
- drivectrl and Cosy in state ERROR
- drivectrl and Cosy in state ARMED (when just the tracking stopped accidentally)
Solution
- DRIVECTRL/RESUME
The RESUME command performs the following steps (so RESUME alone is sufficient to solve the Drive Error):
- DRIVECTRL/STOP (now drivectrl should be in state ARMED)
- DRIVECTRL/TRACK_SOURCE x y sourcename
- last Tracking command: see output of dimctrl
- see Current Pointing Positions (datatakingdetails#pointingpositions)
Not Helping
- all other
### drivectrl in state 3
Symptom
- drivectrl is in status 3 (LOCKED)
Solution
- drivectrl goes into this status when the sun is rising, so if this "problem" occurs in the morning it is probably not a problem; it is simply how the telescope should behave at sunrise
- if this problem occurs during startup, you have to unlock the telescope:
- DRIVECTRL/UNLOCK
Not Helping
- DRIVECTRL/STOP
- killing / quitting drivectrl or cosy
### Manual parking of the telescope
Symptom
- drivectrl and Cosy don't accept / react to commands, for example when an error with the IndraDrives occurs
Solution
- if it's not too late in the night, try to call an expert before you park manually
- Park the telescope manually:
- Carry out all parts of the Shutdown procedure that are possible in the current situation (see the shutdown page)
- be sure the bias voltage is OFF!
- wear a helmet when going to the telescope
- there are two bars near the door, to move the telescope in the azimuth and zenith direction:
- long bar: zenith
- short bar: azimuth
- You can use these bars to turn the telescope manually at the spots provided for this:
- azimuth: below the telescope
- zenith: right of the mirrors
- take care not to turn the telescope in a way that damages the cables
- Turn the telescope into the Parking Position (pointing north, towards the old container)
---
<A NAME="tscomputers"/> Computers
### Information about the Computers
You can find information about the computers on La Palma on the Computing page (computing)
### Restarting a hanging PC
Symptom
- the PC can't be reached per ssh, or something similar
- be aware that when all computers (except gate) seem to hang, it is normally data that hangs; the others only try to mount the home directory from data, so they hang too
Solution
if it's not too late in the night, try to call an expert before you power cycle the computers
When you have to restart more than one PC, be sure to follow the Start-up procedure described on the Computing page (computing)
Power cycle the hanging computer from any other computer on the FACT internal network:
- go in /usr/local/bin
- execute one of the following scripts: aux_off, gui_off, gate_off, daq_off, data_off
- wait a few minutes
- execute one of the following scripts: aux_ON, gui_ON, gate_ON, daq_ON, data_ON
- Rebooting takes a few minutes for aux, gui and gate, and about 10 min for daq and data
or power cycle the hanging computer manually from the FACT-container
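The off/on script names follow a simple pattern (note the lowercase `_off` and uppercase `_ON`), which the following sketch makes explicit. The helper name is hypothetical; it only builds the two script paths for a given machine and does not execute anything.

```python
import posixpath

SCRIPT_DIR = "/usr/local/bin"
HOSTS = ("aux", "gui", "gate", "daq", "data")


def power_cycle_scripts(host: str) -> tuple:
    """Return (off_script, on_script) for one of the machines listed above."""
    if host not in HOSTS:
        raise ValueError(f"unknown host: {host!r}")
    off = posixpath.join(SCRIPT_DIR, f"{host}_off")
    on = posixpath.join(SCRIPT_DIR, f"{host}_ON")
    return off, on


if __name__ == "__main__":
    off, on = power_cycle_scripts("daq")
    print(off)  # run this first, then wait a few minutes
    print(on)   # run this afterwards
```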
---
<A NAME="tsarduino"/> Arduino Reset
A document describing how to reset an Arduino can be found here: https://www.fact-project.org/logbook/showthread.php?tid=1500
---
<A NAME="tshardware"/> Hardware
### Power Cycling the camera
Symptom
- A Crate Reset didn't solve an FTU or FAD problem
Solution
- stop all scripts: dimctrl --stop from a bash
- set bias to 0:
- BIAS_CONTROL/SET_ZERO_VOLTAGE
- Stop Trigger:
- FTM_CONTROL/STOP_TRIGGER
- or Stop Trigger in the left part of the GUI
- FTM_CONTROL/DISCONNECT
- FEEDBACK/STOP
- if fadctrl is in state WritingData
- FAD_CONTROL/CLOSE_OPEN_FILES
- fadctrl now should be in state connected
- FAD_CONTROL/STOP
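The preparation steps above form a fixed command sequence with one conditional branch, which can be sketched as below. The function name is hypothetical and the sketch only returns the command strings from this page in order. It assumes the scripts have already been stopped with `dimctrl --stop` from a bash, and that the trigger is stopped via the command rather than the GUI button.

```python
from typing import List


def pre_power_cycle_commands(fadctrl_state: str) -> List[str]:
    """Return the commands to send, in order, before power cycling the camera."""
    commands = [
        "BIAS_CONTROL/SET_ZERO_VOLTAGE",  # set bias to 0
        "FTM_CONTROL/STOP_TRIGGER",
        "FTM_CONTROL/DISCONNECT",
        "FEEDBACK/STOP",
    ]
    if fadctrl_state == "WritingData":
        # Close the open files first; afterwards fadctrl should be connected.
        commands.append("FAD_CONTROL/CLOSE_OPEN_FILES")
    commands.append("FAD_CONTROL/STOP")
    return commands


if __name__ == "__main__":
    for command in pre_power_cycle_commands("WritingData"):
        print(command)
```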
Now power cycle the camera
- Switch off the agilent camera in the FACT Container
- open the page 10.0.100.100 from a computer in the internal network of La Palma (can be your computer if you use vpn)
- push 'Camera Off' to stop the camera (now only 'Bias Power' and '24VDC Interlock Power' should be 'yes')
- wait for about 15 min
- switch the agilent on
- open the page 10.0.100.100 from a computer in the internal network of La Palma
- push 'Camera On' to start the camera (now all 5 points should be 'yes')
- make sure that clock-conditioner is locked
- follow the preparation#startup Start-Up procedure to get the system running again
To check if the clock-conditioner is locked, you may check in the gui in the tab
- 'Trigger' the led next to the pulldown 'DRS sampling frequency'
- 'FAD' with mouse over on the LED next to 'Reference Clock' whether all 40 values are roughly 1/1024
(some more information in a post by Patrick: https://www.fact-project.org/logbook/showthread.php?tid=1102&pid=6070#pid6070 )
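The "all 40 values are roughly 1/1024" criterion can be expressed as a small check. This is a sketch only: the 5 % relative tolerance is an assumption (the page just says "roughly"), and the values are assumed to have been read off the GUI by hand, since no programmatic way to read them is described here.

```python
import math
from typing import Sequence


def reference_clocks_ok(
    values: Sequence[float],
    expected: float = 1 / 1024,
    rel_tol: float = 0.05,  # assumption: "roughly" taken as within 5 %
    n_boards: int = 40,
) -> bool:
    """True if there is one value per FAD board and each is close to `expected`."""
    if len(values) != n_boards:
        return False
    return all(math.isclose(v, expected, rel_tol=rel_tol) for v in values)
```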
### Bias
In case you need to switch off the bias crate, please first do BIAS_CONTROL/DISCONNECT