
Changes between Version 1 and Version 2 of TroubleShooting

Timestamp:: 08/08/13 08:53:08
Author:: ftemme
Comment:: --

TroubleShooting

-              v1
+              v2
 TroubleShootingDrivectrl
+- cosy in state ERROR
+- drivectrl in state LOCKED
+- Manual parking of the telescope
 TroubleShootingComputers
+- Information about the Computers
+- Restarting a hanging PC
 TroubleShootingArduino
+- Arduino Reset
 TroubleShootingHardware
+- Power Cycling the camera
+- Switching off the bias crate
 == General Remarks ==
 …
  - restarting a program is in most cases not a solution and only increases the risk of triggering more problems, so avoid restarting programs for as long as possible
-### bias disconnection
-__Symptom__
-- biasctrl is in state DISCONNECTED
-__Solution__
-- Do _RECONNECT_
-- Send a command like _REQUEST_STATUS_
-- Sometimes the bias crate disconnects again; in that case, do _RECONNECT_ again
-__Not Helping__
-- do not close or kill biasctrl
-### OverCurrentStatus
-__Symptom__
-- after biasctrl has ramped the voltage, it goes into state OVERCURRENT
-__Solution__
-- First try biasctrl/RESET_OVER_CURRENT_STATUS (maybe a few times)
-- if that doesn't help, ramp the voltage down:
- - biasctrl/SET_GLOBAL_DAC 0
-__Not Helping__
-- do not close or kill biasctrl
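The escalation above (reset a few times, then ramp down, never kill biasctrl) can be sketched in Python. This is a minimal sketch, not part of any FACT tool: `send()` and `get_state()` are hypothetical hooks standing in for however you issue DIM commands and read the biasctrl state, and `max_resets=3` is an assumed reading of "a few times".

```python
from typing import Callable, List

RESET_CMD = "biasctrl/RESET_OVER_CURRENT_STATUS"
RAMP_DOWN_CMD = "biasctrl/SET_GLOBAL_DAC 0"


def recover_overcurrent(send: Callable[[str], None],
                        get_state: Callable[[], str],
                        max_resets: int = 3) -> List[str]:
    """Return the commands issued while trying to clear OVERCURRENT.

    send() and get_state() are hypothetical hooks; max_resets=3 is an
    assumed interpretation of "a few times".
    """
    issued: List[str] = []
    for _ in range(max_resets):
        if get_state() != "OVERCURRENT":
            return issued              # a reset worked, stop here
        send(RESET_CMD)
        issued.append(RESET_CMD)
    if get_state() == "OVERCURRENT":
        send(RAMP_DOWN_CMD)            # last resort: ramp the voltage down
        issued.append(RAMP_DOWN_CMD)
    return issued                      # biasctrl is never closed or killed
```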
-### Status notReferenced
-__Symptom__
-- biasctrl is in state notReferenced
-__Solution__
-- start the ramping again:
- - biasctrl/START
-__Not Helping__
-- do not close or kill biasctrl
----
-<A NAME="tsfads"/>
-FADs
-----
-### start up - connection problem
-__Symptom__
-- problem occurs usually during start up
-- after FAD_CONTROL/START or pushing the FAD -> START button in the GUI, not all 40 FAD LEDs are green
-- fadctrl is in state Connecting (instead of Connected)
-__Solution__
-- stop dimscripts
- - dimctrl --stop from a bash
-- do a __crate reset__:
-<A NAME="tscratereset"/>
-There is a script that performs the __Crate Reset__ automatically:
-- _.x ScriptsForDimCtrl/ResetCrate.dim C=n_
-- n = number of the crate you want to reset
-Manual Crate Reset:
-- disable all FTUs (in the FTU tab of the GUI)
-- disconnect the 10 FADs in the crate by clicking their LEDs
-- ftmctrl/RESET_CRATE x (x = number of the crate)
-- enable all FTUs (in the FTU tab of the GUI)
-- reconnect the 10 FADs one by one, waiting 3 seconds between each (the FAD needs this time to boot)
-__Not Helping__
-- disconnect / reconnect to the FAD.
-- waiting
-- reset other / all crates (might just create another of these bugs in another FAD)
-- stopping or killing fadctrl
-- power cycling the camera (might just create another of these bugs in another FAD)
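A minimal Python sketch of the manual Crate Reset as an ordered plan of (step, seconds to wait afterwards). It only builds the list; the GUI steps still have to be done by hand. The crate numbering 0 to 3 and the FAD numbering crate*10 to crate*10+9 are assumptions for illustration, as is the helper name `crate_reset_plan`.

```python
from typing import List, Tuple

FADS_PER_CRATE = 10   # "the 10 FADs in the crate"
N_CRATES = 4          # 40 FADs / 10 per crate
FAD_BOOT_WAIT_S = 3   # each FAD needs about 3 s to boot


def crate_reset_plan(crate: int) -> List[Tuple[str, int]]:
    """Ordered (step, seconds to wait afterwards) for a manual Crate Reset.

    Crate numbers 0..3 and FAD numbers crate*10 .. crate*10+9 are
    assumptions made for illustration.
    """
    if not 0 <= crate < N_CRATES:
        raise ValueError(f"no such crate: {crate}")
    boards = range(crate * FADS_PER_CRATE, (crate + 1) * FADS_PER_CRATE)
    steps: List[Tuple[str, int]] = [("GUI: disable all FTUs", 0)]
    steps += [(f"GUI: disconnect FAD {b}", 0) for b in boards]
    steps.append((f"ftmctrl/RESET_CRATE {crate}", 0))
    steps.append(("GUI: enable all FTUs", 0))
    last = boards[-1]
    for b in boards:
        # reconnect one by one, pausing between boards (not after the last)
        wait = 0 if b == last else FAD_BOOT_WAIT_S
        steps.append((f"GUI: reconnect FAD {b}", wait))
    return steps
```

For crate 2 this gives 23 steps with a total of 27 s of waiting during the reconnects.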
-<A NAME="tsfadloss"/>
-### In-run-FAD-loss
-__Symptom__
-- during a run:
- - 4 adjacent *strange* patches in the events tab of the GUI
- - orange warnings in the fadctrl console (the event builder notices that one FAD stopped sending data)
- - trigger rate drops to 0
-- when the run has ended and a new one is started:
- - MCP hangs in state Configuring3
- - fadctrl hangs in state Configuring2
-__Solution__
-- find out which FAD board is disconnected:
- - the board in the FAD tab that behaves strangely
- - the one without a tick on its LED in the GUI (only visible once a new run is started and the system hangs in ConfiguringN)
-- stop the dimscript (dimctrl --stop from a bash)
- - only possible when dimctrl was started as a server instance (dimctrl --server)
- - otherwise you have to kill dimctrl with Ctrl+C
-- reset the MCP (MCP/RESET)
-- disconnect the problematic FAD (click on the corresponding LED in the GUI)
-- wait 3 to 5 s
-- reconnect the problematic FAD (click again)
-__Not Helping__
-- waiting
-- reset any crates (might just create another start up connection problem)
-- stopping or killing fadctrl
-- power cycling the camera (might just create another start up connection problem)
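The recovery steps for an in-run FAD loss, sketched in Python under stated assumptions: FADs are numbered 0 to 39 (assumed), `connected` is the set of boards whose LED shows a tick, and `find_lost_fads` / `fad_loss_recovery` are hypothetical helpers that only produce a checklist; they do not talk to fadctrl or the MCP. The 4 s default sits inside the recommended 3 to 5 s window.

```python
from typing import List, Set

N_FADS = 40


def find_lost_fads(connected: Set[int]) -> List[int]:
    """Boards (assumed numbering 0..39) whose LED shows no tick."""
    return sorted(set(range(N_FADS)) - connected)


def fad_loss_recovery(board: int, dimctrl_is_server: bool,
                      wait_s: float = 4) -> List[str]:
    """Checklist for reconnecting one lost FAD."""
    if not 3 <= wait_s <= 5:
        raise ValueError("the text recommends waiting 3 to 5 s")
    # dimctrl --stop only works if dimctrl runs as a server instance
    stop = ("dimctrl --stop" if dimctrl_is_server
            else "Ctrl+C in the dimctrl console")
    return [
        stop,
        "MCP/RESET",
        f"GUI: disconnect FAD {board}",
        f"wait {wait_s} s",
        f"GUI: reconnect FAD {board}",
    ]
```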
-<A NAME="tsdrsunderflow"/>
-### DRS underflow problem
-__Symptom__
-- happens usually during start up
-- during the first data taking, a single patch in the events tab looks different (e.g. all dark blue)
-- after the system has taken the 3 calibration runs, the calibration constants behave differently:
- - the magenta line (normally around +1000mV) has pretty long error bars and is curved over the whole canvas
-__Solution__
-- stop the dimscript
- - dimctrl --stop from a bash
-- Reset crate (see <A HREF="#tscratereset">above</A>)
- - this often doesn't help, but it takes much less time than a power cycle
-- if the problem appears again: __power cycle__ the camera (see <A HREF="#tshardware">below</A>)
-__Not Helping__
-- all the rest
-### startup - no proper connection problem
-__Symptom__
-- happens usually during start up
-- several FADs are in state "Waiting" (orange in the GUI)
-__Solution__
-- disable the corresponding FADs by clicking on them in the GUI
-- wait a short time
-- enable the corresponding FADs by clicking on them again (wait 3 seconds between two clicks)
-__Not Helping__
-- killing / quitting fadctrl
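A sketch of the re-enable procedure for FADs stuck in "Waiting", assuming you can read each board's state into a dict keyed by (assumed) board number. `waiting_fad_plan` is a hypothetical helper; it only orders the clicks and inserts the 3 s gap between enabling two boards.

```python
from typing import Dict, List

CLICK_GAP_S = 3  # "wait 3 seconds between two clicks"


def waiting_fad_plan(states: Dict[int, str]) -> List[str]:
    """Disable, then re-enable, every FAD reported as "Waiting"."""
    waiting = sorted(b for b, s in states.items() if s == "Waiting")
    if not waiting:
        return []
    plan = [f"GUI: disable FAD {b}" for b in waiting]
    plan.append("wait a short time")
    for i, b in enumerate(waiting):
        if i:
            plan.append(f"wait {CLICK_GAP_S} s")  # gap between two enables
        plan.append(f"GUI: enable FAD {b}")
    return plan
```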
-### fadctrl hangs in state configuring when taking an external lp run
-__Symptom__
-- fadctrl hangs in state configuring when it starts the external lp run during data taking (point 17 of the [[datatakingdetails#normaldata normal datataking procedure]], the 9th run)
-- all FADs are connected (if not all FADs are connected, it is a FAD connection loss; see above)
-__Solution__
-- usually the shifter just forgot to open the lid, so:
- - stop the script
- - open the lid (see [[preparation#startup Start Up procedure]] , point 9)
-__Not Helping__
-- all the rest
----
-<A NAME="tsftus"/>
-FTUs / ftmctrl
-----
-### ftm in state ERROR:
-__Symptom__
-- ftmctrl in state ERROR
-- one FTU is not green in the FTU tab of the GUI
-__Solution__
-- stop the dimscript
- - dimctrl --stop from a bash
-- switch off the trigger and try *Ping* (FTU tab)
- - an FTU can erroneously be marked as *in error* after the GUI has been restarted; a *Ping* resolves that
-- if this doesn't help, do a Crate Reset (see <A HREF="#tscratereset">above</A>)
-__Not Helping__
-- stopping fadctrl or ftmctrl
-- resetting other or all crates
-- quitting or killing any program
-- power cycling the camera
-### ftmctrl in state ClockCondError:
-__Symptom__
-- ftmctrl in state ClockCondError
-- clock conditioner is not locked
-__Solution__
-- FTM_CONTROL/RESET_CONFIGURATION
-- or the MCP Reset button
-- make sure that the clock conditioner is locked before data taking
-__Not Helping__
----
-<A NAME="tsdrivectrl"/>
-DriveCtrl / Cosy
-----
-### Cosy in state Error
-__Symptom__
-- drivectrl and Cosy in state ERROR
-- drivectrl and Cosy in state ARMED (when only the tracking stopped accidentally)
-__Solution__
-- DRIVECTRL/RESUME
-The RESUME command performs the following steps (so RESUME alone is sufficient to clear the Drive Error):
-- DRIVECTRL/STOP (drivectrl should now be in state ARMED)
-- DRIVECTRL/TRACK_SOURCE x y sourcename
- - for the last tracking command, see the output of dimctrl
- - see [[datatakingdetails#pointingpositions Current Pointing Positions]]
-__Not Helping__
-- everything else
-### drivectrl in state 3
-__Symptom__
-- drivectrl is in status 3 (LOCKED)
-__Solution__
-- drivectrl enters this state when the sun rises, so if this "problem" occurs in the morning it is probably not a problem; it is simply how the telescope is supposed to behave at sunrise
-- if this problem occurs during startup, you have to unlock the telescope:
- - DRIVECTRL/UNLOCK
-__Not Helping__
-- DRIVECTRL/STOP
-- killing / quitting drivectrl or cosy
-### Manual parking of the telescope
-__Symptom__
-- drivectrl and Cosy don't accept or react to commands, for example when an error with the IndraDrives occurs
-__Solution__
-- if it's not too late in the night, try to call an expert before you park manually
-- Park the telescope manually:
-. Carry out all parts of the Shutdown procedure that are possible in the current situation (see [[shutdown here]])
- - be sure the bias voltage is OFF!
- - wear a helmet when going to the telescope
-. There are two bars near the door for moving the telescope in azimuth and zenith:
- - long bar: zenith
- - short bar: azimuth
-. Use these bars to turn the telescope manually at the spots provided for this:
- - azimuth: below the telescope
- - zenith: right of the mirrors
- - take care not to turn the telescope in a way that damages the cables
-. Turn the telescope into the parking position (pointing north, towards the old container)
----
-<A NAME="tscomputers"/>
-Computers
-----
-### Information about the Computers
-You can find information about the computers on La Palma on the [[computing Computing page]]
-### Restarting a hanging PC
-__Symptom__
-- the PC can't be reached via ssh, or something similar
-- be aware that when all computers (except for gate) seem to hang, it is normally data that hangs; the others only hang because they try to mount the home directory from data
-__Solution__
-If it's not too late in the night, try to call an expert before you power cycle the computers.
-When you have to restart more than one PC, make sure you follow the start-up procedure described [[computing here]].
-Power cycle the hanging computer from any other computer on the FACT internal network:
-- go to /usr/local/bin
-- execute one of the following scripts: aux_off, gui_off, gate_off, daq_off, data_off
-- wait a few minutes
-- execute one of the following scripts: aux_ON, gui_ON, gate_ON, daq_ON, data_ON
-- rebooting takes a few minutes for aux, gui and gate, and about 10 min for daq and data
-Alternatively, power cycle the hanging computer manually in the FACT container.
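A small Python sketch of the checks above, assuming the off/on scripts in /usr/local/bin are named exactly as listed. `power_cycle_scripts`, `likely_culprit` and `expect_long_reboot` are hypothetical helpers; `likely_culprit` encodes the rule of thumb that a hang of every host except gate usually points at data.

```python
from typing import List, Optional, Set

SCRIPT_DIR = "/usr/local/bin"
HOSTS = ("aux", "gui", "gate", "daq", "data")
SLOW_HOSTS = {"daq", "data"}  # about 10 min to reboot; the others a few minutes


def power_cycle_scripts(host: str) -> List[str]:
    """Off/on script paths for one host, named as in the list above."""
    if host not in HOSTS:
        raise ValueError(f"unknown host: {host}")
    return [f"{SCRIPT_DIR}/{host}_off", f"{SCRIPT_DIR}/{host}_ON"]


def likely_culprit(hanging: Set[str]) -> Optional[str]:
    """If every host except gate hangs, data is normally the real cause."""
    if set(HOSTS) - {"gate"} <= hanging:
        return "data"
    return None


def expect_long_reboot(host: str) -> bool:
    """True for the hosts that take about 10 min to come back."""
    return host in SLOW_HOSTS
```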
----
-<A NAME="tsarduino"/>
-Arduino Reset
-----
-The following logbook thread explains how to reset an Arduino:
-https://www.fact-project.org/logbook/showthread.php?tid=1500
----
-<A NAME="tshardware"/>
-Hardware
-----
-### Power Cycling the camera
-__Symptom__
-- A Crate Reset didn't solve an FTU or FAD problem
-__Solution__
-- stop all scripts: dimctrl --stop from a bash
-- set bias to 0: BIAS_CONTROL/SET_ZERO_VOLTAGE
-- Stop Trigger:
- - FTM_CONTROL/STOP_TRIGGER
- - or Stop Trigger in the left part of the GUI
-- FTM_CONTROL/DISCONNECT
-- FEEDBACK/STOP
-- if fadctrl is in state WritingData:
- - FAD_CONTROL/CLOSE_OPEN_FILES
- - fadctrl should now be in state Connected
-- FAD_CONTROL/STOP
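The preparation steps, in order, as a Python sketch. `camera_shutdown_commands` is a hypothetical helper that returns the command list for a given fadctrl state; it does not send anything. Note that `dimctrl --stop` is run from a bash, while the others are DIM commands.

```python
from typing import List


def camera_shutdown_commands(fadctrl_state: str) -> List[str]:
    """Commands to issue before power cycling the camera, in order."""
    cmds = [
        "dimctrl --stop",                 # from a bash, not a DIM command
        "BIAS_CONTROL/SET_ZERO_VOLTAGE",  # bias to 0 before anything else
        "FTM_CONTROL/STOP_TRIGGER",
        "FTM_CONTROL/DISCONNECT",
        "FEEDBACK/STOP",
    ]
    if fadctrl_state == "WritingData":
        # close files first; fadctrl should then be in state Connected
        cmds.append("FAD_CONTROL/CLOSE_OPEN_FILES")
    cmds.append("FAD_CONTROL/STOP")
    return cmds
```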
-Now __power cycle__ the camera
-. Switch off the camera Agilent in the FACT container
- - open the page 10.0.100.100 from a computer in the internal network of La Palma (this can be your own computer if you use a VPN)
- - push 'Camera Off' to stop the camera (now only 'Bias Power' and '24VDC Interlock Power' should be 'yes')
-. wait for about 15 min
-. switch the Agilent back on
- - open the page 10.0.100.100 from a computer in the internal network of La Palma
- - push 'Camera On' to start the camera (now all 5 points should be 'yes')
-. make sure that the clock conditioner is locked
-. follow the [[preparation#startup Start-Up procedure]] to get the system running again
-To check whether the __clock-conditioner__ is locked, look in the GUI at:
-- the 'Trigger' tab: the LED next to the pull-down 'DRS sampling frequency'
-- the 'FAD' tab: hover the mouse over the LED next to 'Reference Clock' and check whether all 40 values are roughly 1/1024
-(some more information in a post by Patrick: https://www.fact-project.org/logbook/showthread.php?tid=1102&pid=6070#pid6070 )
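The 'FAD' tab check can be expressed as a small Python function, assuming you have the 40 'Reference Clock' values as numbers. The 5 % relative tolerance is an assumed reading of "roughly 1/1024", not a documented threshold, and `clock_conditioner_locked` is a hypothetical name.

```python
import math
from typing import Sequence

N_FADS = 40
EXPECTED_RATIO = 1 / 1024


def clock_conditioner_locked(values: Sequence[float],
                             rel_tol: float = 0.05) -> bool:
    """True if all 40 reference-clock values are close to 1/1024.

    rel_tol=0.05 is an assumed tolerance for "roughly".
    """
    if len(values) != N_FADS:
        return False  # a missing board cannot be confirmed as locked
    return all(math.isclose(v, EXPECTED_RATIO, rel_tol=rel_tol)
               for v in values)
```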
-### Bias
-If you need to switch off the bias crate, first do BIAS_CONTROL/DISCONNECT