= Troubleshooting =

TroubleShootingAutomaticFailureHandling
- AutoResume
- FadConnectionLoss
TroubleShootingBias
- bias disconnection
- Overcurrent status
- Status notReferenced
TroubleShootingFads
- crate reset
- start up connection problem
- in-run fad loss
- drs underflow problem
- startup - no proper connection problem
- fadctrl hangs in state configuring by taking an external lp run
TroubleShootingFtus
- ftmctrl in state ERROR
- ftmctrl in state ClockCondError
TroubleShootingDrivectrl
TroubleShootingComputers
TroubleShootingArduino
TroubleShootingHardware

== General Remarks ==

=== Stopping Programs ===

- **never** stop a program that is not hanging with //ctrl+c//
- when a restart of the program is really necessary, use .q instead
- restarting a program is in most cases not a solution and only increases the risk of triggering more problems, so avoid restarting programs as long as possible

### bias disconnection

__Symptom__
- biasctrl is in state DISCONNECTED

__Solution__
- Do _RECONNECT_
- Send a command like _REQUEST_STATUS_
- Sometimes the bias crate will disconnect again; in that case do _RECONNECT_ again

__Not Helping__
- do not close or kill biasctrl

### OverCurrentStatus

__Symptom__
- when biasctrl ramps the voltage, it goes into state OVERCURRENT

__Solution__
- First try biasctrl/RESET_OVER_CURRENT_STATUS (maybe a few times)
- if that doesn't help, try to ramp the voltage down
  - biasctrl/SET_GLOBAL_DAC 0

__Not Helping__
- do not close or kill biasctrl

### Status notReferenced

__Symptom__
- biasctrl is in state notReferenced

__Solution__
- start the ramping again
  - biasctrl/START

__Not Helping__
- do not close or kill biasctrl

--- FADs ----

### start up - connection problem

__Symptom__
- the problem usually occurs during start up
- after FAD_CONTROL/START or pushing the FAD -> START button in the GUI, not all 40 FAD LEDs are green
- fadctrl is in state Connecting (instead of Connected)

__Solution__
- stop dimscripts
  - dimctrl --stop from a bash
- do a __crate reset__:

There is a script which
will do the __Crate Reset__ automatically:
- _.x ScriptsForDimCtrl/ResetCrate.dim C=n_
  - n = number of the crate you want to reset

Crate Reset manually:
- disable all FTUs (in the FTU tab of the GUI)
- disconnect the 10 FADs in the crate by clicking the LEDs
- ftmctrl/RESET_CRATE x (x = the corresponding crate)
- enable all FTUs (in the FTU tab of the GUI)
- reconnect the 10 FADs one by one, waiting 3 seconds in between (each FAD needs this time to boot)

__Not Helping__
- disconnecting / reconnecting the FAD
- waiting
- resetting other / all crates (might just create another of these bugs in another FAD)
- stopping or killing fadctrl
- power cycling the camera (might just create another of these bugs in another FAD)

### In-run-FAD-loss

__Symptom__
- during a run:
  - 4 adjacent *strange* patches in the events tab of the GUI
  - orange warnings in the fadctrl console (the eventbuilder realises that one FAD stopped sending data)
  - the trigger rate drops to 0
- when the run has ended and a new one is started:
  - MCP hangs in state Configuring3
  - fadctrl hangs in state Configuring2

__Solution__
- find out which FAD board is disconnected:
  - the one board in the FAD tab which shows the strange behaviour
  - the one without a tick on its LED in the GUI (only visible when a new run is started and the system hangs in ConfiguringN)
- stop the dimscript (with dimctrl --stop from a bash)
  - only possible when dimctrl was started as a server instance (dimctrl --server)
  - otherwise you have to kill dimctrl via ctrl + c
- reset MCP (MCP/RESET)
- Disconnect the problematic FAD (by clicking on the corresponding LED in the GUI)
- wait 3 to 5 sec
- Reconnect the problematic FAD (again by clicking)

__Not Helping__
- waiting
- resetting any crates (might just create another start up connection problem)
- stopping or killing fadctrl
- power cycling the camera (might just create another start up connection problem)

### DRS underflow problem

__Symptom__
- usually happens during start up
- during the first data taking, one single patch in
the events tab looks different (e.g. all dark blue)
- after the 3 calibration runs have been taken by the system, the calibration constants behave differently:
  - the magenta line (normally around +1000mV) has quite long error bars and is curved over the whole canvas

__Solution__
- stop the dimscript
  - dimctrl --stop from a bash
- Reset the crate (see above)
  - this has not helped often, but it takes less time than the power cycle
- if the problem appears again: __power cycle__ the camera (see below)

__Not Helping__
- all the rest

### startup - no proper connection problem

__Symptom__
- usually happens during start up
- several FADs are in state "Waiting" (orange in the GUI)

__Solution__
- disable the corresponding FADs by clicking on them in the GUI
- wait a short time
- enable the corresponding FADs by clicking on them again (wait 3 seconds between two clicks)

__Not Helping__
- killing / quitting fadctrl

### fadctrl hangs in state configuring by taking an external lp run

__Symptom__
- fadctrl hangs in state configuring when it starts the external lp run during data taking (point 17 under [[datatakingdetails#normaldata normal datataking procedure]], the 9th
run)
- all FADs are connected (if not all FADs are connected, it is a FAD connection loss, see above)

__Solution__
- normally the shifter just forgot to open the lid, so
  - stop the script
  - open the lid (see [[preparation#startup Start Up procedure]], point 9)

__Not Helping__
- all the rest

--- FTUs / ftmctrl ----

### ftm in state ERROR:

__Symptom__
- ftmctrl in state ERROR
- one FTU is not green in the FTU tab of the GUI

__Solution__
- stop the dimscript
  - dimctrl --stop from a bash
- switch off the trigger and try *Ping* (FTU tab)
  - an FTU can erroneously be marked as *in error* after the GUI has been restarted; a *Ping* resolves that
- if this doesn't help, do a Crate Reset (see above)

__Not Helping__
- stopping fadctrl or ftmctrl
- resetting other or all crates
- quitting or killing any program
- power cycling the camera

### ftmctrl in state ClockCondError:

__Symptom__
- ftmctrl in state ClockCondError
- clock conditioner is not locked

__Solution__
- FTM_CONTROL/RESET_CONFIGURATION
  - or the MCP Reset button
- make sure that the clock conditioner is locked before data taking

__Not Helping__

--- DriveCtrl / Cosy ----

### Cosy in state Error

__Symptom__
- drivectrl and Cosy in state ERROR
- drivectrl and Cosy in state ARMED (when just the tracking stopped accidentally)

__Solution__
- DRIVECTRL/RESUME

The RESUME command performs the following steps (so RESUME alone is enough to solve the drive error):
- DRIVECTRL/STOP (now drivectrl should be in state ARMED)
- DRIVECTRL/TRACK_SOURCE x y sourcename
  - last tracking command: see the output of dimctrl
  - see [[datatakingdetails#pointingpositions Current Pointing Positions]]

__Not Helping__
- all other

### drivectrl in state 3

__Symptom__
- drivectrl is in status 3 (LOCKED)

__Solution__
- drivectrl goes into this status when the sun is rising, so if this "problem" occurs in the morning it is probably not a problem; it is just how the telescope should behave because of the rising sun
- if this problem occurs during startup, you have to
unlock the telescope:
  - DRIVECTRL/UNLOCK

__Not Helping__
- DRIVECTRL/STOP
- killing / quitting drivectrl or cosy

### Manual parking of the telescope

__Symptom__
- drivectrl and Cosy don't accept / react to commands, for example when an error with the IndraDrives occurs

__Solution__
- if it's not too late in the night, try to call an expert before you park manually
- Park the telescope manually:

0. Carry out all parts of the Shutdown procedure that you can do in the current situation (see [[shutdown here]])
  - be sure the bias voltage is OFF!
  - wear a helmet when going to the telescope
0. there are two bars near the door for moving the telescope in the azimuth and zenith direction:
  - long bar: zenith
  - short bar: azimuth
0. Use these bars to turn the telescope manually at the spots provided for this
  - azimuth: below the telescope
  - zenith: to the right of the mirrors
  - take care not to turn in a way that damages the cables
0. Turn the telescope into the Parking Position (pointing north, towards the old container)

--- Computers ----

### Information about the Computers

You can find information about the computers on La Palma on the [[computing Computing page]]

### Restarting a hanging PC

__Symptom__
- the PC can't be reached via ssh, or something similar
- be aware that when all computers (except for gate) seem to hang, it is normally data which hangs; the others only try to mount the home from data, so they hang too

__Solution__

If it's not too late in the night, try to call an expert before you power cycle the computers.

When you have to restart more than one PC, be sure to follow the start-up procedure mentioned [[computing here]].

Power cycle the hanging computer from any other computer on the FACT internal network:
- go to /usr/local/bin
- execute one of the following scripts: aux_off, gui_off, gate_off, daq_off, data_off
- wait a few minutes
- execute one of the following scripts: aux_ON, gui_ON, gate_ON, daq_ON, data_ON
- Rebooting will take a few minutes
for aux, gui and gate, and about 10 min for daq and data

or power cycle the hanging computer manually from the FACT container

--- Arduino Reset ----

Here is a document with information on how to reset an Arduino: https://www.fact-project.org/logbook/showthread.php?tid=1500

--- Hardware ----

### Power Cycling the camera

__Symptom__
- A Crate Reset didn't solve an FTU or FAD problem

__Solution__
- stop all scripts: dimctrl --stop from a bash
- set bias to 0:
  - BIAS_CONTROL/SET_ZERO_VOLTAGE
- Stop Trigger:
  - FTM_CONTROL/STOP_TRIGGER
  - or Stop Trigger in the left part of the GUI
- FTM_CONTROL/DISCONNECT
- FEEDBACK/STOP
- if fadctrl is in state WritingData
  - FAD_CONTROL/CLOSE_OPEN_FILES
  - fadctrl should now be in state Connected
- FAD_CONTROL/STOP

Now __power cycle__ the camera

0. Switch off the agilent camera in the FACT Container
  - open the page 10.0.100.100 from a computer in the internal network of La Palma (can be your own computer if you use vpn)
  - push 'Camera Off' to stop the camera (now only 'Bias Power' and '24VDC Interlock Power' should be 'yes')
0. wait for about 15 min
0. switch the agilent on
  - open the page 10.0.100.100 from a computer in the internal network of La Palma
  - push 'Camera On' to start the camera (now all 5 points should be 'yes')
0. make sure that the clock-conditioner is locked
0. follow the [[preparation#startup Start-Up procedure]] to get the system running again

To check whether the __clock-conditioner__ is locked, look in the GUI in the tab
- 'Trigger': the LED next to the pulldown 'DRS sampling frequency'
- 'FAD': with mouse-over on the LED next to 'Reference Clock', whether all 40 values are roughly 1/1024 (some more information in a post by Patrick: https://www.fact-project.org/logbook/showthread.php?tid=1102&pid=6070#pid6070 )

### Bias

In case you need to switch off the bias crate, please do BIAS_CONTROL/DISCONNECT first
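
As a reading aid for the "Power Cycling the camera" checklist above, the preparation steps before the power cycle can be written down as an ordered list in Python. This is only a sketch: the function name `power_cycle_prep_commands` and its `fadctrl_state` parameter are made up for this illustration, and the code only lists the command names from the checklist. It does not talk to dimctrl or to any hardware.

```python
"""Sketch: ordered preparation commands before a camera power cycle."""


def power_cycle_prep_commands(fadctrl_state):
    """Return the checklist commands as strings, in checklist order.

    fadctrl_state is the state name shown for fadctrl, e.g. "WritingData".
    Hypothetical helper: nothing is sent anywhere, the list is only
    meant to be read or printed.
    """
    commands = [
        "dimctrl --stop",                 # stop all scripts (from a bash)
        "BIAS_CONTROL/SET_ZERO_VOLTAGE",  # set bias to 0
        "FTM_CONTROL/STOP_TRIGGER",       # or 'Stop Trigger' in the GUI
        "FTM_CONTROL/DISCONNECT",
        "FEEDBACK/STOP",
    ]
    if fadctrl_state == "WritingData":
        # Close open files first; fadctrl should then be in state Connected.
        commands.append("FAD_CONTROL/CLOSE_OPEN_FILES")
    commands.append("FAD_CONTROL/STOP")
    return commands


if __name__ == "__main__":
    for step, command in enumerate(power_cycle_prep_commands("WritingData"), 1):
        print(f"{step}. {command}")
```

The only branch is the WritingData case, which is the one conditional step in the checklist; all other commands are always listed in the same order.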