Changes between Initial Version and Version 1 of TroubleShooting


Timestamp:
08/08/13 08:49:40 (12 years ago)
Author:
ftemme
  • TroubleShooting

    v1 v1  
     1= Troubleshooting =
     2
     3TroubleShootingAutomaticFailureHandling
     4
     5- AutoResume
     6- FadConnectionLoss
     7
     8TroubleShootingBias
     9
     10- bias disconnection
     11- Overcurrent status
     12- Status notReferenced
     13
     14TroubleShootingFads
     15
     16- crate reset
     17- start up connection problem
     18- in-run fad loss
     19- drs underflow problem
     20- startup - no proper connection problem
     21- fadctrl hangs in state configuring by taking an external lp run
     22
     23TroubleShootingFtus
     24
     25- ftmctrl in state ERROR
     26- ftmctrl in state ClockCondError
     27
     28TroubleShootingDrivectrl
     29
     30TroubleShootingComputers
     31
     32TroubleShootingArduino
     33
     34TroubleShootingHardware
     35
     36== General Remarks ==
     37
     38=== Stopping Programs ===
     39
     40- **never** stop a program that is not hanging with //ctrl+c//
     41 - when a restart of the program is really necessary, use .q instead
     42 - restarting a program is in most cases not a solution and only increases the risk of triggering more problems. So avoid restarting programs for as long as possible
     43
     44### bias disconnection
     45
     46__Symptom__
     47
     48- biasctrl is in state DISCONNECTED
     49
     50__Solution__
     51
     52- Do _RECONNECT_
     53- Send a command like _REQUEST_STATUS_
     54- Sometimes the bias crate disconnects again; in that case do _RECONNECT_ again
     55
     56__Not Helping__
     57
     58- do not close or kill biasctrl
     59
     60### OverCurrentStatus
     61
     62__Symptom__
     63
     64- when biasctrl ramps the voltage, it goes into state OVERCURRENT
     65
     66__Solution__
     67
     68- First try biasctrl/RESET_OVER_CURRENT_STATUS (maybe a few times)
     69- if that doesn't help, try to ramp the voltage down
     70 - biasctrl/SET_GLOBAL_DAC 0
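
The retry-then-ramp-down order above can be sketched as follows. This is only an illustration: `send` and `get_state` are hypothetical callables standing in for however commands are issued and the biasctrl state is read, and the limit of three resets is an assumed reading of "a few times".

```python
def handle_overcurrent(send, get_state, max_resets=3):
    """Retry the overcurrent reset a few times, then ramp the voltage down.

    send      -- hypothetical callable that issues one command string
    get_state -- hypothetical callable returning the current biasctrl state name
    max_resets -- assumed meaning of "a few times"
    """
    for _ in range(max_resets):
        send("biasctrl/RESET_OVER_CURRENT_STATUS")
        if get_state() != "OVERCURRENT":
            return "recovered"
    # The reset did not help: ramp the voltage down instead.
    send("biasctrl/SET_GLOBAL_DAC 0")
    return "ramped_down"
```

Note that the sketch never closes or kills biasctrl, in line with the "Not Helping" list.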
     71
     72__Not Helping__
     73
     74- do not close or kill biasctrl
     75
     76### Status notReferenced
     77
     78__Symptom__
     79
     80- biasctrl is in state notReferenced
     81
     82__Solution__
     83
     84- start the Ramping again
     85 - biasctrl/START
     86
     87__Not Helping__
     88
     89- do not close or kill biasctrl
     90
     91---
     92
     93<A NAME="tsfads"/>
     94FADs
     95----
     96
     97### start up - connection problem
     98
     99__Symptom__
     100
     101- problem occurs usually during start up
     102- after FAD_CONTROL/START or pushing FAD -> START button in the GUI, not all 40 FAD LEDs are green
     103- fadctrl is in state Connecting (instead of Connected)
     104
     105__Solution__
     106
     107- stop dimscripts
     108 - dimctrl --stop from a bash
     109- do a __crate reset__:
     110
     111<A NAME="tscratereset"/>
     112
     113There is a script which does the __Crate Reset__ automatically:
     114
     115- _.x ScriptsForDimCtrl/ResetCrate.dim C=n_
     116- n = number of crate you want to reset
     117
     118Crate Reset manually:
     119
     120- disable all FTUs (in the FTU tab of the GUI)
     121- disconnect the 10 FADs in the crate, by clicking the LEDs
     122- ftmctrl/RESET_CRATE x (x = number of the corresponding crate)
     123- enable all FTUs (in the FTU tab of the GUI)
     124- reconnect the 10 FADs, one by one with 3 seconds waiting in between (the FAD needs this time to boot)
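
As a checklist, the manual crate reset can be written out like this. It is a sketch under assumptions: crates are taken to be numbered from 0 and the FAD boards to be numbered consecutively per crate (boards 10*x to 10*x+9). Neither is stated above, so check the numbering in the GUI.

```python
def crate_reset_steps(crate, fads_per_crate=10, boot_wait_s=3):
    """Return the manual crate reset as an ordered list of steps.

    Assumption (not from the wiki): crate x holds boards
    x*fads_per_crate .. x*fads_per_crate + fads_per_crate - 1.
    """
    first = crate * fads_per_crate
    boards = range(first, first + fads_per_crate)
    steps = ["disable all FTUs (FTU tab of the GUI)"]
    steps += [f"disconnect FAD {b} (click its LED)" for b in boards]
    steps.append(f"ftmctrl/RESET_CRATE {crate}")
    steps.append("enable all FTUs (FTU tab of the GUI)")
    for b in boards:
        # Each FAD needs time to boot before the next one is reconnected.
        steps.append(f"reconnect FAD {b} (click its LED)")
        steps.append(f"wait {boot_wait_s} s")
    return steps


for step in crate_reset_steps(2):
    print(step)
```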
     125
     126__Not Helping__
     127
     128- disconnect / reconnect to the FAD.
     129- waiting
     130- reset other / all crates (might just create another of these bugs in another FAD)
     131- stopping or killing fadctrl
     132- power cycling the camera (might just create another of these bugs in another FAD)
     133
     134<A NAME="tsfadloss"/>
     135### In-run-FAD-loss
     136
     137__Symptom__
     138
     139- during a run:
     140 - 4 adjacent *strange* patches in the events tab of the GUI
     141 - orange warnings in fadctrl-console (eventbuilder realises that one FAD stopped sending data)
     142 - trigger rate drops to 0
     143- when the run ended and a new one is started
     144 - MCP hangs in state Configuring3
     145 - fadctrl hangs in state Configuring2
     146
     147__Solution__
     148
     149- find out which FAD board is disconnected
     150 - the one board in the FAD tab which has the strange behaviour
     151 - the one without a tick on its LED in the GUI (only visible once a new run is started and the system hangs in ConfiguringN)
     152- stop dimscript (by dimctrl --stop from a bash)
     153 - only possible when dimctrl was started as a server instance (dimctrl --server)
     154 - otherwise you have to kill dimctrl via ctrl + c
     155- reset MCP (MCP/RESET)
     156- Disconnect the problematic FAD (clicking on the corresponding LED in the GUI)
     157- wait 3 to 5 sec
     158- Reconnect the problematic FAD (again clicking)
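
The recovery order, including the two ways of stopping the dimscript, can be sketched as below. The function and its parameters are hypothetical and only illustrate the sequence; the default wait of 4 s is an arbitrary value inside the 3 to 5 s range given above.

```python
def fad_loss_recovery_steps(fad, dimctrl_is_server, wait_s=4):
    """Return the in-run FAD loss recovery as an ordered list of steps.

    fad               -- number of the problematic board (as shown in the GUI)
    dimctrl_is_server -- True if dimctrl was started with `dimctrl --server`
    wait_s            -- pause between disconnect and reconnect (3 to 5 s)
    """
    if not 3 <= wait_s <= 5:
        raise ValueError("wait between disconnect and reconnect should be 3 to 5 s")
    # `dimctrl --stop` only works for a server instance of dimctrl.
    stop = "dimctrl --stop" if dimctrl_is_server else "kill dimctrl via ctrl+c"
    return [
        stop,
        "MCP/RESET",
        f"disconnect FAD {fad} (click its LED in the GUI)",
        f"wait {wait_s} s",
        f"reconnect FAD {fad} (click its LED again)",
    ]
```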
     159
     160__Not Helping__
     161
     162- waiting
     163- reset any crates (might just create another start up connection problem)
     164- stopping or killing fadctrl
     165- power cycling the camera (might just create another start up connection problem)
     166
     167<A NAME="tsdrsunderflow"/>
     168### DRS underflow problem
     169
     170__Symptom__
     171
     172- happens usually during start up
     173- during first data taking one single patch in the events tab appears different (e.g. all dark blue)
     174- after the 3 calibration runs are taken by the system, the calibration constants look different:
     175 - the magenta line (normally around +1000mV) has pretty long error bars and is curved over the whole canvas
     176
     177__Solution__
     178
     179- stop dimscript
     180 - dimctrl --stop from a bash
     181- Reset crate (see <A HREF="#tscratereset">above</A>)
     182 - this rarely helps, but it takes much less time than the power cycle
     183- if the problem appears again: __power cycle__ the camera (see <A HREF="#tshardware">below</A>)
     184
     185__Not Helping__
     186
     187- all the rest
     188
     189### startup - no proper connection problem
     190
     191__Symptom__
     192
     193- happens usually during start up
     194- several FADs are in state "Waiting" (orange in the GUI)
     195
     196__Solution__
     197
     198- disable the corresponding FADs by clicking on them in the GUI
     199- wait a short time
     200- enable the corresponding FADs by clicking again on them (wait 3 seconds between two clicks)
     201
     202__Not Helping__
     203
     204- killing / quitting fadctrl
     205
     206### fadctrl hangs in state configuring by taking an external lp run
     207
     208__Symptom__
     209
     210- fadctrl hangs in state configuring when it starts the external lp run during data taking (point 17 under [[datatakingdetails#normaldata normal datataking procedure]], the 9th run)
     211- all FADs are connected (if not all FADs are connected, it is a FAD connection loss, see above)
     212
     213__Solution__
     214
     215- normally the shifter just forgot to open the lid, so
     216 - stop script
     217 - open the lid (see [[preparation#startup Start Up procedure]] , point 9)
     218
     219__Not Helping__
     220
     221- all the rest
     222
     223---
     224
     225<A NAME="tsftus"/>
     226FTUs / ftmctrl
     227----
     228
     229### ftm in state ERROR:
     230
     231__Symptom__
     232
     233- ftmctrl in state ERROR
     234- one FTU is not green in the FTU tab of the GUI
     235
     236__Solution__
     237
     238- stop dimscript
     239 - dimctrl --stop from a bash
     240- switch off the trigger and try *Ping* (FTU-tab)
     241 - an FTU can erroneously be marked as *in error* after the GUI has been restarted; a *Ping* resolves that
     242- if this doesn't help do a Crate Reset (see <A HREF="#tscratereset">above</A>)
     243
     244__Not Helping__
     245
     246- stopping fadctrl or ftmctrl
     247- Reset other or all Crates
     248- quitting or killing any program
     249- power cycling the camera
     250
     251### ftmctrl in state ClockCondError:
     252
     253__Symptom__
     254
     255- ftmctrl in state ClockCondError
     256- clock conditioner is not locked
     257
     258__Solution__
     259
     260- FTM_CONTROL/RESET_CONFIGURATION
     261- or the MCP Reset button
     262- make sure that the clock conditioner is locked before data taking
     263
     264__Not Helping__
     265
     266---
     267
     268
     269<A NAME="tsdrivectrl"/>
     270DriveCtrl / Cosy
     271----
     272
     273### Cosy in state Error
     274
     275__Symptom__
     276
     277- drivectrl and Cosy in state ERROR
     278- drivectrl and Cosy in state ARMED (when just the Tracking stopped accidentally)
     279
     280__Solution__
     281
     282- DRIVECTRL/RESUME
     283
     284The RESUME command performs the following steps (so only RESUME is necessary to solve the Drive Error):
     285
     286- DRIVECTRL/STOP (now drivectrl should be in state armed)
     287- DRIVECTRL/TRACK_SOURCE x y sourcename
     288 - last Tracking command: see output of dimctrl
     289 - see [[datatakingdetails#pointingpositions Current Pointing Positions]]
     290
     291__Not Helping__
     292
     293- all other
     294
     295### drivectrl in state 3
     296
     297__Symptom__
     298
     299- drivectrl is in status 3 (LOCKED)
     300
     301__Solution__
     302
     303- drivectrl goes into this status when the sun is rising, so if this "problem" occurs in the morning it is probably not a problem; it is simply how the telescope should behave when the sun rises
     304- if this problem occurs during startup, you have to unlock the telescope:
     305 - DRIVECTRL/UNLOCK
     306
     307__Not Helping__
     308
     309- DRIVECTRL/STOP
     310- killing / quitting drivectrl or cosy
     311
     312### Manual parking of the telescope
     313
     314__Symptom__
     315
     316- drivectrl and Cosy don't accept / react to commands, for example when an error with the IndraDrives occurs
     317
     318__Solution__
     319
     320- if it's not too late in the night, try to call an expert before you park manually
     321- Park the telescope manually:
     322
     3230. Carry out all parts of the Shutdown procedure that are possible in the current situation (see [[shutdown here]])
     324 - be sure the bias voltage is OFF!
     325 - wear a helmet when going to the telescope
     3260. there are two bars near the door, to move the telescope in the azimuth and zenith direction:
     327 - long bar: zenith
     328 - short bar: azimuth
     3290. You can use these bars to turn the telescope manually at the spots provided for this
     330 - azimuth: below the telescope
     331 - zenith: right of the mirrors
     332 - take care not to turn the telescope in a way that damages the cables
     3330. Turn the telescope into the Parking Position (pointing north, towards the old container)
     334
     335---
     336
     337<A NAME="tscomputers"/>
     338Computers
     339----
     340
     341### Information about the Computers
     342
     343You can find information about the computers on La Palma on the [[computing Computing page]]
     344
     345### Restarting a hanging PC
     346
     347__Symptom__
     348
     349- the PC can't be reached via ssh, or shows a similar problem
     350- be aware that when all computers (except for gate) seem to hang, it is normally data that hangs; the others only try to mount the home from data, so they hang too
     351
     352__Solution__
     353
     354If it's not too late in the night, try to call an expert before you power cycle the computers
     355
     356When you have to restart more than one PC, be sure you follow the Start-up procedure mentioned [[computing here]]
     357
     358Power cycle the hanging computer from any other computer on the FACT internal network:
     359
     360- go in /usr/local/bin
     361- execute one of the following scripts: aux_off, gui_off, gate_off,  daq_off, data_off
     362- wait a few minutes
     363- execute one of the following scripts: aux_ON, gui_ON, gate_ON, daq_ON, data_ON
     364- Rebooting takes a few minutes for aux, gui and gate, and about 10 min. each for daq and data
     365
     366or power cycle the hanging computer manually from the FACT-container
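
For the script-based power cycle, the pair of scripts for a given machine follows directly from the naming pattern listed above. The helper below is a hypothetical convenience that only builds the two paths; it does not run anything.

```python
HOSTS = ("aux", "gui", "gate", "daq", "data")
SCRIPT_DIR = "/usr/local/bin"


def power_cycle_scripts(host):
    """Return (off_script, on_script) for one of the listed computers."""
    if host not in HOSTS:
        raise ValueError(f"unknown computer {host!r}, expected one of {HOSTS}")
    # Naming follows the list above: lower-case _off, upper-case _ON.
    return (f"{SCRIPT_DIR}/{host}_off", f"{SCRIPT_DIR}/{host}_ON")


off_script, on_script = power_cycle_scripts("daq")
print(off_script)
print("wait a few minutes")
print(on_script)
```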
     367
     368---
     369
     370<A NAME="tsarduino"/>
     371Arduino Reset
     372----
     373
     374The following document describes how to reset an Arduino:
     375https://www.fact-project.org/logbook/showthread.php?tid=1500
     376
     377---
     378
     379<A NAME="tshardware"/>
     380Hardware
     381----
     382
     383### Power Cycling the camera
     384
     385__Symptom__
     386
     387- A Crate Reset did not solve an FTU or FAD problem
     388
     389__Solution__
     390
     391- stop all scripts: dimctrl --stop from a bash
     392- set bias to 0: BIAS_CONTROL/SET_ZERO_VOLTAGE
     393- Stop Trigger:
     394 - FTM_CONTROL/STOP_TRIGGER
     395 - or Stop Trigger in the left part of the GUI
     396- FTM_CONTROL/DISCONNECT
     397- FEEDBACK/STOP
     398- if fadctrl is in state WritingData
     399 - FAD_CONTROL/CLOSE_OPEN_FILES
     400 - fadctrl now should be in state connected
     401- FAD_CONTROL/STOP
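
The preparation steps above depend on the fadctrl state in one place only. A minimal sketch of that ordering, with a hypothetical function that takes the fadctrl state name as a plain string:

```python
def pre_power_cycle_commands(fadctrl_state):
    """Return the commands to issue, in order, before power cycling the camera."""
    commands = [
        "dimctrl --stop",                 # stop all scripts (from a bash)
        "BIAS_CONTROL/SET_ZERO_VOLTAGE",  # set bias to 0
        "FTM_CONTROL/STOP_TRIGGER",       # or the Stop Trigger button in the GUI
        "FTM_CONTROL/DISCONNECT",
        "FEEDBACK/STOP",
    ]
    if fadctrl_state == "WritingData":
        # Close open files first; fadctrl should then be in state connected.
        commands.append("FAD_CONTROL/CLOSE_OPEN_FILES")
    commands.append("FAD_CONTROL/STOP")
    return commands
```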
     402
     403Now __power cycle__ the camera
     404
     4050. Switch off the agilent camera in the FACT Container
     406 - open the page 10.0.100.100 from a computer in the internal network of La Palma (can be your computer if you use vpn)
     407 - push 'Camera Off' to stop the camera (now only 'Bias Power' and '24VDC Interlock Power' should be 'yes')
     4080. wait for about 15 min
     4090. switch the agilent on
     410 - open the page 10.0.100.100 from a computer in the internal network of La Palma
     411 - push 'Camera On' to start the camera (now all 5 points should be 'yes')
     4120. make sure that clock-conditioner is locked
     4130. follow the [[preparation#startup Start-Up procedure]] to get the system running again
     414
     415To check if the __clock-conditioner__ is locked, you may check in the gui in the tab
     416
     417- 'Trigger' the led next to the pulldown 'DRS sampling frequency'
     418- 'FAD' with mouse over on the LED next to 'Reference Clock' whether all 40 values are roughly 1/1024
     419(some more information in a post by Patrick: https://www.fact-project.org/logbook/showthread.php?tid=1102&pid=6070#pid6070 )
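
If the 40 reference clock values are copied out of the GUI, the "roughly 1/1024" check could be expressed as below. This is only a sketch: the 5 % relative tolerance is an assumption, since the text does not say how close the values must be.

```python
import math


def clock_conditioner_locked(values, expected=1 / 1024, rel_tol=0.05, n_boards=40):
    """Return True if all reference clock values are roughly 1/1024.

    rel_tol is an assumed tolerance, not a documented limit.
    """
    if len(values) != n_boards:
        return False
    return all(math.isclose(v, expected, rel_tol=rel_tol) for v in values)
```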
     420
     421### Bias
     422In case you need to switch off the bias crate, please do BIAS_CONTROL/DISCONNECT first