Changes between Version 1 and Version 2 of TroubleShooting


Timestamp:
08/08/13 08:53:08 (12 years ago)
Author:
ftemme
Comment:

--

  • TroubleShooting

    v1 v2  
    2828TroubleShootingDrivectrl
    2929
     30- cosy in state ERROR
     31- drivectrl in state LOCKED
     32- Manual parking of the telescope
     33
    3034TroubleShootingComputers
     35
     36- Information about the Computers
     37- Restarting a hanging PC
    3138
    3239TroubleShootingArduino
    3340
     41- Arduino Reset
     42
    3443TroubleShootingHardware
     44
     45- Power Cycling the camera
     46- Switching off the bias crate
    3547
    3648== General Remarks ==
     
    4254 - restarting a program is in most cases not a solution and only increases the risk of triggering more problems, so avoid restarting programs as long as possible
    4355
    44 ### bias disconnection
    45 
    46 __Symptom__
    47 
    48 - biasctrl is in state DISCONNECTED
    49 
    50 __Solution__
    51 
    52 - Do _RECONNECT_
    53 - Send a command like _REQUEST_STATUS_
    54 - Sometimes the bias crate will disconnect again; in that case do _RECONNECT_ again
    55 
    56 __Not Helping__
    57 
    58 - do not close or kill biasctrl
    59 
    60 ### OverCurrentStatus
    61 
    62 __Symptom__
    63 
    64 - when biasctrl ramps the voltage, it goes into state OVERCURRENT
    65 
    66 __Solution__
    67 
    68 - First try biasctrl/RESET_OVER_CURRENT_STATUS (maybe a few times)
    69 - if that doesn't help, try to ramp the voltage down
    70  - biasctrl/SET_GLOBAL_DAC 0
    71 
    72 __Not Helping__
    73 
    74 - do not close or kill biasctrl
    75 
    76 ### Status notReferenced
    77 
    78 __Symptom__
    79 
    80 - biasctrl is in state notReferenced
    81 
    82 __Solution__
    83 
    84 - start the Ramping again
    85  - biasctrl/START
    86 
    87 __Not Helping__
    88 
    89 - do not close or kill biasctrl
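
The three biasctrl cases above all follow the same pattern: a state maps to a short list of commands to try in order. A minimal sketch of that lookup follows, assuming the state and command names exactly as written in this section. The table `BIAS_RECOVERY` and the function `bias_recovery_steps` are hypothetical helpers for illustration, not part of the FACT software.

```python
# Hypothetical lookup table (not part of the FACT software):
# biasctrl state -> commands to try, in the order given in the sections above.
BIAS_RECOVERY = {
    "DISCONNECTED": ["RECONNECT", "REQUEST_STATUS"],
    "OVERCURRENT": [
        "biasctrl/RESET_OVER_CURRENT_STATUS",  # may need to be sent a few times
        "biasctrl/SET_GLOBAL_DAC 0",           # fallback: ramp the voltage down
    ],
    "notReferenced": ["biasctrl/START"],
}


def bias_recovery_steps(state):
    """Return the commands to try for a biasctrl state (empty list if unknown)."""
    return list(BIAS_RECOVERY.get(state, []))
```

In none of the three cases is closing or killing biasctrl part of the recovery, which is why no such step appears in the table.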
    90 
    91 ---
    92 
    93 <A NAME="tsfads"/>
    94 FADs
    95 ----
    96 
    97 ### start up - connection problem
    98 
    99 __Symptom__
    100 
    101 - problem occurs usually during start up
    102 - after FAD_CONTROL/START or pushing FAD -> START button in the GUI, not all 40 FAD LEDs are green
    103 - fadctrl is in state Connecting (instead of Connected)
    104 
    105 __Solution__
    106 
    107 - stop dimscripts
    108  - dimctrl --stop from a bash
    109 - do a __crate reset__:
    110 
    111 <A NAME="tscratereset"/>
    112 
    113 There is a script which does the __Crate Reset__ automatically:
    114 
    115 - _.x ScriptsForDimCtrl/ResetCrate.dim C=n_
    116 - n = number of crate you want to reset
    117 
    118 Crate Reset manually:
    119 
    120 - disable all FTUs (in the FTU tab of the GUI)
    121 - disconnect the 10 FADs in the crate, by clicking the LEDs
    122 - ftmctrl/RESET_CRATE x (x corresponding crate)
    123 - enable all FTUs (in the FTU tab of the GUI)
    124 - reconnect the 10 FADs, one by one with 3 seconds waiting in between (the FAD needs this time to boot)
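
The order of the manual steps matters, in particular the pause between reconnecting FADs. A minimal sketch that only writes the sequence out as a checklist follows. The function `crate_reset_steps` and its parameters are hypothetical; it numbers the FADs 1 to 10 by their position in the sequence, not by any hardware ID, and sends nothing to the system.

```python
def crate_reset_steps(crate, boot_wait_s=3):
    """Return the manual Crate Reset steps for one crate as an ordered checklist.

    Illustration only: this builds text, it does not talk to ftmctrl or the GUI.
    """
    steps = [
        "disable all FTUs (FTU tab of the GUI)",
        f"disconnect the 10 FADs in crate {crate} (click their LEDs)",
        f"ftmctrl/RESET_CRATE {crate}",
        "enable all FTUs (FTU tab of the GUI)",
    ]
    for n in range(1, 11):
        steps.append(f"reconnect FAD {n} of 10")
        if n < 10:
            # each FAD needs this time to boot before the next one is reconnected
            steps.append(f"wait {boot_wait_s} s")
    return steps
```

For example, `crate_reset_steps(2)` yields the four preparation steps followed by ten reconnects separated by nine waits.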
    125 
    126 __Not Helping__
    127 
    128 - disconnect / reconnect to the FAD.
    129 - waiting
    130 - reset other / all crates (might just create another of these bugs in another FAD)
    131 - stopping or killing fadctrl
    132 - power cycling the camera (might just create another of these bugs in another FAD)
    133 
    134 <A NAME="tsfadloss"/>
    135 ### In-run-FAD-loss
    136 
    137 __Symptom__
    138 
    139 - during a run:
    140  - 4 adjacent *strange* patches in the events tab of the GUI
    141  - orange warnings in fadctrl-console (eventbuilder notices that one FAD stopped sending data)
    142  - trigger rate drops to 0
    143 - when the run has ended and a new one is started
    144  - MCP hangs in state Configuring3
    145  - fadctrl hangs in state Configuring2
    146 
    147 __Solution__
    148 
    149 - find out which FAD board is disconnected
    150  - the one board in the FAD tab which has the strange behaviour
    151  - the one without a tick on its LED in the GUI (only visible when a new run is started and the system hangs in ConfiguringN)
    152 - stop dimscript (by dimctrl --stop from a bash)
    153  - only possible when dimctrl was started as a server instance (dimctrl --server)
    154  - otherwise you have to kill dimctrl via ctrl + c
    155 - reset MCP (MCP/RESET)
    156 - Disconnect the problematic FAD (clicking on the corresponding LED in the GUI)
    157 - wait 3 to 5 sec
    158 - Reconnect the problematic FAD (again clicking)
    159 
    160 __Not Helping__
    161 
    162 - waiting
    163 - reset any crates (might just create another start up connection problem)
    164 - stopping or killing fadctrl
    165 - power cycling the camera (might just create another start up connection problem)
    166 
    167 <A NAME="tsdrsunderflow"/>
    168 ### DRS underflow problem
    169 
    170 __Symptom__
    171 
    172 - happens usually during start up
    173 - during first data taking one single patch in the events tab appears different (e.g. all dark blue)
    174 - after the 3 calibration runs are taken by the system, the calibration constants behave differently:
    175  - the magenta line (normally around +1000mV) has pretty long error bars and is curved over the whole canvas
    176 
    177 __Solution__
    178 
    179 - stop dimscript
    180  - dimctrl --stop from a bash
    181 - Reset crate (see <A HREF="#tscratereset">above</A>)
    182  - this has not often helped, but it takes less time than the power cycle
    183 - if the problem appears again: __power cycle__ the camera (see <A HREF="#tshardware">below</A>)
    184 
    185 __Not Helping__
    186 
    187 - all the rest
    188 
    189 ### startup - no proper connection problem
    190 
    191 __Symptom__
    192 
    193 - happens usually during start up
    194 - several FADs are in state "Waiting" (orange in the GUI)
    195 
    196 __Solution__
    197 
    198 - disable the corresponding FADs by clicking on them in the GUI
    199 - wait a short time
    200 - enable the corresponding FADs by clicking again on them (wait 3 seconds between two clicks)
    201 
    202 __Not Helping__
    203 
    204 - killing / quitting fadctrl
    205 
    206 ### fadctrl hangs in state configuring when taking an external lp run
    207 
    208 __Symptom__
    209 
    210 - fadctrl hangs in state configuring when it starts the external lp run during datataking (point 17. under [[datatakingdetails#normaldata normal datataking procedure]] the 9. run)
    211 - all FADs are connected (if not all FADs are connected, it is a FAD connection loss, see above)
    212 
    213 __Solution__
    214 
    215 - normally the shifter just forgot to open the lid, so
    216  - stop script
    217  - open the lid (see [[preparation#startup Start Up procedure]] , point 9)
    218 
    219 __Not Helping__
    220 
    221 - all the rest
    222 
    223 ---
    224 
    225 <A NAME="tsftus"/>
    226 FTUs / ftmctrl
    227 ----
    228 
    229 ### ftm in state ERROR:
    230 
    231 __Symptom__
    232 
    233 - ftmctrl in state ERROR
    234 - one FTU is not green in the FTU tab of the GUI
    235 
    236 __Solution__
    237 
    238 - stop dimscript
    239  - dimctrl --stop from a bash
    240 - switch off the trigger and try *Ping* (FTU-tab)
    241  - an FTU can erroneously be marked as *in error* after the GUI has been restarted; a *Ping* resolves that
    242 - if this doesn't help, do a Crate Reset (see <A HREF="#tscratereset">above</A>)
    243 
    244 __Not Helping__
    245 
    246 - stopping fadctrl or ftmctrl
    247 - Reset other or all Crates
    248 - quitting or killing any program
    249 - power cycling the camera
    250 
    251 ### ftmctrl in state ClockCondError:
    252 
    253 __Symptom__
    254 
    255 - ftmctrl in state ClockCondError
    256 - clock conditioner is not locked
    257 
    258 __Solution__
    259 
    260 - FTM_CONTROL/RESET_CONFIGURATION
    261 - or the MCP Reset button
    262 - make sure that the clock conditioner is locked before data taking
    263 
    264 __Not Helping__
    265 
    266 ---
    267 
    268 
    269 <A NAME="tsdrivectrl"/>
    270 DriveCtrl / Cosy
    271 ----
    272 
    273 ### Cosy in state Error
    274 
    275 __Symptom__
    276 
    277 - drivectrl and Cosy in state ERROR
    278 - drivectrl and Cosy in state ARMED (when just the Tracking stopped accidentally)
    279 
    280 __Solution__
    281 
    282 - DRIVECTRL/RESUME
    283 
    284 The RESUME command performs the following steps (so RESUME alone is enough to solve the Drive Error):
    285 
    286 - DRIVECTRL/STOP (now drivectrl should be in state armed)
    287 - DRIVECTRL/TRACK_SOURCE x y sourcename
    288  - last Tracking command: see output of dimctrl
    289  - see [[datatakingdetails#pointingpositions Current Pointing Positions]]
    290 
    291 __Not Helping__
    292 
    293 - all other
    294 
    295 ### drivectrl in state 3
    296 
    297 __Symptom__
    298 
    299 - drivectrl is in status 3 (LOCKED)
    300 
    301 __Solution__
    302 
    303 - drivectrl goes into this status when the sun is rising, so if this "problem" occurs in the morning it is probably not a problem; it is simply how the telescope should behave due to the rising sun
    304 - if this problem occurs during startup, you have to unlock the telescope:
    305  - DRIVECTRL/UNLOCK
    306 
    307 __Not Helping__
    308 
    309 - DRIVECTRL/STOP
    310 - killing / quitting drivectrl or cosy
    311 
    312 ### Manual parking of the telescope
    313 
    314 __Symptom__
    315 
    316 - drivectrl and Cosy don't accept / react to commands, for example when an error with the IndraDrives occurs
    317 
    318 __Solution__
    319 
    320 - if it's not too late in the night, try to call an expert before you park manually
    321 - Park the telescope manually:
    322 
    323 0. Carry out all parts of the Shutdown procedure that you can in the current situation (see [[shutdown here]])
    324  - be sure the bias voltage is OFF!
    325  - wear a helmet when going to the telescope
    326 0. there are two bars near the door, to move the telescope in the azimuth and zenith direction:
    327  - long bar: zenith
    328  - short bar: azimuth
    329 0. You can use these bars to turn the telescope manually at the spots provided for this
    330  - azimuth: below the telescope
    331  - zenith: right of the mirrors
    332  - take care not to turn in a way that damages the cables
    333 0. Turn the telescope into the Parking Position (pointing north, towards the old container)
    334 
    335 ---
    336 
    337 <A NAME="tscomputers"/>
    338 Computers
    339 ----
    340 
    341 ### Information about the Computers
    342 
    343 You can find information about the computers on La Palma on the [[computing Computing page]]
    344 
    345 ### Restarting a hanging PC
    346 
    347 __Symptom__
    348 
    349 - the PC can't be reached via ssh, or something similar
    350 - be aware that when all computers (except for gate) seem to hang, it is normally data which hangs; the others only try to mount the home from data, so they hang too
    351 
    352 __Solution__
    353 
    354 if it's not too late in the night, try to call an expert before you power cycle the computers
    355 
    356 When you have to restart more than one PC, be sure you follow the Start-up procedure mentioned [[computing here]]
    357 
    358 Power cycle the hanging computer from any other computer on the FACT internal network:
    359 
    360 - go in /usr/local/bin
    361 - execute one of the following scripts: aux_off, gui_off, gate_off, daq_off, data_off
    362 - wait a few minutes
    363 - execute one of the following scripts: aux_ON, gui_ON, gate_ON, daq_ON, data_ON
    364 - Rebooting will take a few minutes for aux, gui, gate and about 10 min. for daq and data, respectively
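
Note that the script names use lower-case `_off` but upper-case `_ON`. A minimal sketch that builds the pair of script paths for one machine follows, assuming the names and the directory exactly as listed above. The helper `power_cycle_scripts` and the tuple `HOSTS` are hypothetical and only construct strings; they do not run anything.

```python
# Machines for which off/ON scripts are listed above.
HOSTS = ("aux", "gui", "gate", "daq", "data")


def power_cycle_scripts(host, script_dir="/usr/local/bin"):
    """Return (off_script, on_script) paths for one of the listed machines."""
    if host not in HOSTS:
        raise ValueError(f"no power-cycle scripts listed for {host!r}")
    # The off script is lower case, the on script upper case, as in the list above.
    return (f"{script_dir}/{host}_off", f"{script_dir}/{host}_ON")
```

Run the first script, wait a few minutes, then run the second one.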
    365 
    366 or power cycle the hanging computer manually from the FACT-container
    367 
    368 ---
    369 
    370 <A NAME="tsarduino"/>
    371 Arduino Reset
    372 ----
    373 
    374 Find here a document with information on how to reset an Arduino.
    375 https://www.fact-project.org/logbook/showthread.php?tid=1500
    376 
    377 ---
    378 
    379 <A NAME="tshardware"/>
    380 Hardware
    381 ----
    382 
    383 ### Power Cycling the camera
    384 
    385 __Symptom__
    386 
    387 - A Crate Reset didn't solve an FTU or FAD problem
    388 
    389 __Solution__
    390 
    391 - stop all scripts: dimctrl --stop from a bash
    392 - set bias to 0: BIAS_CONTROL/SET_ZERO_VOLTAGE
    393 - Stop Trigger:
    394  - FTM_CONTROL/STOP_TRIGGER
    395  - or Stop Trigger in the left part of the GUI
    396 - FTM_CONTROL/DISCONNECT
    397 - FEEDBACK/STOP
    398 - if fadctrl is in state WritingData
    399  - FAD_CONTROL/CLOSE_OPEN_FILES
    400  - fadctrl now should be in state connected
    401 - FAD_CONTROL/STOP
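
The only branch in the list above is the extra step when fadctrl is still writing data. A minimal sketch that writes the sequence out in order follows, assuming the command names as given in this section. The function `pre_power_cycle_commands` is a hypothetical helper that returns strings and does not send anything to the system.

```python
def pre_power_cycle_commands(fadctrl_state):
    """Return the commands to issue, in order, before power cycling the camera."""
    commands = [
        "dimctrl --stop",                 # stop all scripts (from a shell)
        "BIAS_CONTROL/SET_ZERO_VOLTAGE",  # set bias to 0
        "FTM_CONTROL/STOP_TRIGGER",       # or use Stop Trigger in the GUI
        "FTM_CONTROL/DISCONNECT",
        "FEEDBACK/STOP",
    ]
    if fadctrl_state == "WritingData":
        # close the open files first; fadctrl should then be in state connected
        commands.append("FAD_CONTROL/CLOSE_OPEN_FILES")
    commands.append("FAD_CONTROL/STOP")
    return commands
```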
    402 
    403 Now __power cycle__ the camera
    404 
    405 0. Switch off the agilent camera in the FACT Container
    406  - open the page 10.0.100.100 from a computer in the internal network of La Palma (can be your computer if you use vpn)
    407  - push 'Camera Off' to stop the camera (now only 'Bias Power' and '24VDC Interlock Power' should be 'yes')
    408 0. wait for about 15 min
    409 0. switch the agilent on
    410  - open the page 10.0.100.100 from a computer in the internal network of La Palma
    411  - push 'Camera On' to start the camera (now all 5 points should be 'yes')
    412 0. make sure that clock-conditioner is locked
    413 0. follow the [[preparation#startup Start-Up procedure]] to get the system running again
    414 
    415 To check if the __clock-conditioner__ is locked, you may check in the gui in the tab
    416 
    417 - 'Trigger' the led next to the pulldown 'DRS sampling frequency'
    418 - 'FAD' with mouse over on the LED next to 'Reference Clock' whether all 40 values are roughly 1/1024
    419 (some more information in a post by Patrick: https://www.fact-project.org/logbook/showthread.php?tid=1102&pid=6070#pid6070 )
    420 
    421 ### Bias
    422 In case you need to switch off the bias crate, please do BIAS_CONTROL/DISCONNECT first