\documentclass[10pt,twoside,fleqn,a4paper]{article}

\usepackage[bf]{caption}
\usepackage{listings}
\usepackage{graphicx}
\usepackage{subfigure}
\usepackage{amssymb}
\usepackage{textcomp}
\usepackage{units}
\usepackage{exscale}
\usepackage{url}
\usepackage{listings}
\usepackage[light]{draftcopy}
\usepackage{longtable}


\newcommand{\E}{\lstinline{Evidence} }

\renewcommand{\textfraction}{.15}
\renewcommand{\topfraction}{.85}
\renewcommand{\bottomfraction}{.85}

\setlength{\oddsidemargin}{2.5cm}
\addtolength{\oddsidemargin}{-1in} % because of strange definition of origin
\setlength{\evensidemargin}{2.5cm}
\addtolength{\evensidemargin}{-1in}
\setlength{\topmargin}{2.5cm}
\addtolength{\topmargin}{-1in}
\setlength{\textheight}{23.2cm}
\setlength{\textwidth}{16.0cm}

\frenchspacing
\lstset{basicstyle=\ttfamily,breaklines=true}

\begin{document}

\title{\E --- A control system for laboratory application and small-scale experiments}
\author{Oliver Grimm, ETH Z\"{u}rich}
\maketitle

This report describes the design and basic functionality of the \E control system. It is essentially a C++ class and a set of programs running on Linux for controlling small-scale experiments. It is based on CERN's DIM library for interprocess communication over TCP/IP connections. \lstinline{$Rev: 262 $}

\tableofcontents

% ===================================================

\section{Overview of \E}
\label{Overview}

The \E control system has been developed for application in small-scale experiments, for which established and comprehensive control systems like EPICS, DOOCS or PVSS-II are too large and impose too much overhead.\footnote{Information on these systems can be found at \url{http://www.aps.anl.gov/epics/}, \url{http://tesla.desy.de/doocs}, \url{http://www.etm.at/}.} The development of \E was started within the FACT project (\emph{First G-APD Cherenkov Telescope} \cite{Bra09,And10}).

An experiment control system often comprises several individual programs that require configuration information, produce data that should be stored and easily visualized, and at least partly need to exchange information with each other. The intention of \E is a) to integrate this with a minimum of extra coding on the part of the applications, b) to achieve this using established tools as far as reasonable, c) to be centralized\footnote{This is a design choice that partly defines the range of applicable experiments. For large systems decentralization can be a better approach.}, and d) to be lean.
The common functionality is collected in a small C++ class. Its main objective is to allow easy surveillance of the system by a central alarm server and to aid the programmer by providing a few standard routines. It supports easy message logging as well as the distribution of warning and error conditions by the individual programs, but otherwise only little definite structure is imposed on the programmer.

\E uses the DIM (Distributed Information Management) system\footnote{\url{http://dim.web.cern.ch/dim}} as communication layer \cite{Gas01}. This system has been developed and actively maintained at CERN for more than 17 years. DIM itself can be compiled on many platforms, but \E is developed on Linux and uses several functions specific to this operating system.

Sections \ref{Overview} and \ref{DIM-Basics} give a general overview of \E and its components; the remaining sections contain more technical details.


\subsection{Main programs}

The functionality of the main programs of the \E control system\footnote{The denomination \emph{control system} for the core programs of \E is exaggerated, as without further servers that provide experiment-specific functionality, \E would only control itself. Lacking a better term, this report sticks to it.} is briefly listed here. More details, including all required configuration items and services provided, can be found in Sect.\,\ref{ServerDetails}.

\begin{itemize}
\item \lstinline{Config}\\
The configuration server distributes text data read from a configuration file to clients via remote procedure calls. The configuration file is watched for modifications and clients are informed of updates via a DIM service.

\item \lstinline{DColl}\\
The data collector dynamically subscribes to all DIM services (except those explicitly excluded) and writes the corresponding data at every service update to a file. It also provides a central logging facility via a DIM command.

\item \lstinline{Alarm}\\
The alarm server subscribes to the standardized message service of a given list of servers and monitors their severity. It generates a master alarm and can send email notifications in case of server warnings, errors or unavailability. An alarm has to be explicitly acknowledged through a DIM command to be reset.

\item \lstinline{History}\\
The history server keeps a number of updates of all services in a memory ring buffer and can send the buffer to a client if requested through a DIM remote procedure call. This can be used by user interfaces to provide a quick visualization of the past behavior of a given service without accessing the data stored on disk.

\item \lstinline{Bridge}\\
The bridge connects two DIM networks, repeating services, commands and remote procedure calls from one to the other. It allows commands only from a given set of IP addresses, thus providing some limited access control (but no authentication).
\end{itemize}

A graphical user interface has been developed for the FACT project, see Sect.\,\ref{User-Interface}. It consists of a set of Qt widgets adapted to display the contents of DIM services and history buffers, as well as to send commands.


\section{Basic DIM functionality}
\label{DIM-Basics}

As DIM defines the basic operation principle of the control system, its functionality is briefly sketched here. Details explained in the DIM manual are not repeated, but features and caveats not properly documented are summarized in Appendix \ref{DIMDetails}.

DIM uses the client/server approach. DIM services are known to the user only by their name\footnote{The name is a standard C string.}, not by the location of the corresponding server on the network. A central name server brings clients and servers into contact. If a client wants to subscribe to a particular service, it first contacts the name server and then, using the information obtained, establishes a direct TCP/IP connection to the server. The name server is then not involved further and could even go down without affecting the established connections.

There is no access control implemented within DIM. Access has to be controlled by other means that restrict connections to given IP addresses and ports. An example configuration for the Linux \lstinline{IPtables} package is given in Appendix \ref{FirewallExample}.


\subsection{Server and client}

A server provides, via unique names, services, commands and remote procedure calls. A DIM service contains data that a client can subscribe to, a DIM command is a one-way transfer of data from a client to a server\footnote{The client can optionally be informed if the command arrived at the server, but not if it was actually processed.}, and a DIM remote procedure call in addition sends data back to the client.

Subscription in this context means that the client can be informed automatically if updated data becomes available or if the service became unavailable. A client can also choose to be updated at regular intervals.\footnote{Also in this case the update is initiated by the server, to which the requested periodic update rate is sent.} Subscribing to a not yet existing service is possible. The connection will then be established automatically as soon as the service becomes available.

A dedicated TCP/IP connection is established between each server and client.\footnote{That also means that if 10 clients subscribe to a particular service, the data will be sent 10 times over the network. This eventually will limit the scalability of the system.} At startup, a server takes the first free port in the range between 5100 and 6000 to respond to clients.

The overhead imposed by DIM on a TCP/IP connection is small. The throughput can almost reach the line speed, but depends to some extent on the amount of data transmitted per service update. A non-negligible load is generated at service reception by the thread-based approach of DIM, as data arrival normally triggers the execution of a handler in a separate thread. This requires a context switch that, even on fast computers, constrains the rate of updates and, for small service sizes, the throughput.

\subsection{Name server}

The name server keeps a list of all services, commands and remote procedure calls available in the system and their corresponding server addresses. It distributes this information to clients wishing to make a connection. This is done by the DIM library in the background and requires no user action. The information is also accessible to a client via a DIM service (see Appendix \ref{ServiceFormats}).

\subsection{Implementation in an application}

Implementing the DIM server functionality requires only little additional coding in an existing program, typically a few lines of code and inclusion of the DIM library. The library handles all communication transparently in the background (see Sect.\,\ref{Implementation} for more details).

DIM provides C++, C, Fortran, Java and Python interfaces, and can be compiled for various variants of Linux and Windows on 32 and 64 bit architectures.

At several places, handlers (also called call-back routines) can be implemented by the user application that are then invoked automatically by the DIM library when a request from a client or an answer from a server arrives. These handlers are executed in a separate thread, and care has to be taken by the programmer in accessing data structures from a handler. All calls to handlers are strictly serialized by DIM, thus an access protection mechanism is not needed if data is only accessed from within these handlers.


\section{Detailed description of the \E class}

In principle, there is no need to use a particular collection of functions beyond those defined by DIM itself to interconnect programs with DIM. A certain number of tasks, however, are repetitive, and some standardization is beneficial for supervising the server functionality.

The C++ class \lstinline{EvidenceServer} is used by all the main programs of \E to provide the functionality they have in common. It is suggested that user applications use this class as well. \lstinline{EvidenceServer} can simply be inherited by the user class, as shown in Sect.\,\ref{Server-Programming}.

There is no corresponding \lstinline{EvidenceClient} class. Those methods that can also be used by a client are declared \lstinline{static} in \lstinline{EvidenceServer} and can thus be invoked without instantiation.

\subsection{Class functionality}

\begin{itemize}
\item Starts the DIM server.
\item Provides a standard text message service \lstinline{SvrName/Message}, including encoding of the message severity (INFO, WARN, ERROR, FATAL) and automatic logging of it. The initial message published (and thus registered by the data collector) contains the subversion revision number and build time of the server. The DIM service format is \lstinline{"I:1;C"}.
\item Provides a method for configuration requests. If the configuration data is not available, the application terminates with a message of FATAL severity unless default data is given.
\item Provides a method for safely translating DIM service data into text.
\item Implements the virtual DIM method \lstinline{exitHandler()}. It can be called through a standard DIM command \lstinline{SvrName/EXIT}, taking a single integer as argument. Upon first invocation, the handler just sets the flag \lstinline{ExitRequest}, which should be handled by the application. Upon second invocation, it will call \lstinline{exit()}. The user application can override this handler.
\item Provides a DIM command \lstinline{SvrName/ResetMessage}. It will set the message service to INFO severity with the information which client issued the command. This can be used to remove a WARN, ERROR or FATAL severity once the problem has been fixed. The \lstinline{Alarm} server uses this command if it is instructed to reset an alarm level. The command takes no arguments.
\item Implements the virtual DIM method \lstinline{errorHandler()}. The error handler will issue a message with ERROR severity that contains the DIM error code. The user application can override this handler.
\item Installs a signal handler for SIGQUIT (ctrl-backslash), SIGTERM, SIGINT (ctrl-c), and SIGHUP (terminal closed). The signal handler first sets \lstinline{ExitRequest}, and on second invocation calls \lstinline{exit()}. After instantiating the class, the programmer may override the handlers.
\item Catches un-handled C++ exceptions and extracts as much information from the exception as possible.\footnote{This termination handler is taken directly from the code of the \lstinline{g++} compiler and is thus compiler specific.} That information is also published as a message.
\item Subscribes to the service \lstinline{Config/ModifyTime}. Upon updates to the configuration file, a call-back routine of the user application can be automatically invoked, see Sect.\,\ref{ConfigHandling} for more details.
\end{itemize}


\subsection{\lstinline{public} class methods}
\label{EvidenceServer-Methods}

The \lstinline{public} part of the header file \lstinline{Evidence.h} is as follows. The namespace designation \lstinline{std} has been left out for clarity in this listing.

\begin{lstlisting}[numbers=left,numberstyle=\tiny,stepnumber=2,numbersep=5pt]
#define NO_LINK (char *) "__&DIM&NOLINK&__"

class EvidenceServer: public DimServer {

  public:
    EvidenceServer(string);
    ~EvidenceServer();

    enum MessageType {INFO=0, WARN=1, ERROR=2, FATAL=3};

    void Message(MessageType, const char *, ...);
    void SendToLog(const char *, ...);
    string GetConfig(string, string = string());
    virtual void ConfigChanged() {};
    void Lock();
    void Unlock();
    static string ToString(char *, void *, int);
    static bool ServiceOK(DimInfo *);
    static bool ServiceOK(DimRpcInfo *);
    static bool ServiceOK(DimCurrentInfo *);
    static vector<string> Tokenize(const string &, const string & = " ");

    bool ExitRequest;
};
\end{lstlisting}

The class methods are thread safe as they either use only local data or lock access if necessary. \lstinline{NO_LINK} is used for service subscription by clients, see the method \lstinline{ServiceOK()} below and Sect.\,\ref{Client-Programming}.

The constructor \underline{\lstinline{EvidenceServer()}} takes the server name as argument, which is subsequently automatically added to logging and message texts and also used for configuration requests with \lstinline{GetConfig()}.

\underline{\lstinline{Message()}} updates the standard message service with the given text and severity. Formatting is as for \lstinline{printf()}. The text is also sent to the console and to the log file with \lstinline{SendToLog()}. In case of FATAL severity \lstinline{exit()} is invoked, so that the application can safely assume the call will not return. The permanent buffer for the DIM service is automatically allocated and freed.

\underline{\lstinline{SendToLog()}} sends the text to the central log file via a non-blocking command. That method can also be called in a termination or crash handler.

\underline{\lstinline{GetConfig()}} issues, on first invocation, a DIM remote procedure call to the configuration server to retrieve the required data and returns it as a string. The second argument gives the data to be returned in case the server is unavailable or cannot provide the requested data. If in this case the second string is empty, the program terminates with a FATAL message. Using the service \lstinline{Config/ModifyTime}, the server keeps track of changes to the configuration file in the background. Upon subsequent requests for the same configuration data, it only issues a remote procedure call again if the file changed in the meantime. If not, the data already retrieved is returned. This way, this function can be called repeatedly, even at high rate, without generating unnecessary load on the configuration server (as the configuration file does not change frequently).
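
The caching behaviour described for \lstinline{GetConfig()} can be sketched as follows. This is a minimal illustration and not the actual implementation: the class \lstinline{ConfigCache}, its members and the \lstinline{fetch} callable (standing in for the remote procedure call) are hypothetical names, and the locking done by the real method is omitted.

\begin{lstlisting}
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the caching in GetConfig(), not the Evidence code
class ConfigCache {
 public:
  // fetch performs the expensive request (e.g. a remote procedure call)
  explicit ConfigCache(std::function<std::string(const std::string &)> fetch)
      : Fetch(std::move(fetch)) {}

  // To be called when a new modification time of the file is published
  void SetModifyTime(long t) { ModifyTime = t; }

  // Returns cached text unless the file changed since it was retrieved
  std::string Get(const std::string &item) {
    auto it = Cache.find(item);
    if (it == Cache.end() || it->second.Time != ModifyTime) {
      Entry e = {Fetch(item), ModifyTime};
      Cache[item] = e;
      return e.Value;
    }
    return it->second.Value;
  }

 private:
  struct Entry {
    std::string Value;  // Text returned by the last request
    long Time;          // Modification time at which it was retrieved
  };
  std::function<std::string(const std::string &)> Fetch;
  std::map<std::string, Entry> Cache;
  long ModifyTime = 0;
};
\end{lstlisting}

A second call to \lstinline{Get()} for the same item returns the stored text without invoking \lstinline{fetch} again, until \lstinline{SetModifyTime()} records a different modification time.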

The virtual method \underline{\lstinline{ConfigChanged()}} is executed in a separate thread when the configuration file changes. It can be reimplemented by the application. Calls to \lstinline{GetConfig()} from this method will be blocking and thus result in updated configuration data.

The methods \underline{\lstinline{Lock()}} and \underline{\lstinline{Unlock()}} work on an internal mutex.\footnote{Its type is \lstinline{PTHREAD_MUTEX_ERRORCHECK}. In case an already locked mutex is re-locked, the corresponding system call will therefore return an error and thus avoid dead-locking. Error messages from \lstinline{Lock()} and \lstinline{Unlock()} are written to the console and to the log file. They are not published using \lstinline{Message()} since this method itself uses locking and calling it would result in an infinite recursion.} They are used by \lstinline{GetConfig()} but are also available for the user application to serialize access from multiple threads. Calling functions in the locked state should be avoided as it might result in re-locking.

The static method \underline{\lstinline{ToString()}} translates the contents of a DIM service safely into a string that is returned. As no consistency between a service format and the contained data is guaranteed by DIM, precautions are necessary to avoid buffer overruns. The method currently handles the standardized message format \lstinline{"I:1;C"}, arrays of numbers and strings. All other formats are translated into a hex representation. The arguments are the DIM service format, a pointer to the service data and the data size in bytes. It is thread safe as it uses only the arguments and dynamically allocated storage.
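
The kind of precaution meant here can be illustrated with two small helpers. They are a sketch under the assumption that only the data pointer and the byte count can be trusted; the names \lstinline{SafeString} and \lstinline{ToHex} are hypothetical and not part of \lstinline{Evidence.h}, and the exact hex layout produced by \lstinline{ToString()} is not specified in this report.

\begin{lstlisting}
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Extracts text without reading beyond size bytes, even if the
// terminating zero byte is missing from the service data
std::string SafeString(const void *data, int size) {
  if (data == nullptr || size <= 0) return std::string();

  const char *text = static_cast<const char *>(data);
  const void *end = std::memchr(text, '\0', static_cast<std::size_t>(size));
  std::size_t len = (end != nullptr)
      ? static_cast<std::size_t>(static_cast<const char *>(end) - text)
      : static_cast<std::size_t>(size);
  return std::string(text, len);
}

// Fallback for unknown formats: two hex digits and a blank per byte
std::string ToHex(const void *data, int size) {
  std::string text;
  const unsigned char *bytes = static_cast<const unsigned char *>(data);
  char buf[4];

  for (int i = 0; i < size; i++) {
    std::snprintf(buf, sizeof(buf), "%02x ", bytes[i]);
    text += buf;
  }
  return text;
}
\end{lstlisting}

Both functions take the size as an upper bound, so an inconsistent service format cannot cause a read beyond the received data.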

The static methods \underline{\lstinline{ServiceOK()}} take a pointer to a received service update or to the result of a remote procedure call (as available in the respective handlers) and safely check whether its content is identical to the constant \lstinline{NO_LINK}. If so, they return false. If the same constant is used in the service declaration, this provides a safe way of being informed when a particular service becomes unavailable: the handler is then called once for that service with the data content \lstinline{NO_LINK}.

\underline{\lstinline{Tokenize()}} takes the string from the first argument, tokenizes it using the characters contained in the second argument as delimiters, and returns a vector of strings containing all tokens.
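
A function with this behaviour can be written in a few lines using the standard string search methods. The following sketch uses the hypothetical name \lstinline{TokenizeSketch} and assumes that consecutive delimiters produce no empty tokens; the actual method may differ in this respect.

\begin{lstlisting}
#include <string>
#include <vector>

// Splits text at any of the characters contained in delimiters
std::vector<std::string> TokenizeSketch(const std::string &text,
                                        const std::string &delimiters = " ") {
  std::vector<std::string> tokens;

  // Skip leading delimiters to find the start of the first token
  std::string::size_type start = text.find_first_not_of(delimiters, 0);

  while (start != std::string::npos) {
    // End of the token is the next delimiter (or the end of the string)
    std::string::size_type end = text.find_first_of(delimiters, start);
    tokens.push_back(text.substr(start, end - start));
    // Skip the delimiters that follow the token
    start = text.find_first_not_of(delimiters, end);
  }
  return tokens;
}
\end{lstlisting}

With the delimiters \lstinline{", "}, the text \lstinline{"a, b,,c"} yields the three tokens \lstinline{a}, \lstinline{b} and \lstinline{c}.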

The boolean \underline{\lstinline{ExitRequest}} is set to \lstinline{true} by the signal handler when the program should terminate. The application should check that variable and react accordingly.\footnote{The reception of a signal usually makes system calls return with an error \lstinline{EINTR}. That behaviour can be used by the application to honour \lstinline{ExitRequest} without continuous polling (e.g.\ by using the \lstinline{pause()} system call).}
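
The mechanism mentioned in the footnote can be sketched with plain POSIX calls. This is an illustration only and not the code of \lstinline{EvidenceServer}: the flag \lstinline{ExitFlag} stands in for \lstinline{ExitRequest}, and the function names are hypothetical. The handler is installed without \lstinline{SA_RESTART}, so that blocking system calls return with \lstinline{EINTR} when a signal arrives.

\begin{lstlisting}
#include <signal.h>
#include <string.h>
#include <unistd.h>

// Stands in for ExitRequest; sig_atomic_t is safe to set in a handler
volatile sig_atomic_t ExitFlag = 0;

extern "C" void OnSignal(int) { ExitFlag = 1; }

// Installs the handler; returns false if sigaction() fails
bool InstallHandler(int signum) {
  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = OnSignal;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = 0;  // No SA_RESTART: interrupted calls return EINTR
  return sigaction(signum, &sa, nullptr) == 0;
}

// Sleeps without polling until a handler has set the flag. A signal
// arriving between the check and pause() is only noticed at the next
// signal; sigsuspend() with a blocked mask would close that gap.
void WaitForExit() {
  while (!ExitFlag) pause();
}
\end{lstlisting}

An application would typically call \lstinline{InstallHandler()} for each signal of interest at startup and leave its main loop as soon as the flag is set.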

\subsection{Note on program termination}

If the application does not react to \lstinline{ExitRequest} or an immediate program termination is required, several methods may invoke \lstinline{exit()}. Orderly termination is then still possible in most cases if the application uses \lstinline{atexit()} to register a termination function that will be called during execution of \lstinline{exit()}. That function can then do clean-up work. If a class instance is declared \lstinline{static}, its destructor will also be called by \lstinline{exit()}.

However, due to the nature of an exception or a signal resulting from an error condition, correct execution of the termination routines cannot always be guaranteed.


\section{Main servers of \E}
\label{ServerDetails}

Starting a server requires that the environment variable \lstinline{DIM_DNS_NODE} contains the Internet address of the name server. Optionally, \lstinline{DIM_DNS_PORT} may be set if the name server answers on a port different from the standard 2505.

\subsection{\lstinline{Config} --- Distribution of configuration information}

The configuration server accesses a text file, formatted in the INI style, and responds to requests for configuration data from a client. To this end, it provides a remote procedure call with the name \lstinline{ConfigRequest}. The data sent along by the client is interpreted as a C string in the format \lstinline{SERVER_NAME ITEM}. The configuration server searches for a section \lstinline{SERVER_NAME} and then for a line starting with \lstinline{ITEM =} in the configuration file. It then returns the text following the equal sign up to the next item line as a string, removing all leading, trailing and multiple white space and all comments.

The file containing the configuration data is watched for modification using the Linux \lstinline{inotify} mechanism, so that the latest data is always distributed. For this purpose, the file is also accessed with all buffering disabled. The configuration file is given as command-line option at start-up of the server. Usually, this should be the first server to be started.

The configuration file format is illustrated here with an example from FACT.
\begin{lstlisting}
[SQM]    # Sky Quality Monitor

address = sqm.ethz.ch
port = 10001
period = 30


[DColl]    # Central Data Collector

exclude = DIS_DNS/SERVER_INFO Alarm/Summary Bias/ConsoleOut
          drsdaq/Count drsdaq/EventData drsdaq/ConsoleOut
sizeupdate = 30    # Min delay in seconds between file size updates
rollover = 12      # Hour of day for change of date
\end{lstlisting}
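
The lookup rule described above can be sketched as follows for a configuration text held in memory. This is not the code of the \lstinline{Config} server: the function names are hypothetical, comments are assumed to start with \lstinline{#}, and any line containing an equal sign is assumed to start a new item.

\begin{lstlisting}
#include <sstream>
#include <string>

// Removes a trailing comment and reduces white space to single blanks
static std::string Clean(std::string line) {
  line = line.substr(0, line.find('#'));
  std::istringstream words(line);
  std::string word, result;
  while (words >> word) result += (result.empty() ? "" : " ") + word;
  return result;
}

// Returns the text of the item in the section [server], or an empty
// string if the section or the item does not exist
std::string LookupSketch(const std::string &file, const std::string &server,
                         const std::string &item) {
  std::istringstream lines(file);
  std::string raw, result;
  bool inSection = false, inItem = false;

  while (std::getline(lines, raw)) {
    std::string line = Clean(raw);
    if (line.empty()) continue;

    if (line[0] == '[') {            // Section header
      if (inItem) break;             // Requested item ends with its section
      inSection = (line == "[" + server + "]");
      continue;
    }
    if (!inSection) continue;

    std::string::size_type eq = line.find('=');
    if (eq != std::string::npos) {   // Item line of the form name = text
      if (inItem) break;             // The next item line ends the text
      if (Clean(line.substr(0, eq)) == item) {
        inItem = true;
        result = Clean(line.substr(eq + 1));
      }
    } else if (inItem) {             // Continuation line of the item
      result += (result.empty() ? "" : " ") + line;
    }
  }
  return result;
}
\end{lstlisting}

Applied to the example file, a request for \lstinline{DColl exclude} returns the six service names of both lines separated by single blanks, and a request for \lstinline{DColl sizeupdate} returns \lstinline{30} without the comment.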

\noindent
\begin{longtable}{lp{0.7\textwidth}}
\multicolumn{2}{l}{\textbf{Invocation}} \\
\multicolumn{2}{l}{\lstinline|Config <Name of configuration file>|} \\[1ex]
\multicolumn{2}{l}{\textbf{Remote procedure call}} \\
\lstinline|ConfigRequest| & Interprets the data sent along as a C string in the form \lstinline|SERVER_NAME ITEM| and returns the applicable configuration data as text, or an empty response if the data could not be found.\\[1ex]
\multicolumn{2}{l}{\textbf{Services}} \\
\lstinline|Config/ConfigData| & Contains the full text of the configuration file. If this service is not excluded in the data collector, the latest version is automatically written to the slow data stream at every update.\\
\lstinline|Config/ModifyTime| & Contains the Unix time of the last modification of the configuration file.
\end{longtable}


\subsection{\lstinline{DColl} --- Data collector}

The data collector subscribes to all DIM services (except those excluded by the configuration item \lstinline{exclude}) and writes at every update of a service the data to a text file. A new file is generated daily, written to a directory that changes yearly. It translates the service data to text using the method \lstinline{ToString()} from the \lstinline{EvidenceServer} class and then replaces all non-printable characters by spaces.

This server provides a command to log information in a single log file. To copy or truncate this file, standard Linux tools like \lstinline{logrotate} can be used. Using a non-blocking DIM command, data can be sent for logging even from within a crash handler, which aids debugging.

All files are opened in append mode, thus preventing overwriting of existing data.

\noindent
\begin{longtable}{lp{0.7\textwidth}}
\multicolumn{2}{l}{\textbf{Configuration section \lstinline|[DColl]|}} \\
\lstinline|exclude| & Services not to write to the data file. Regular expressions can be used.\\
\lstinline|basedir| & Directory where to root the data file structure and where the log file resides.\\
\lstinline|sizeupdate| & Minimum delay in seconds between updates to the file size services. Updates will never occur more frequently than once per second.\\
\lstinline|rollover| & Hour of day in local time when to start a new data file.\\[1ex]
\multicolumn{2}{l}{\textbf{Commands}} \\
\lstinline|DColl/Log Text| & Interprets \lstinline|Text| as a C string and writes it to the log file, including information on the sender and a time stamp.\\[1ex]
\multicolumn{2}{l}{\textbf{Services}} \\
\lstinline|DColl/DataSizeMB| & Size of current data file in MByte.\\
\lstinline|DColl/CurrentFile| & Name of current data file.\\
\lstinline|DColl/LogSizeMB| & Size of log file in MByte.\\
\end{longtable}

\subsection{\lstinline{Alarm} --- Handling of error conditions}

The alarm server maintains a list of \emph{alarm levels} for a given set of servers. The alarm levels are defined as \lstinline{OK} (0), \lstinline{WARN} (1), \lstinline{ERROR} (2), \lstinline{FATAL} (3), and \lstinline{UNAVAILABLE} (4). The first four result from the corresponding severities of the message services, to which the alarm server subscribes. The alarm level does not decrease if, for example, a server issues a message with severity \lstinline{WARN} after one with \lstinline{ERROR}. It is only reset by command or by restarting the alarm server.

A master alarm is generated from the highest server alarm level. The alarm server also periodically checks if all required servers are up (searching for them with the DIM browser). It can send an email in case a server is down or in error. One email will be sent with each increase of alarm level for each server.
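
The latching of alarm levels and the derivation of the master alarm can be sketched as follows. The class \lstinline{AlarmList} and its methods are hypothetical names chosen for this illustration; the subscription to the message services, the availability check and the email notification are left out.

\begin{lstlisting}
#include <algorithm>
#include <map>
#include <string>

enum AlarmLevel { OK = 0, WARN = 1, ERROR = 2, FATAL = 3, UNAVAILABLE = 4 };

class AlarmList {
 public:
  // A new severity can only raise the stored level, never lower it
  void Update(const std::string &server, AlarmLevel level) {
    AlarmLevel &current = Levels[server];  // New entries start at OK (0)
    current = std::max(current, level);
  }

  // Corresponds to an explicit reset by command
  void Reset(const std::string &server) { Levels[server] = OK; }

  AlarmLevel Level(const std::string &server) const {
    auto it = Levels.find(server);
    return it == Levels.end() ? OK : it->second;
  }

  // Master alarm is the highest level of all servers watched
  AlarmLevel Master() const {
    AlarmLevel master = OK;
    for (const auto &entry : Levels) master = std::max(master, entry.second);
    return master;
  }

 private:
  std::map<std::string, AlarmLevel> Levels;
};
\end{lstlisting}

After an \lstinline{ERROR} followed by a \lstinline{WARN} from the same server, \lstinline{Level()} still reports \lstinline{ERROR} until \lstinline{Reset()} is called for that server.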

The alarm server itself could be monitored, if desired, using a Linux watchdog and/or from a remote operator's panel.

\noindent
\begin{longtable}{lp{0.7\textwidth}}
\multicolumn{2}{l}{\textbf{Configuration section \lstinline|[Alarm]|}} \\
\lstinline|servers| & List of servers to check. An email address can be appended to a server name, separated by a colon.\\
\lstinline|period| & Interval in seconds to check for server availability.\\[1ex]
\multicolumn{2}{l}{\textbf{Commands}} \\
\lstinline|ResetAlarm xyz| & Reset alarm level of server \lstinline|xyz|.\\[1ex]
\multicolumn{2}{l}{\textbf{Services}} \\
\lstinline|Alarm/Summary| & Text listing all observed servers and their alarm level.\\
\lstinline|Alarm/MasterAlarm| & The highest alarm level of all servers watched.\\
\lstinline|xyz/AlarmLevel| & Highest alarm level of server \lstinline|xyz| since the start of the \lstinline|Alarm| server or the last reset command.
\end{longtable}

\subsection{\lstinline{History} --- Service histories}

Data written by \lstinline{DColl} usually resides on a hard disk and is thus not quickly accessible to remote clients. A recently started user interface, for example, might want to provide a display to show the past behaviour of a DIM service. The \lstinline{History} server facilitates this by subscribing to all services and keeping their recent updates in a ring buffer in memory. The ring buffer entries are time stamped and contain exactly the data that was contained in the DIM service. The server provides a remote procedure call to retrieve that data. The ring buffers are written to files in case the history server is terminated, and re-read at next startup. The directory where the buffers are stored is given as command line parameter.

The ring buffer implementation needs to store entries of variable size in a contiguous memory region to allow transmission with DIM. This requires pointer manipulations that are more error prone than code using only standard C++ containers. This is one reason why the history functionality, which is not an essential component of the control system but a convenience function, is separated from the data collector.

A request for a history buffer also implies a non-negligible transfer of data. If there are bandwidth issues, history servers can therefore be installed at several places along a network chain that are separated by a bridge, see below.

\noindent
\begin{longtable}{lp{0.7\textwidth}}
\multicolumn{2}{l}{\textbf{Invocation}} \\
\multicolumn{2}{l}{\lstinline|History <Directory for storing history buffers>|} \\[1ex]
\multicolumn{2}{l}{\textbf{Configuration section \lstinline|[History]|}} \\
\lstinline|minchange| & Minimum absolute change necessary for a service to be added to the history buffer. The format is \lstinline|ServiceName:MinChange|. This is only meaningful for services that represent numbers or number arrays. For an array, the difference of the sum of the absolute values of all elements is compared to \lstinline|MinChange|.\\
\lstinline|maxsize_kb| & Maximum size of a single history buffer in kByte. Default value is 2000.\\
\lstinline|numentries| & Number of entries that a history buffer should hold, provided its size does not exceed the defined maximum. Default value is 1000. For DIM services of varying size, buffer sizes are recalculated at each update and never shrink.\\[1ex]
\multicolumn{2}{l}{\textbf{Remote procedure call}} \\
\lstinline|ServiceHistory Srvc| & Returns the history buffer of the given service if available, otherwise the response will be empty (zero bytes). If the buffer is not currently in memory because the corresponding service is not available, it will be searched for on disk.
\end{longtable}
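
The \lstinline|minchange| criterion for number arrays given in the table above can be written out as follows. This is a sketch with hypothetical function names; whether the comparison in the \lstinline{History} server includes equality is an assumption made here.

\begin{lstlisting}
#include <cmath>
#include <vector>

// Sum of the absolute values of all array elements
double AbsSum(const std::vector<double> &values) {
  double sum = 0;
  for (double v : values) sum += std::fabs(v);
  return sum;
}

// True if the update differs enough from the last stored entry
bool ChangedEnough(const std::vector<double> &last,
                   const std::vector<double> &now, double minChange) {
  return std::fabs(AbsSum(now) - AbsSum(last)) >= minChange;
}
\end{lstlisting}

Since only the sums of absolute values are compared, an update in which elements merely change sign or trade values among each other does not count as a change under this criterion.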

To safely retrieve the data from a history buffer, a class \lstinline{EvidenceHistory} is available. Its \lstinline{public} part is as follows.

\begin{lstlisting}[numbers=left,numberstyle=\tiny,stepnumber=2,numbersep=5pt]
class EvidenceHistory {

  public:
    struct Item {
      int Time;
      int Size;
      char Data[]; // Size bytes follow
    } __attribute__((packed));

    EvidenceHistory(std::string);
    ~EvidenceHistory();

    bool GetHistory();
    char *GetFormat();
    const struct Item *Next();
    void Rewind();
};
\end{lstlisting}

The constructor takes as argument the name of a DIM service. Calling \lstinline{GetHistory()} requests a history buffer from the server and returns \lstinline{true} if successful. A blocking remote procedure call is used. \lstinline{Next()} then iterates through the entries of the history buffer, starting at the oldest entry, and returns a pointer to a \lstinline{struct Item}, through which the time, size and data of the entry can be accessed. If no more data is in the buffer, \lstinline{NULL} is returned. To read the buffer again, \lstinline{Rewind()} can be called. The DIM service format is returned by \lstinline{GetFormat()}.

The structure attribute ensures that no padding bytes are added by the compiler. This is a directive specific to the \lstinline{g++} compiler.

Accessing the history buffer through this class is recommended, as the format of the buffer might change at a later time.
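
To illustrate why entries of variable size in one contiguous memory region call for careful bounds checking, the following sketch stores and reads back such entries. It assumes that entries simply follow each other, each consisting of two integers and the data bytes; this layout and the names \lstinline{ItemHeader}, \lstinline{AppendItem} and \lstinline{NextItem} are assumptions made for this example and do not describe the actual buffer format, which should be accessed through \lstinline{EvidenceHistory}.

\begin{lstlisting}
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

struct ItemHeader {
  int Time;
  int Size;  // Number of data bytes that follow the header
};

// Appends one entry (header plus data bytes) to the buffer
void AppendItem(std::vector<char> &buf, int time, const std::string &data) {
  ItemHeader h = {time, static_cast<int>(data.size())};
  const char *p = reinterpret_cast<const char *>(&h);
  buf.insert(buf.end(), p, p + sizeof(h));
  buf.insert(buf.end(), data.begin(), data.end());
}

// Copies the entry at offset pos and advances pos to the next entry.
// Returns false at the end of the buffer or if an entry is truncated.
bool NextItem(const std::vector<char> &buf, std::size_t &pos,
              ItemHeader &h, std::string &data) {
  if (pos > buf.size() || buf.size() - pos < sizeof(h)) return false;
  std::memcpy(&h, buf.data() + pos, sizeof(h));  // Avoids unaligned access

  std::size_t left = buf.size() - pos - sizeof(h);
  if (h.Size < 0 || left < static_cast<std::size_t>(h.Size)) return false;

  data.assign(buf.data() + pos + sizeof(h), static_cast<std::size_t>(h.Size));
  pos += sizeof(h) + static_cast<std::size_t>(h.Size);
  return true;
}
\end{lstlisting}

Starting with \lstinline{pos} set to zero, repeated calls to \lstinline{NextItem()} return the entries in the order in which they were appended, similar in spirit to \lstinline{Next()} and \lstinline{Rewind()}.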

\subsection{\lstinline{Bridge} --- Connecting two DIM networks}

The \lstinline{Bridge} server appears as a client on the primary DIM network (IP address and optionally port of the primary name server are given as command-line arguments), subscribes to all services not excluded in the configuration and forwards them to the secondary network, where it acts as a server (the secondary name server is given as usual by the environment variable \lstinline{DIM_DNS_NODE}). Services originating from various servers on the primary side will appear to be provided by the \lstinline{Bridge} server on the secondary side. The name, time stamp and service quality number are unchanged.

The bridge also creates commands with the same names as seen on the primary side and forwards them to the actual servers. They are sent non-blocking. Confirmation of reception at a client on the secondary side only indicates that the bridge has received the command. Remote procedure calls are internally handled by DIM as a command/service pair, thus no special handling for this case is required.

All service updates and commands pass through the bridge server, so it should run on a sufficiently powerful computer. One application of the bridge is access control: a firewall between the two DIM networks can prohibit any connection except through the bridge, where, for example, the origin of commands can be checked. The bridge can also help if too many clients would overload a server. With a bridge between the server and the clients, only the bridge connects directly to the server, while all other clients load only the bridge.

\noindent
\begin{longtable}{lp{0.7\textwidth}}
\multicolumn{2}{l}{\textbf{Invocation}} \\
\multicolumn{2}{l}{\lstinline|Bridge <Primary name server node> [Primary name server port]|} \\[1ex]
\multicolumn{2}{l}{\textbf{Configuration section \lstinline|[Bridge]|}} \\
\lstinline|cmdallow| & IP addresses that are allowed to pass a command over the bridge. Commands from all other addresses are rejected, and an \lstinline|INFO| entry is made in the message service. The address is compared to the result of the \lstinline|DimServer::getClientName()| call.\\[1ex]
\lstinline|exclude| & Services not to bridge. This must always contain \lstinline|DIS_DNS/*|, as a DIM name server exists on both sides and service names must be unique.\footnotemark{} It can also be desirable not to forward History services and to run a separate history server on the secondary side instead. Regular expressions are allowed.
\end{longtable}

\footnotetext{If needed, it would be easy to implement a renaming of services on the secondary side.}
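The effect of an \lstinline|exclude| list can be sketched in a few lines of C++. This is only an illustration, not the code used by \lstinline{Bridge}: it assumes that each pattern is a regular expression searched for anywhere in the service name, and the functions \lstinline{IsExcluded()} and \lstinline{ServicesToBridge()} as well as the service names in the usage note are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Returns true if Name matches any of the exclude patterns.
// Assumption: patterns are regular expressions searched anywhere in the name.
bool IsExcluded(const std::string &Name, const std::vector<std::string> &Patterns) {
  for (const std::string &P : Patterns) {
    if (std::regex_search(Name, std::regex(P))) return true;
  }
  return false;
}

// Select the services that would be forwarded to the secondary network
std::vector<std::string> ServicesToBridge(const std::vector<std::string> &Names,
                                          const std::vector<std::string> &Patterns) {
  std::vector<std::string> Out;
  for (const std::string &Name : Names) {
    if (!IsExcluded(Name, Patterns)) Out.push_back(Name);
  }
  return Out;
}
```

With the patterns \lstinline|DIS_DNS/*| and \lstinline|^History/|, a list containing \lstinline|DIS_DNS/SERVER_LIST|, \lstinline|History/Buffer| and \lstinline|Example/Temperature| is reduced to \lstinline|Example/Temperature|. Under this interpretation, \lstinline|/*| means zero or more slashes, so the first pattern matches every name that contains \lstinline|DIS_DNS|.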

\section{Details on handling configuration updates}
\label{ConfigHandling}

An application typically requests configuration data at start-up and then uses this data repeatedly during its execution. The configuration file can be modified during that time, and it might not always be clear to the user that the program still uses outdated information. A method to keep the application up-to-date with respect to its configuration data is thus desirable.

\subsection{\lstinline{GetConfig()}}

As a first step in achieving this, the application should not store the obtained configuration data internally, but always re-request it using the method \lstinline{GetConfig()} described in Sect.\,\ref{EvidenceServer-Methods}. This method will only issue a remote procedure call to the \lstinline{Config} server if the configuration file has been modified since the last invocation. Calling this method even at a high rate will therefore not load the configuration server at all if the configuration file is unchanged, but will yield up-to-date information if it did change.

The remote procedure call is blocking when called from the main thread or from the method \lstinline{ConfigChanged()} (which runs in a separate thread). It is non-blocking, using an \lstinline{rpcInfoHandler()}, when called from any other thread, in particular from the DIM handler thread. Blocking execution means that the remote procedure call waits until the data has arrived from the server before returning to the application, whereas non-blocking execution returns immediately and invokes a handler later, once the data has arrived. This procedure is necessary since a blocking remote procedure call from \lstinline{infoHandler()} will result in a dead-lock.

In the non-blocking case, the call to \lstinline{GetConfig()} still returns the previous, non-updated data even if the configuration file changed. The result of the non-blocking remote procedure call can only be processed by DIM once the current and all queued handler invocations have finished. When this is done, updated data will be returned by subsequent calls to \lstinline{GetConfig()}.
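The caching idea behind \lstinline{GetConfig()} can be sketched as follows. The class \lstinline{ConfigCache} and its methods are hypothetical stand-ins written for this example and are not the \lstinline{EvidenceServer} implementation. The expensive lookup, which corresponds to the remote procedure call, is represented by a function object and is only invoked if the modification time has changed since the item was last fetched.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-in for the caching described for GetConfig()
class ConfigCache {
  public:
    typedef std::function<std::string(const std::string &)> FetchFunction;

    // F performs the expensive lookup (the remote procedure call in Evidence)
    explicit ConfigCache(FetchFunction F): Fetch(F), ModifyTime(0) {}

    // To be called whenever a new modification time of the file is seen
    void SetModifyTime(long Time) { ModifyTime = Time; }

    // Returns the cached value unless the file changed since it was fetched
    std::string Get(const std::string &Item) {
      std::map<std::string, Cached>::iterator It = Cache.find(Item);
      if (It == Cache.end() || It->second.Time != ModifyTime) {
        Cached Fresh = {Fetch(Item), ModifyTime};
        Cache[Item] = Fresh;
        return Fresh.Value;
      }
      return It->second.Value;
    }

  private:
    struct Cached {
      std::string Value;
      long Time;
    };
    FetchFunction Fetch;
    long ModifyTime;
    std::map<std::string, Cached> Cache;
};
```

Repeated calls to \lstinline{Get()} for the same item invoke the fetch function once per modification time, which is why polling at a high rate is cheap in this scheme.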

\subsection{\lstinline{ConfigChanged()}}

An alternative procedure for semi-automatic updates of configuration information, albeit more demanding for the programmer, is to reimplement the virtual method \lstinline{ConfigChanged()} in the user class. This method is invoked in a separate thread by the \lstinline{EvidenceServer} class whenever the service \lstinline{Config/ModifyTime} changes (and also at program start-up). As it is not running within the DIM handler thread, \lstinline{GetConfig()} will use blocking connections to immediately get up-to-date data when called from \lstinline{ConfigChanged()}.

Running in a separate thread requires suitable protection by the programmer when accessing common data structures. To ease that, the \lstinline{EvidenceServer} class contains the pair of methods \lstinline{Lock()} and \lstinline{Unlock()} that work on a class-internal mutex. The mutex type is \lstinline{PTHREAD_MUTEX_ERRORCHECK} and therefore includes error checking: double locking will not cause a dead-lock, but the program will terminate with a \lstinline{FATAL} message.
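The behaviour of an error-checking mutex can be tried out with the following sketch, which uses only the POSIX threads interface. The class \lstinline{CheckedMutex} is a hypothetical wrapper written for this example. Unlike \lstinline{Lock()} and \lstinline{Unlock()} of \lstinline{EvidenceServer}, it returns the error code instead of terminating the program.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>

// Hypothetical minimal wrapper around an error-checking POSIX mutex
class CheckedMutex {
  public:
    CheckedMutex() {
      pthread_mutexattr_t Attr;
      pthread_mutexattr_init(&Attr);
      pthread_mutexattr_settype(&Attr, PTHREAD_MUTEX_ERRORCHECK);
      pthread_mutex_init(&Mutex, &Attr);
      pthread_mutexattr_destroy(&Attr);
    }
    ~CheckedMutex() { pthread_mutex_destroy(&Mutex); }

    // Both methods return 0 on success or an errno-style code on failure
    int Lock() { return pthread_mutex_lock(&Mutex); }
    int Unlock() { return pthread_mutex_unlock(&Mutex); }

  private:
    CheckedMutex(const CheckedMutex &);            // not copyable
    CheckedMutex &operator=(const CheckedMutex &); // not assignable
    pthread_mutex_t Mutex;
};
```

Locking the mutex a second time from the same thread returns \lstinline{EDEADLK} immediately instead of blocking forever, which gives the caller the chance to report the programming error.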


\section{Implementing and compiling \E}
\label{Implementation}

\subsection{Makefile}

If the environment variable \lstinline{DIMDIR} points to the DIM installation, a \lstinline{make} file for a program \lstinline{MyProg} can be as follows.

\begin{verbatim}
CC=g++
PROG=MyProg

CPPFLAGS += -I$(DIMDIR)/dim/
LDLIBS += -lpthread $(DIMDIR)/linux/libdim.a

all: $(PROG)

$(PROG): $(PROG).o Evidence.o
\end{verbatim}

Instead of the static version, a shared library \lstinline{libdim.so} is also available. The \lstinline{EvidenceServer} class is linked (and compiled, if not done yet) upon generation of \lstinline{MyProg}. It should reside in the same directory as \lstinline{MyProg.cc}, or else the correct path to \lstinline{Evidence.o} should be added.

\subsection{Server}
\label{Server-Programming}

For a server, the following skeleton can be used.

\begin{lstlisting}[numbers=left,numberstyle=\tiny,stepnumber=2,numbersep=5pt]
#include "Evidence.h"

// Class declaration
class MyClass: public EvidenceServer { ... };

// Constructor (SERVER_NAME must be a string literal macro)
MyClass::MyClass(): EvidenceServer(SERVER_NAME) { ... }

// Request configuration data
std::string Data = GetConfig("config_item_name");

// Create service
static int Service = 0;
DimService *NewService = new DimService(SERVER_NAME "/ServiceName", Service);

// Remove service
delete NewService;
\end{lstlisting}

Note that the contents of the service might be requested by a client at any time. As the service variable is passed by reference, that variable must live at least as long as the DIM service.

\subsection{Client}
\label{Client-Programming}

A client can use the same header file and access the static methods and the constant \lstinline{NO_LINK} defined there without instantiating the server class.

\begin{lstlisting}[numbers=left,numberstyle=\tiny,stepnumber=2,numbersep=5pt]
#include "Evidence.h"

// Class declaration
class MyClass: public DimClient { ... };

// Subscribe to service using infoHandler()
DataItem = new DimStampedInfo(ServiceName, NO_LINK, this);

void MyClass::infoHandler() {

  // Check if service became unavailable
  if (!ServiceOK(getInfo())) { ... }

  ...
}
\end{lstlisting}

The various versions of \lstinline{DimInfo} are described in the DIM manual. At subscription and then at every update, the method \lstinline{infoHandler()} is executed. If the method \lstinline{ServiceOK()} returns false, the corresponding service has become unavailable, otherwise the service data can be evaluated within the handler. The handler applies to all service subscriptions made within this class.


\section{User interface}
\label{User-Interface}

A graphical user interface (GUI), implemented using the Qt and Qwt frameworks\footnote{Information on these frameworks is available at \url{http://qt.nokia.com/} and \url{http://qwt.sourceforge.net/}.}, is available. It derives extended versions of the standard widget classes that can display the contents of DIM services and history buffers. A widget to send generic text commands is also available. Qwt is used to display graphs, which Qt does not support.

The GUI is called \emph{Evidence Data Display} (\lstinline{EDD}). It has a single-point interface to the DIM system and distributes received service updates to its widgets using the Qt signal/slot mechanism. This is necessary since the DIM \lstinline{infoHandler()} receiving the updates runs in a separate thread, but manipulations of GUI elements within Qt may only be done by the main thread. This mechanism also guarantees that one GUI instance subscribes no more than once to a particular service, even if the same data is shown by multiple widgets.

The GUI implementation is designed to be easily portable and does not use operating-system specifics. It sticks to standard C++ code, to Qt capabilities and to DIM. Qt is explicitly designed for cross-platform applications.


\section*{Acknowledgments}
\addcontentsline{toc}{section}{Acknowledgments}

The help of Clara Gaspar from CERN was indispensable for understanding the DIM system and making good use of it. Manwoo Lee from Kyungpook National University kindly checked the main programs for bugs and missing error handling.


\begin{appendix}

\section{DIM features not documented in the manual}
\label{DIMDetails}

\subsection{Format of \lstinline{DIS_DNS/SERVER_LIST} and \lstinline{xyz/SERVICE_LIST}}
\label{ServiceFormats}

The name server \lstinline{DIS_DNS} provides a service \lstinline{DIS_DNS/SERVER_LIST} containing a C string. A client subscribing to it receives with the first update a list of all currently existing servers in the format \lstinline{Servername@node}. Individual entries are separated by the \lstinline{|} character. Further updates indicate additions (\lstinline{+}), deletions (\lstinline{-}) or error states (\lstinline{!}) of servers, followed by a list in the same format as before. The error state indicates that the server did not send its regular watchdog message. The client can treat such a server as deleted, as it will reappear with \lstinline{+} in case it sends the messages again.
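A minimal C++ sketch of a parser for such updates is given below. It assumes, following the description above, that the indicator is a single leading character of the update string and that the entries follow separated by \lstinline{|}. The type \lstinline{ServerUpdate}, the function \lstinline{ParseServerList()} and the server and node names in the usage note are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct ServerUpdate {
  char Action; // '+', '-', '!' or ' ' for the initial full list
  std::vector<std::pair<std::string, std::string> > Servers; // (name, node)
};

// Assumption: one leading indicator character, then entries separated by '|'
ServerUpdate ParseServerList(const std::string &Text) {
  ServerUpdate Update;
  Update.Action = ' ';
  std::string::size_type Pos = 0;
  if (!Text.empty() && (Text[0] == '+' || Text[0] == '-' || Text[0] == '!')) {
    Update.Action = Text[0];
    Pos = 1;
  }
  while (Pos < Text.size()) {
    std::string::size_type End = Text.find('|', Pos);
    if (End == std::string::npos) End = Text.size();
    const std::string Entry = Text.substr(Pos, End - Pos);
    if (!Entry.empty()) {
      // Split the entry at '@' into server name and node
      const std::string::size_type At = Entry.find('@');
      const std::string Node = (At == std::string::npos) ? std::string() : Entry.substr(At + 1);
      Update.Servers.push_back(std::make_pair(Entry.substr(0, At), Node));
    }
    Pos = End + 1;
  }
  return Update;
}
```

For example, the string \lstinline{DColl@host1|Config@host2} yields two servers with a blank action, while \lstinline{-Config@host2} yields one server with the action \lstinline{-}.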
472
Each server \lstinline{xyz} has a service \lstinline{xyz/SERVICE_LIST}, also a C string. It contains line-feed separated entries listing all services, commands and remote procedure calls available from it. For a service, the entry is \lstinline{Name|Format|}, for a command \lstinline{Name|Format|CMD} (\lstinline{Format} may also be empty), and for a remote procedure call \lstinline{Name|Format_in,Format_out|RPC}. Additions and deletions of services are handled as above.
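The entry format can be decoded with a short C++ function, sketched here for the initial full list only (the handling of later additions and deletions is left out). The names \lstinline{Kind}, \lstinline{ServiceEntry} and \lstinline{ParseServiceList()} are hypothetical and chosen for this example.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

enum class Kind { Service, Command, Rpc };

struct ServiceEntry {
  std::string Name;
  std::string Format; // "Format_in,Format_out" for a remote procedure call
  Kind Type;
};

// Parse line-feed separated entries of the form Name|Format|Tag
std::vector<ServiceEntry> ParseServiceList(const std::string &Text) {
  std::vector<ServiceEntry> Out;
  std::istringstream Lines(Text);
  std::string Line;
  while (std::getline(Lines, Line)) {
    const std::string::size_type P1 = Line.find('|');
    if (P1 == std::string::npos) continue; // skip malformed line
    const std::string::size_type P2 = Line.find('|', P1 + 1);
    if (P2 == std::string::npos) continue; // skip malformed line
    ServiceEntry E;
    E.Name = Line.substr(0, P1);
    E.Format = Line.substr(P1 + 1, P2 - P1 - 1);
    const std::string Tag = Line.substr(P2 + 1);
    E.Type = (Tag == "CMD") ? Kind::Command : (Tag == "RPC") ? Kind::Rpc : Kind::Service;
    Out.push_back(E);
  }
  return Out;
}
```

An empty tag after the second separator marks a service, so a line such as \lstinline{Srv/Temp|F|} is classified as a service with format \lstinline{F}.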
474
Sequentially subscribing to \lstinline{DIS_DNS/SERVER_LIST} and then to all corresponding service lists allows a client to be kept up-to-date on all existing services, commands and remote procedure calls within the system. This mechanism is used by the components \lstinline{DColl}, \lstinline{History} and \lstinline{Bridge} of \E. The \lstinline{DimBrowser} class is provided by DIM with similar functionality.

\subsection{Miscellaneous}

A standard service \lstinline{xyz/CLIENT_LIST} is provided by all DIM servers, with a structure similar to \lstinline{xyz/SERVICE_LIST}.

If a command could not be delivered to the server, it is discarded and not delivered later. This prevents the reception of spurious, delayed commands. If the client used the blocking version of \lstinline{sendCommand()}, it receives the return value 0 in this case.

If the received value in the \lstinline{infoHandler()} indicates the service is unavailable (\lstinline{NO_LINK}), then \lstinline{getFormat()} should not be used as its result may point to an arbitrary memory location.

For updating a service, the same version of the function as used for its creation must be used, otherwise the call will be ignored. If a service is created using \lstinline{Service = new DimService(char *Name, char *Format, void *Data, int Size)}, then it is not possible to use \lstinline{Service->updateService(char *Newdata)}, even if the original format was \lstinline{"C"}.

The data received from a service or a remote procedure call is not guaranteed by DIM to comply with the format of that service (e.g. format \lstinline{"C"} does not assure a \lstinline{'\0'} terminated C string). The data content is up to the sender. If in doubt, the receiver must make sure it does not access more than \lstinline{getSize()} bytes starting at the memory location given by \lstinline{getData()}. Functions like \lstinline{getInt()} or \lstinline{getString()} only cast the pointer \lstinline{getData()}.
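A defensive way to turn received data into a string is sketched below in C++. The helper \lstinline{ToString()} is hypothetical and not part of DIM or \lstinline{Evidence}. It takes the pointer and the size as they would be obtained from \lstinline{getData()} and \lstinline{getSize()}, and never reads beyond the given size, whether or not a terminating \lstinline{'\0'} is present.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Copy at most Size bytes, stopping at the first '\0' if there is one
std::string ToString(const void *Data, int Size) {
  if (Data == nullptr || Size <= 0) return std::string();
  const char *Text = static_cast<const char *>(Data);
  const void *End = std::memchr(Text, '\0', static_cast<std::size_t>(Size));
  const std::size_t Length = End
      ? static_cast<std::size_t>(static_cast<const char *>(End) - Text)
      : static_cast<std::size_t>(Size);
  return std::string(Text, Length);
}
```

Three bytes \lstinline{abc} without terminator give the string \lstinline{abc}, and data with an embedded \lstinline{'\0'} is cut at that position.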
488
The main purpose of the format identifier of DIM services is to allow DIM to translate structure padding from the server architecture to the client. A format like \lstinline{"C:3;I:2;F"} is intended to correspond to a variable-sized structure in C,
\begin{verbatim}
struct A {
  char a[3];
  int b[2];
  float c[];
};
\end{verbatim}
If server and client use the same structure definition, a cast like \lstinline{struct A *Pnt = (struct A *) getData()} will guarantee correct access to elements. For example, \lstinline{Pnt->b[1]} will contain the second integer, even if client and server pad structures differently. Padding can also be disabled for a server if desired.
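The padding in question can be made visible with \lstinline{offsetof}. The following C++ sketch uses a fixed-size variant of the structure (one element in \lstinline{c} instead of a flexible array, which is not standard C++). The function \lstinline{PaddingBeforeB()} is a hypothetical helper for this example.

```cpp
#include <cassert>
#include <cstddef>

// Fixed-size variant of struct A, with c[1] instead of the flexible c[]
struct A {
  char a[3];
  int b[2];
  float c[1];
};

// Number of padding bytes the compiler inserted between a and b
constexpr std::size_t PaddingBeforeB() {
  return offsetof(A, b) - sizeof(A::a);
}
```

On many platforms one padding byte follows \lstinline{a}, so that \lstinline{b} starts at offset 4. That value is typical but not guaranteed, which is the reason DIM needs the format description to translate between architectures.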
498
In general, no blocking functions should be called from within a DIM handler. Specifically, making a blocking remote procedure call in an \lstinline{infoHandler()} will deadlock.\footnote{Non-blocking reception using an \lstinline{rpcInfoHandler()} is possible.}

If DIM is compiled without threads, it internally uses the signals SIGIO and SIGALRM for communication.

Only a single DIM server can be started by a process. Even when publishing to two different DIM networks, the service names on both sides combined have to be unique (since internally, a single hash table is used).


\section{Example for access control using IPtables}
\label{FirewallExample}

This is an example of an IPtables firewall setting taken from one of the FACT computers. Rule 4 allows \lstinline{ssh} connections, rule 5 is for the DIM servers, and rule 6 is for \lstinline{X11} connections. Rule 2 is an example of how direct connections from a particular IP address can be prohibited. That can be used to force all connections to go through the Bridge.

\footnotesize
\begin{verbatim}
Chain RH-Firewall-1-INPUT (2 references)
num  target  prot opt source         destination
1    ACCEPT  icmp --  0.0.0.0/0      0.0.0.0/0    icmp type 255
2    REJECT  all  --  192.33.97.201  0.0.0.0/0    reject-with icmp-port-unreachable
3    ACCEPT  all  --  0.0.0.0/0      0.0.0.0/0    state RELATED,ESTABLISHED
4    ACCEPT  tcp  --  0.0.0.0/0      0.0.0.0/0    state NEW tcp dpt:22
5    ACCEPT  tcp  --  0.0.0.0/0      0.0.0.0/0    tcp dpts:5100:6000 state NEW
6    ACCEPT  tcp  --  0.0.0.0/0      0.0.0.0/0    tcp dpts:6000:6063 state NEW
7    REJECT  all  --  0.0.0.0/0      0.0.0.0/0    reject-with icmp-host-prohibited
\end{verbatim}
\normalsize
Rule 2 prohibits connections to servers running on this computer. If the name server were at another IP address, a client could still see the existence of the local servers and also access the list of services they provide, but connection attempts would be denied.

%\section{Glossary}

%DIM service
%subversion (svn)
%thread
%widget


\end{appendix}

% ===================================================

\begin{thebibliography}{xxxx00}

\bibitem[Gas01]{Gas01} C. Gaspar, M. D\"{o}nszelmann and Ph. Charpentier, \emph{DIM, a portable, light weight package for information publishing, data transfer and inter-process communication}, Computer Physics Communications 140 (1--2), 102--109 (2001)
\bibitem[Bra09]{Bra09} I. Braun et al., Nucl. Inst. and Methods A 610, 400 (2009)
\bibitem[And10]{And10} H. Anderhub et al., Nucl. Inst. and Methods, to be published (2010)

\end{thebibliography}


\end{document}