\documentclass[10pt,twoside,fleqn,a4paper]{article}
\usepackage[bf]{caption}
\usepackage{listings}
\usepackage{graphicx}
\usepackage{subfigure}
\usepackage{amssymb}
\usepackage{textcomp}
\usepackage{units}
\usepackage{exscale}
\usepackage{url}
\usepackage[light]{draftcopy}
\usepackage{longtable}

\newcommand{\E}{\lstinline{Evidence} }

\renewcommand{\textfraction}{.15}
\renewcommand{\topfraction}{.85}
\renewcommand{\bottomfraction}{.85}
\setlength{\oddsidemargin}{2.5cm}
\addtolength{\oddsidemargin}{-1in} % because of strange definition of origin
\setlength{\evensidemargin}{2.5cm}
\addtolength{\evensidemargin}{-1in}
\setlength{\topmargin}{2.5cm}
\addtolength{\topmargin}{-1in}
\setlength{\textheight}{23.2cm}
\setlength{\textwidth}{16.0cm}
\frenchspacing
\lstset{basicstyle=\ttfamily,breaklines=true}

\begin{document}

\title{\E --- A control system for laboratory application and small-scale experiments}
\author{Oliver Grimm, ETH Z\"{u}rich}
\maketitle

This report describes the design and basic functionality of the \E control system. It essentially consists of a C++ class and a set of programs running on Linux for controlling small-scale experiments. It is based on CERN's DIM library for interprocess communication over TCP/IP connections.

\lstinline{$Rev: 255 $}

\tableofcontents

% ===================================================

\section{Overview of \E}
\label{Overview}

The \E control system has been developed for use in small-scale experiments, for which established and comprehensive control systems like EPICS, DOOCS or PVSS-II are too large and impose too much overhead.\footnote{Information on these systems can be found at \url{http://www.aps.anl.gov/epics/}, \url{http://tesla.desy.de/doocs}, \url{http://www.etm.at/}.} The development of \E was started within the FACT project (\emph{First G-APD Cherenkov Telescope} \cite{Bra09,And10}).
An experiment control system often comprises several individual programs. These programs require configuration information and produce data that should be stored and easily visualized. At least some of them also need to exchange information with each other. The intention of \E is a) to integrate these tasks with a minimum of extra coding on the part of the applications, b) to rely on established tools as far as reasonable, c) to be centralized\footnote{This is a design choice that partly defines the range of applicable experiments. For large systems decentralization can be a better approach.}, and d) to be lean.

The common functionality is collected in a small C++ class. Its main objective is to allow easy surveillance of the system by a central alarm server, and to aid the programmer by providing a few standard routines. It supports easy message logging as well as the distribution of warning and error conditions by the individual programs. Otherwise, it imposes only little definite structure on the programmer.

\E uses the DIM (Distributed Information Management) system\footnote{\url{http://dim.web.cern.ch/dim}} as communication layer \cite{Gas01}. This system has been developed and actively maintained at CERN for more than 17 years. DIM itself can be compiled on many platforms, but \E is developed on Linux and uses several functions specific to this operating system.

Sections \ref{Overview} and \ref{DIM-Basics} give a general overview of \E and its components. The remaining sections contain more technical details.

\subsection{Main programs}

The main programs of the \E control system\footnote{The denomination \emph{control system} for the core programs of \E is exaggerated, as without further servers that provide experiment-specific functionality, \E would only control itself. Lacking a better term, this report sticks to it.} are briefly described here.
More details, including all required configuration items and services provided, can be found in Sect.\,\ref{ServerDetails}.

\begin{itemize}
\item \lstinline{Config}\\
The configuration server distributes text data read from a configuration file to clients via remote procedure calls. The configuration file is watched for modifications, and clients are informed of updates via a DIM service.
\item \lstinline{DColl}\\
The data collector dynamically subscribes to all DIM services (except those explicitly excluded) and writes the corresponding data to a file at every service update. It also provides a central logging facility via a DIM command.
\item \lstinline{Alarm}\\
The alarm server subscribes to the standardized message service of a given list of servers and monitors their severity. It generates a master alarm and can send email notifications in case of server warnings, errors or unavailability. An alarm has to be explicitly acknowledged through a DIM command to be reset.
\item \lstinline{History}\\
The history server keeps a number of updates of all services in a memory ring buffer and can send the buffer to a client on request through a DIM remote procedure call. User interfaces can use this to provide a quick visualization of the past behavior of a given service without accessing the data stored on disk.
\item \lstinline{Bridge}\\
The bridge connects two DIM networks, repeating services, commands and remote procedure calls from one to the other. It accepts commands only from a given set of IP addresses, thus providing some limited access control (but no authentication).
\end{itemize}

A graphical user interface has been developed for the FACT project, see Sect.\,\ref{User-Interface}. It consists of a set of Qt widgets adapted to display the contents of DIM services and history buffers, as well as to send commands.
\section{Basic DIM functionality}
\label{DIM-Basics}

As DIM defines the basic operating principle of the control system, its functionality is briefly sketched here. Details explained in the DIM manual are not repeated. However, features and caveats that are not properly documented are summarized in Appendix \ref{DIMDetails}.

DIM uses the client/server approach. DIM services are known to the user only by their name\footnote{The name is a standard C string.}, not by the location of the corresponding server on the network. A central name server brings clients and servers in contact. If a client wants to subscribe to a particular service, it first contacts the name server. Using the information obtained, it then establishes a direct TCP/IP connection to the server. The name server is not involved any further and could even go down without affecting the established connections.

There is no access control implemented within DIM. Access therefore has to be restricted by other means, for example by limiting connections to given IP addresses and ports. An example configuration for the Linux \lstinline{iptables} package is given in Appendix \ref{FirewallExample}.

\subsection{Server and client}

A server provides services, commands and remote procedure calls, each identified by a unique name. The three differ as follows:
\begin{itemize}
\item A DIM service contains data that a client can subscribe to.
\item A DIM command is a one-way transfer of data from a client to a server.\footnote{The client can optionally be informed if the command arrived at the server, but not if it was actually processed.}
\item A DIM remote procedure call in addition sends data back to the client.
\end{itemize}
Subscription in this context means that the client can be informed automatically if updated data becomes available or if the service becomes unavailable. A client can also choose to be updated at regular intervals.\footnote{Also in this case the update is initiated by the server, to which the requested periodic update rate is sent.} Subscribing to a service that does not exist yet is possible.
The connection will then be established automatically as soon as the service becomes available. A dedicated TCP/IP connection is established between each server and client.\footnote{This also means that if 10 clients subscribe to a particular service, the data will be sent 10 times over the network. This will eventually limit the scalability of the system.} At startup, a server takes the first free port in the range between 5100 and 6000 to respond to clients.

The overhead imposed by DIM on a TCP/IP connection is small. The throughput can almost reach the line speed, but depends to some extent on the amount of data transmitted per service update. A non-negligible load is generated at service reception by the thread-based approach of DIM, as data arrival normally triggers the execution of a handler in a separate thread. The required context switch limits the update rate even on fast computers. For small service sizes, it also limits the throughput.

\subsection{Name server}

The name server keeps a list of all services, commands and remote procedure calls available in the system and their corresponding server addresses. It distributes this information to clients wishing to make a connection. This is done by the DIM library in the background and requires no user action. The information is also accessible to a client via a DIM service (see Appendix \ref{ServiceFormats}).

\subsection{Implementation in an application}

Implementing the DIM server functionality requires only little additional coding in an existing program. Typically, a few lines of code and linking with the DIM library are sufficient. The library handles all communication transparently in the background (see Sect.\,\ref{Implementation} for more details). DIM provides C++, C, Fortran, Java and Python interfaces. It can be compiled for various variants of Linux and Windows on 32 and 64 bit architectures.
At several places, the user application can implement handlers (also called call-back routines). The DIM library then invokes these automatically when a request from a client or an answer from a server arrives. These handlers are executed in a separate thread, so the programmer has to take care when accessing data structures from a handler. All calls to handlers are strictly serialized by DIM. Therefore, no access protection mechanism is needed if data is only accessed from within these handlers.

\section{Detailed description of the \E class}

In principle, no particular collection of functions beyond those defined by DIM itself is needed to interconnect programs with DIM. A certain number of tasks, however, are repetitive, and some standardization is beneficial for supervising the server functionality. The C++ class \lstinline{EvidenceServer} is used by all the main programs of \E to provide the functionality common to all of them. User applications are encouraged to use this class as well. \lstinline{EvidenceServer} can simply be inherited by the user class, as shown in Sect.\,\ref{Server-Programming}.

There is no corresponding \lstinline{EvidenceClient} class. The methods that can also be used by a client are declared \lstinline{static} in \lstinline{EvidenceServer}. They can thus be invoked without creating an instance.

\subsection{Class functionality}

\begin{itemize}
\item Starts the DIM server.
\item Provides a standard text message service \lstinline{SvrName/Message}, including encoding of the message severity (INFO, WARN, ERROR, FATAL) and automatic logging. The first message published (and thus registered by the data collector) contains the Subversion revision number and build time of the server. The DIM service format is \lstinline{"I:1;C"}.
\item Provides a method for configuration requests.
If the configuration data is not available, the application terminates with a message of FATAL severity, unless default data is given.
\item Provides a method for safely translating DIM service data into text.
\item Implements the virtual DIM method \lstinline{exitHandler()}. It can be called through a standard DIM command \lstinline{SvrName/EXIT}, taking a single integer as argument. On first invocation, the handler only sets the flag \lstinline{ExitRequest}, which the application should handle. On second invocation, it calls \lstinline{exit()}. The argument value 0 has a special meaning: it instructs the server to reset its message severity to INFO without exiting. The \lstinline{Alarm} server uses this when it receives a command to reset an alarm level, but the feature is also available to the user. The user application can override this handler.
\item Implements the virtual DIM method \lstinline{errorHandler()}. The error handler issues a message with ERROR severity that contains the DIM error code. The user application can override this handler.
\item Installs signal handlers for SIGQUIT (ctrl-\textbackslash), SIGTERM, SIGINT (ctrl-c), and SIGHUP (terminal closed). On first invocation, the signal handler sets \lstinline{ExitRequest}. On second invocation, it calls \lstinline{exit()}. After instantiating the class, the programmer may override these handlers.
\item Catches unhandled C++ exceptions and extracts as much information from the exception as possible.\footnote{This termination handler is taken directly from the code of the \lstinline{g++} compiler and is thus compiler specific.} That information is also published as a message.
\item Subscribes to the service \lstinline{Config/ModifyTime}. When the configuration file is updated, a call-back routine of the user application can be invoked automatically, see Sect.\,\ref{ConfigHandling} for more details.
\end{itemize}

\subsection{\lstinline{public} class methods}
\label{EvidenceServer-Methods}

The \lstinline{public} part of the header file \lstinline{Evidence.h} is as follows. The namespace designation \lstinline{std} has been left out of this listing for clarity.

\begin{lstlisting}[numbers=left,numberstyle=\tiny,stepnumber=2,numbersep=5pt]
#define NO_LINK (char *) "__&DIM&NOLINK&__"

class EvidenceServer: public DimServer {

  public:
    EvidenceServer(string);
    ~EvidenceServer();

    enum MessageType {INFO=0, WARN=1, ERROR=2, FATAL=3};

    void Message(MessageType, const char *, ...);
    void SendToLog(const char *, ...);
    string GetConfig(string, string = string());
    void ActivateSignal(int);
    void Lock();
    void Unlock();
    static string ToString(char *, void *, int);
    static bool ServiceOK(DimInfo *);
    static bool ServiceOK(DimRpcInfo *);
    static vector<string> Tokenize(const string &, const string & = " ");

    bool ExitRequest;
};
\end{lstlisting}

The class methods are thread safe, as they either use only local data or lock access if necessary. \lstinline{NO_LINK} is used for service subscription by clients, see the method \lstinline{ServiceOK()} below and Sect.\,\ref{Client-Programming}.

The constructor \underline{\lstinline{EvidenceServer()}} takes the server name as argument. This name is subsequently added automatically to logging and message texts. It is also used for configuration requests with \lstinline{GetConfig()}.

\underline{\lstinline{Message()}} updates the standard message service with the given text and severity. Formatting is as for \lstinline{printf()}. The text is also sent to the console and, with \lstinline{SendToLog()}, to the log file. In case of FATAL severity, \lstinline{exit()} is invoked, so the application can safely assume that the call will not return. The permanent buffer for the DIM service is allocated and freed automatically.

\underline{\lstinline{SendToLog()}} sends the text to the central log file via a non-blocking command.
This method can also be called in a termination or crash handler.

\underline{\lstinline{GetConfig()}} issues, on first invocation, a DIM remote procedure call to the configuration server to retrieve the required data, and returns it as a string. The second argument gives the data to be returned if the server is unavailable or cannot provide the requested data. If in this case the second argument is empty, the program terminates with a FATAL message. Using the service \lstinline{Config/ModifyTime}, the class keeps track of changes to the configuration file in the background. For subsequent requests of the same configuration data, it only issues a new remote procedure call if the file changed in the meantime. Otherwise, it returns the data already retrieved. This way, the method can be called repeatedly, even at a high rate, without generating unnecessary load on the configuration server (as the configuration file does not change frequently).

\underline{\lstinline{ActivateSignal()}} defines a signal to be sent to the main thread when the configuration file changes. See Sect.\,\ref{ConfigHandling} for details. No signal is sent unless one is set with this routine.

The methods \underline{\lstinline{Lock()}} and \underline{\lstinline{Unlock()}} work on an internal mutex.\footnote{Its type is \lstinline{PTHREAD_MUTEX_ERRORCHECK}. If the owning thread tries to lock the mutex again, the corresponding call therefore returns an error instead of dead-locking. Error messages from \lstinline{Lock()} and \lstinline{Unlock()} are written to the console and to the log file. They are not published using \lstinline{Message()}, since that method itself uses locking and calling it would result in an infinite recursion.} They are used by \lstinline{GetConfig()}, but the user application can also use them to serialize access from multiple threads.
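Plain POSIX calls are enough to demonstrate the effect of the mutex type named in the footnote. The following sketch is not taken from \E, and the function \lstinline{TryErrorcheckMutex()} is hypothetical. It shows two error cases. First, a second lock attempt by the owning thread returns \lstinline{EDEADLK} instead of blocking. Second, unlocking a mutex that is no longer held returns \lstinline{EPERM}. The pthread functions report errors through their return value, not through \lstinline{errno}.

\begin{lstlisting}[language=C++]
#include <cassert>
#include <cerrno>
#include <pthread.h>

// Result codes of the two misused operations (illustration only).
struct RelockResult {
  int Relock;        // second lock attempt by the owning thread
  int ExtraUnlock;   // unlock of a mutex that is no longer held
};

// Hypothetical demonstration of an error-checking mutex: misuse is reported
// through the return value instead of dead-locking or corrupting the state.
RelockResult TryErrorcheckMutex() {
  pthread_mutexattr_t Attr;
  pthread_mutex_t Mutex;

  pthread_mutexattr_init(&Attr);
  pthread_mutexattr_settype(&Attr, PTHREAD_MUTEX_ERRORCHECK);
  pthread_mutex_init(&Mutex, &Attr);
  pthread_mutexattr_destroy(&Attr);

  int First = pthread_mutex_lock(&Mutex);              // 0: lock acquired
  assert(First == 0);
  (void) First;

  RelockResult Result;
  Result.Relock = pthread_mutex_lock(&Mutex);          // EDEADLK, no blocking
  pthread_mutex_unlock(&Mutex);                        // 0: lock released
  Result.ExtraUnlock = pthread_mutex_unlock(&Mutex);   // EPERM: not owned
  pthread_mutex_destroy(&Mutex);
  return Result;
}
\end{lstlisting}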
If a signal is set by \lstinline{ActivateSignal()}, it is masked before locking and unmasked after unlocking. Calling functions in the locked state should be avoided, as this might result in re-locking.

The static method \underline{\lstinline{ToString()}} safely translates the contents of a DIM service into a string and returns it. DIM does not guarantee that the data matches the service format, so precautions are necessary to avoid buffer overruns. The method currently handles the standardized message format \lstinline{"I:1;C"}, arrays of numbers, and strings. All other formats are translated into a hex representation. The arguments are the DIM service format, a pointer to the service data and the data size in bytes. The method is thread safe, as it uses only the arguments and dynamically allocated storage.

The two variants of the static method \underline{\lstinline{ServiceOK()}} take a pointer to a received service update or to the result of a remote procedure call (as available in the respective handlers). They safely check whether its contents are identical to the constant \lstinline{NO_LINK}, and if so, return false. If the same constant is used in the service declaration, this is a safe way to learn that a particular service has become unavailable: the handler is then called once for that service with the data content \lstinline{NO_LINK}.

\underline{\lstinline{Tokenize()}} splits the string given as first argument into tokens, using the characters contained in the second argument as delimiters. It returns a vector of strings containing all tokens.

The boolean \underline{\lstinline{ExitRequest}} is set to \lstinline{true} by the signal handler when the program should terminate. The application should check this variable and react accordingly.\footnote{The reception of a signal usually makes system calls return with the error \lstinline{EINTR}.
The application can use this behaviour to honour \lstinline{ExitRequest} without continuous polling (e.g.\ by using the \lstinline{pause()} system call).}

\subsection{Note on program termination}

Several methods may call \lstinline{exit()} if the application does not react to \lstinline{ExitRequest}, or if an immediate program termination is required. Orderly termination is then still possible in most cases. To achieve this, the application uses \lstinline{atexit()} to register a termination function, which the C library calls during execution of \lstinline{exit()}. That function can then do clean-up work. If a class instance is declared \lstinline{static}, its destructor will also be called by \lstinline{exit()}. However, exceptions and signals resulting from an error condition may leave the program in an inconsistent state. Correct execution of the termination routines can therefore not always be guaranteed.

\section{Main servers of \E}
\label{ServerDetails}

Starting a server requires that the environment variable \lstinline{DIM_DNS_NODE} contains the Internet address of the name server. Optionally, \lstinline{DIM_DNS_PORT} may be set if the name server answers on a port different from the standard 2505.

\subsection{\lstinline{Config} --- Distribution of configuration information}

The configuration server reads a text file formatted in INI style and responds to requests for configuration data from clients. To this end, it provides a remote procedure call named \lstinline{ConfigRequest}. The data sent along by the client is interpreted as a C string in the format \lstinline{SERVER_NAME ITEM}. The configuration server searches the configuration file for a section \lstinline{SERVER_NAME}, and within it for a line starting with \lstinline{ITEM =}. It then returns as a string the text following the equal sign, up to the next item line. All comments are removed, and leading, trailing and repeated white space is removed as well.
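A small self-contained sketch makes the lookup rules above concrete. It is not the source of the \lstinline{Config} server, and the function \lstinline{LookupItem()} is hypothetical. It assumes that comments start with a hash sign. It also assumes that a value continues on subsequent lines until the next line containing an equal sign or the next section header. The helper \lstinline{Tokenize()} follows the signature listed in Sect.\,\ref{EvidenceServer-Methods} and is used here to collapse white space.

\begin{lstlisting}[language=C++]
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

using namespace std;

// Splits Text at any of the characters in Delimiters; empty tokens are dropped.
vector<string> Tokenize(const string &Text, const string &Delimiters = " ") {
  vector<string> Tokens;
  string::size_type Start = Text.find_first_not_of(Delimiters);
  while (Start != string::npos) {
    string::size_type End = Text.find_first_of(Delimiters, Start);
    Tokens.push_back(Text.substr(Start, End - Start));
    Start = Text.find_first_not_of(Delimiters, End);
  }
  return Tokens;
}

// Hypothetical lookup of Item in section [Server] of an INI-style text.
// Assumptions: '#' starts a comment, and a value continues on following
// lines until the next line containing '=' or the next section header.
string LookupItem(const string &File, const string &Server, const string &Item) {
  istringstream In(File);
  string Line, Value;
  bool InSection = false, InItem = false;

  while (getline(In, Line)) {
    Line = Line.substr(0, Line.find('#'));          // remove comment
    vector<string> Words = Tokenize(Line, " \t");
    if (Words.empty()) continue;                    // blank line

    if (Words[0][0] == '[') {                       // section header
      if (InItem) break;
      InSection = (Words[0] == "[" + Server + "]");
      continue;
    }
    if (!InSection) continue;

    string::size_type Equal = Line.find('=');
    if (Equal != string::npos) {                    // start of an item
      if (InItem) break;                            // previous value ends
      vector<string> Key = Tokenize(Line.substr(0, Equal), " \t");
      if (Key.size() != 1 || Key[0] != Item) continue;
      InItem = true;
      Line = Line.substr(Equal + 1);
    } else if (!InItem) {
      continue;
    }

    // Collapse leading, trailing and repeated white space
    for (const string &Word : Tokenize(Line, " \t"))
      Value += (Value.empty() ? "" : " ") + Word;
  }
  return Value;
}
\end{lstlisting}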
The file containing the configuration data is watched for modifications using the Linux \lstinline{inotify} mechanism, so the latest data is always distributed. For this purpose, the file is also accessed with all buffering disabled. The configuration file is given as a command-line argument at start-up of the server. Usually, this should be the first server to be started.

The configuration file format is illustrated here with an example from FACT.

\begin{lstlisting}
[SQM]       # Sky Quality Monitor
address = sqm.ethz.ch
port = 10001
period = 30

[DColl]     # Central Data Collector
exclude = DIS_DNS/SERVER_INFO Alarm/Summary Bias/ConsoleOut drsdaq/Count drsdaq/EventData drsdaq/ConsoleOut
sizeupdate = 30    # Min delay in seconds between file size updates
rollover = 12      # Hour of day for change of date
\end{lstlisting}

\noindent
\begin{longtable}{lp{0.7\textwidth}}
\multicolumn{2}{l}{\textbf{Invocation}} \\
\multicolumn{2}{l}{\lstinline|Config |} \\[1ex]
\multicolumn{2}{l}{\textbf{Remote procedure call}} \\
\lstinline|ConfigRequest| & Interprets the data sent along as a C string in the form \lstinline|SERVER_NAME ITEM|. Returns the applicable configuration data as text, or an empty response if the data could not be found.\\[1ex]
\multicolumn{2}{l}{\textbf{Services}} \\
\lstinline|Config/ConfigData| & Contains the full text of the configuration file. If this service is not excluded in the data collector, the latest version is automatically written to the slow data stream at every update.\\
\lstinline|Config/ModifyTime| & Contains the Unix time of the last modification of the configuration file.
\end{longtable}

\subsection{\lstinline{DColl} --- Data collector}

The data collector subscribes to all DIM services, except those excluded by the configuration item \lstinline{exclude}. At every update of a service, it writes the data to a text file. A new file is generated daily, written to a directory that changes yearly.
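The sketch below shows in standard C++ how an exclusion list of regular expressions might be applied to service names. The functions \lstinline{CompilePatterns()} and \lstinline{IsExcluded()} are hypothetical. The sketch rests on three assumptions: the patterns are separated by white space, they use POSIX extended syntax, and they may match anywhere in a service name (\lstinline{std::regex_search}). With this unanchored matching, an entry like \lstinline{DIS_DNS/*} matches all services of the name server. The drawback is that a pattern such as \lstinline{drsdaq/Count} would also exclude a hypothetical \lstinline{drsdaq/CountRate}. Anchors can be added where an exact match is intended.

\begin{lstlisting}[language=C++]
#include <cassert>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Compiles the white-space separated patterns of a configuration item once,
// so that they need not be recompiled at every service update.
// POSIX extended syntax is assumed; an invalid pattern throws std::regex_error.
std::vector<std::regex> CompilePatterns(const std::string &Item) {
  std::vector<std::regex> Patterns;
  std::istringstream In(Item);
  std::string Word;
  while (In >> Word) Patterns.emplace_back(Word, std::regex::extended);
  return Patterns;
}

// Hypothetical check: true if any pattern occurs somewhere in ServiceName.
// Unanchored matching (std::regex_search) is assumed.
bool IsExcluded(const std::string &ServiceName,
                const std::vector<std::regex> &Patterns) {
  for (const std::regex &Pattern : Patterns)
    if (std::regex_search(ServiceName, Pattern)) return true;
  return false;
}
\end{lstlisting}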
It translates the service data to text using the method \lstinline{ToString()} of the \lstinline{EvidenceServer} class, and then replaces all non-printable characters with spaces.

This server also provides a command for writing information to a single log file. To copy or truncate this file, standard Linux tools like \lstinline{logrotate} can be used. As a non-blocking DIM command is used, data can be sent for logging even from within a crash handler, which helps debugging. All files are opened in append mode, so existing data cannot be overwritten.

\noindent
\begin{longtable}{lp{0.7\textwidth}}
\multicolumn{2}{l}{\textbf{Configuration section \lstinline|[DColl]|}} \\
\lstinline|exclude| & Services not to be written to the data file. Regular expressions can be used.\\
\lstinline|basedir| & Root directory of the data file structure. The log file also resides there.\\
\lstinline|sizeupdate| & Minimum delay in seconds between updates of the file size services. Updates never occur more frequently than once per second.\\
\lstinline|rollover| & Hour of day in local time at which a new data file is started.\\[1ex]
\multicolumn{2}{l}{\textbf{Commands}} \\
\lstinline|DColl/Log Text| & Interprets \lstinline|Text| as a C string and writes it to the log file, including information on the sender and a time stamp.\\[1ex]
\multicolumn{2}{l}{\textbf{Services}} \\
\lstinline|DColl/DataSizeMB| & Size of the current data file in MByte.\\
\lstinline|DColl/CurrentFile| & Name of the current data file.\\
\lstinline|DColl/LogSizeMB| & Size of the log file in MByte.\\
\end{longtable}

\subsection{\lstinline{Alarm} --- Handling of error conditions}

The alarm server maintains a list of \emph{alarm levels} for a given set of servers. The alarm levels are defined as \lstinline{OK} (0), \lstinline{WARN} (1), \lstinline{ERROR} (2), \lstinline{FATAL} (3), and \lstinline{UNAVAILABLE} (4). The first four result from the corresponding severities of the message services to which the alarm server subscribes.
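A few lines of code suffice to sketch how message severities translate into latched alarm levels and a master alarm. This minimal illustration is not the \lstinline{Alarm} server source, and the class \lstinline{AlarmList} and its methods are hypothetical. It assumes that a stored level only increases until it is explicitly reset. It also assumes that the master alarm is the maximum over all observed servers.

\begin{lstlisting}[language=C++]
#include <algorithm>
#include <cassert>
#include <map>
#include <string>

// Alarm levels as defined for the Alarm server; the first four correspond
// to the message severities INFO, WARN, ERROR and FATAL.
enum AlarmLevel {
  ALARM_OK = 0, ALARM_WARN = 1, ALARM_ERROR = 2, ALARM_FATAL = 3,
  ALARM_UNAVAILABLE = 4
};

// Hypothetical bookkeeping of the latched alarm level of each server.
class AlarmList {
  std::map<std::string, int> Level;

 public:
  // Raises, but never lowers, the stored level. Returns true if the level
  // increased, which is the moment a notification email would be sent.
  bool Update(const std::string &Server, int NewLevel) {
    assert(NewLevel >= ALARM_OK && NewLevel <= ALARM_UNAVAILABLE);
    int &Current = Level[Server];   // new entries start at ALARM_OK (0)
    if (NewLevel <= Current) return false;
    Current = NewLevel;
    return true;
  }

  // Explicit reset, as triggered by the reset command.
  void Reset(const std::string &Server) { Level[Server] = ALARM_OK; }

  int Get(const std::string &Server) const {
    auto It = Level.find(Server);
    return It == Level.end() ? ALARM_OK : It->second;
  }

  // The master alarm is the highest level of all observed servers.
  int Master() const {
    int Max = ALARM_OK;
    for (const auto &Entry : Level) Max = std::max(Max, Entry.second);
    return Max;
  }
};
\end{lstlisting}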
An alarm level does not decrease if, for example, a server issues a message with severity \lstinline{WARN} after one with \lstinline{ERROR}. It is only reset by command or by restarting the alarm server. A master alarm is generated from the highest server alarm level.

The alarm server also periodically checks whether all required servers are up, searching for them with the DIM browser. It can send an email if a server is down or in error. One email is sent for each increase of the alarm level of a server. If desired, the alarm server itself can be monitored using a Linux watchdog and/or from a remote operator panel.

\noindent
\begin{longtable}{lp{0.7\textwidth}}
\multicolumn{2}{l}{\textbf{Configuration section \lstinline|[Alarm]|}} \\
\lstinline|servers| & List of servers to check. An email address can be appended to a server name, separated by a colon.\\
\lstinline|period| & Interval in seconds for checking server availability.\\[1ex]
\multicolumn{2}{l}{\textbf{Commands}} \\
\lstinline|ResetAlarm xyz| & Resets the alarm level of server \lstinline|xyz|.\\[1ex]
\multicolumn{2}{l}{\textbf{Services}} \\
\lstinline|Alarm/Summary| & Text listing all observed servers and their alarm levels.\\
\lstinline|Alarm/MasterAlarm| & The highest alarm level of all servers watched.\\
\lstinline|xyz/AlarmLevel| & Highest alarm level of server \lstinline|xyz| since the start of the \lstinline|Alarm| server or the last reset command.
\end{longtable}

\subsection{\lstinline{History} --- Service histories}

Data written by \lstinline{DColl} usually resides on a hard disk and is thus not quickly accessible to remote clients. A recently started user interface, for example, might want to display the past behaviour of a DIM service. The \lstinline{History} server supports this by subscribing to all services and keeping their recent updates in a ring buffer in memory. The ring buffer entries are time-stamped and contain exactly the data that was contained in the DIM service.
The server provides a remote procedure call to retrieve this data. When the history server is terminated, the ring buffers are written to files, and they are re-read at the next startup. The directory where the buffers are stored is given as a command-line parameter.

The ring buffer implementation needs to store entries of variable size in a contiguous memory region to allow transmission with DIM. This requires pointer manipulations that are more error-prone than code using only standard C++ containers. This is one reason why the history functionality is separated from the data collector: it is a convenience function, not an essential component of the control system. A request for a history buffer also implies a non-negligible transfer of data. If bandwidth is an issue, history servers can therefore be installed at several places along a network chain, separated by bridges (see below).

\noindent
\begin{longtable}{lp{0.7\textwidth}}
\multicolumn{2}{l}{\textbf{Invocation}} \\
\multicolumn{2}{l}{\lstinline|History |} \\[1ex]
\multicolumn{2}{l}{\textbf{Configuration section \lstinline|[History]|}} \\
\lstinline|minchange| & Minimum absolute change necessary for a service update to be added to the history buffer. The format is \lstinline|ServiceName:MinChange|. This is only meaningful for services that represent numbers or number arrays. For an array, the change of the sum of the absolute values of all elements is compared to \lstinline|MinChange|.\\
\lstinline|maxsize_kb| & Maximum size of a single history buffer in kByte. The default value is 2000.\\
\lstinline|numentries| & Number of entries that a history buffer should hold, provided its size does not exceed the defined maximum. The default value is 1000.
For DIM services of varying size, buffer sizes are recalculated at each update and never shrink.\\[1ex]
\multicolumn{2}{l}{\textbf{Remote procedure call}} \\
\lstinline|ServiceHistory Srvc| & Returns the history buffer of the given service if available. Otherwise, the response is empty (zero bytes). If the buffer is not currently in memory because the corresponding service is not available, it is searched for on disk.
\end{longtable}

A class \lstinline{EvidenceHistory} is available to safely retrieve the data from a history buffer. Its \lstinline{public} part is as follows.

\begin{lstlisting}[numbers=left,numberstyle=\tiny,stepnumber=2,numbersep=5pt]
class EvidenceHistory {

  public:
    struct Item {
      int Time;
      int Size;
      char Data[];    // Size bytes follow
    } __attribute__((packed));

    EvidenceHistory(std::string);
    ~EvidenceHistory();

    bool GetHistory();
    char *GetFormat();
    const struct Item *Next();
    void Rewind();
};
\end{lstlisting}

The constructor takes the name of a DIM service as argument. Calling \lstinline{GetHistory()} requests a history buffer from the server, using a blocking remote procedure call. It returns \lstinline{true} if successful. \lstinline{Next()} then iterates through the entries of the history buffer, starting at the oldest entry. It returns a pointer to a \lstinline{struct Item}, through which the time, size and data of the entry can be accessed. If no more data is in the buffer, \lstinline{NULL} is returned. To read the buffer again, call \lstinline{Rewind()}. \lstinline{GetFormat()} returns the DIM service format.

The structure attribute ensures that the compiler adds no padding bytes. This attribute is a \lstinline{g++} extension and not standard C++. Accessing the history buffer through this class is recommended, as the format of the buffer might change in the future.
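To show why this layout needs care, the following sketch walks through a contiguous buffer of such records. It is not the \lstinline{EvidenceHistory} code. It assumes that the buffer consists of nothing but consecutive records. The actual buffer may carry additional information, which is one more reason to use the class. The functions \lstinline{ForEachItem()} and \lstinline{AppendItem()} are hypothetical.

\begin{lstlisting}[language=C++]
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <vector>

// Hypothetical walk through a buffer of back-to-back records
//   int Time; int Size; char Data[Size];
// stored without padding bytes. Header fields are copied with memcpy() to
// avoid unaligned access. The walk stops at the first incomplete record, so
// a corrupted Size field cannot cause a read beyond the end of the buffer.
// Returns the number of complete records passed to Visit.
int ForEachItem(const char *Buffer, std::size_t Length,
                const std::function<void(int, const char *, int)> &Visit) {
  assert(Buffer != nullptr || Length == 0);
  const std::size_t Header = 2 * sizeof(int);
  std::size_t Pos = 0;
  int Count = 0;

  while (Length - Pos >= Header) {
    int Time, Size;
    std::memcpy(&Time, Buffer + Pos, sizeof(int));
    std::memcpy(&Size, Buffer + Pos + sizeof(int), sizeof(int));
    if (Size < 0 || static_cast<std::size_t>(Size) > Length - Pos - Header) break;
    Visit(Time, Buffer + Pos + Header, Size);
    Pos += Header + Size;
    ++Count;
  }
  return Count;
}

// Hypothetical helper that appends one record in the same layout,
// e.g. to build test data.
void AppendItem(std::vector<char> &Buffer, int Time, const char *Data, int Size) {
  const char *T = reinterpret_cast<const char *>(&Time);
  const char *S = reinterpret_cast<const char *>(&Size);
  Buffer.insert(Buffer.end(), T, T + sizeof(int));
  Buffer.insert(Buffer.end(), S, S + sizeof(int));
  Buffer.insert(Buffer.end(), Data, Data + Size);
}
\end{lstlisting}

The bounds check before each record follows the same reasoning as \lstinline{ToString()}: size fields of data received over the network should not be trusted blindly.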
\subsection{\lstinline{Bridge} --- Connecting two DIM networks}

On the primary DIM network, the \lstinline{Bridge} server appears as a client. The IP address and, optionally, the port of the primary name server are given as command-line arguments. The bridge subscribes to all services not excluded in the configuration and forwards them to the secondary network, where it acts as a server. As usual, the environment variable \lstinline{DIM_DNS_NODE} gives the secondary name server. Services originating from various servers on the primary side appear on the secondary side as if provided by the \lstinline{Bridge} server. The name, time stamp and service quality number are unchanged.

The bridge also creates commands with the same names as seen on the primary side and forwards them to the actual servers. These commands are sent non-blocking. A client on the secondary side that receives a confirmation therefore only learns that the bridge has received the command. DIM handles remote procedure calls internally as a command/service pair, so no special handling is required for this case. All service updates and commands pass through the bridge server, so it should run on a sufficiently powerful computer.

One application of the bridge is access control. A firewall between the two DIM networks can prohibit any connection except those made by the bridge, which can then, for example, check the origin of commands. The bridge can also help if too many clients would load a server too heavily. With a bridge placed between the server and its clients, the server has only one direct connection (to the bridge), while all other clients only load the bridge.

\noindent
\begin{longtable}{lp{0.7\textwidth}}
\multicolumn{2}{l}{\textbf{Invocation}} \\
\multicolumn{2}{l}{\lstinline|Bridge [Primary name server port]|} \\[1ex]
\multicolumn{2}{l}{\textbf{Configuration section \lstinline|[Bridge]|}} \\
\lstinline|cmdallow| & IP addresses that are allowed to pass a command over the bridge.
All other commands are rejected, and an \lstinline|INFO| entry is made in the message service. The address is compared to the result of the \lstinline{DimServer::getClientName()} call.\\[1ex]
\lstinline|exclude| & Services not to bridge. This must always contain \lstinline|DIS_DNS/*|, as a DIM name server exists on both sides and service names must be unique.\footnotemark{} It can also be desirable not to forward the History services and instead to run a separate history server on the secondary side. Regular expressions are allowed.
\end{longtable}
\footnotetext{If needed, it would be easy to implement a renaming of services on the secondary side.}

\section{Details on handling configuration updates}
\label{ConfigHandling}
An application typically requests configuration data at start-up and then uses this data repeatedly during its execution. The configuration file can be modified during that time, and it might not always be clear to the user that the program still uses outdated information. A method to keep the application up-to-date with respect to its configuration data is thus desirable.

\subsection{\lstinline{GetConfig()}}
As a first step towards this, the application should not store the obtained configuration data internally, but always re-request it using the method \lstinline{GetConfig()} described in Sect.\,\ref{EvidenceServer-Methods}. This method only issues a remote procedure call to the \lstinline{Config} server if the configuration file has been modified since the last invocation. Calling this method even at a high rate will therefore not load the configuration server at all if the configuration file is unchanged, but will yield up-to-date information if it did change. The remote procedure call is blocking when called from the main thread, and non-blocking, using an \lstinline{rpcInfoHandler()}, when called from any other thread (in particular from the DIM handler thread).
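
The rule that \lstinline{GetConfig()} contacts the \lstinline{Config} server only after a modification of the configuration file can be illustrated by the following minimal sketch. It is not the \lstinline{EvidenceServer} implementation: the class \lstinline{ConfigCache} is hypothetical, the remote procedure call is replaced by a user-supplied function, and the modification time is assumed to be known from a subscription to \lstinline{Config/ModifyTime}.
\begin{lstlisting}
#include <cassert>
#include <ctime>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical sketch of the GetConfig() update rule (not the EvidenceServer code)
class ConfigCache {
  public:
    // Stand-in for the remote procedure call to the Config server
    using FetchFunction = std::function<std::string(const std::string &)>;

    explicit ConfigCache(FetchFunction F): Fetch(std::move(F)) {}

    // Assumed to be called whenever Config/ModifyTime is updated
    void SetModifyTime(std::time_t T) { ModifyTime = T; }

    // Returns the cached value unless the file changed since it was fetched
    std::string Get(const std::string &Item) {
      auto It = Cache.find(Item);
      if (It != Cache.end() && It->second.FetchedAt == ModifyTime) {
        return It->second.Value;
      }
      Entry E{Fetch(Item), ModifyTime};  // only here is the server contacted
      Cache[Item] = E;
      return E.Value;
    }

  private:
    struct Entry {
      std::string Value;
      std::time_t FetchedAt;  // modification time valid when the value was fetched
    };
    FetchFunction Fetch;
    std::map<std::string, Entry> Cache;
    std::time_t ModifyTime = 0;
};
\end{lstlisting}
Each item remembers the modification time that was valid when it was fetched, so it is re-requested at most once per change of the file, however often it is read.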
Blocking execution means that the remote procedure call waits until the data has arrived from the server before returning to the application, whereas non-blocking execution returns immediately and a handler is invoked later, when the data has arrived. The non-blocking procedure is necessary since a blocking remote procedure call from \lstinline{infoHandler()} would result in a dead-lock. In the non-blocking case, the call to \lstinline{GetConfig()} still returns the previous, non-updated data even if the configuration file changed. The result of the non-blocking remote procedure call can only be processed by DIM once the current and all queued handler invocations have finished. After that, subsequent calls to \lstinline{GetConfig()} return the updated data.

\subsection{\lstinline{ConfigChanged()}}
An alternative procedure for semi-automatic updates of configuration information, albeit more demanding for the programmer, is to implement the method \lstinline{ConfigChanged()} in the user class. This method can be invoked by the \lstinline{EvidenceServer} class through a signaling mechanism. To this end, \lstinline{ActivateSignal()} first has to be called with the number of the signal that should be used for announcing configuration changes (for example, \lstinline{SIGUSR1}). That signal is sent to the main thread when the service \lstinline{Config/ModifyTime} changes. It is caught by the internal signal handler of the \lstinline{EvidenceServer} class, which in turn calls the method \lstinline{ConfigChanged()}. This method is declared \lstinline{virtual} in the class and defined as an empty function; the user application can override it by declaring and defining the method itself. It is then executed in the main thread, and thus \lstinline{GetConfig()} uses blocking remote procedure calls and always returns up-to-date data. However, it must be kept in mind that this routine runs as part of the signal handler.
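
The signal mechanism, together with the locking described next, can be sketched with POSIX signals and threads. The classes \lstinline{SignalConfigSketch} and \lstinline{MyServer} are hypothetical stand-ins, not the \lstinline{EvidenceServer} code. \lstinline{Lock()} blocks the signal before acquiring a mutex of type \lstinline{PTHREAD_MUTEX_ERRORCHECK}; in this sketch a second \lstinline{Lock()} from the same thread simply returns \lstinline{EDEADLK}, and a signal raised while locked is delivered when \lstinline{Unlock()} unblocks it.
\begin{lstlisting}
#include <cassert>
#include <cerrno>
#include <csignal>
#include <pthread.h>

// Hypothetical stand-in for the signal and locking mechanism of EvidenceServer
class SignalConfigSketch {
  public:
    SignalConfigSketch() {
      // Error-checking mutex: relocking by the owner returns EDEADLK
      pthread_mutexattr_t Attr;
      pthread_mutexattr_init(&Attr);
      pthread_mutexattr_settype(&Attr, PTHREAD_MUTEX_ERRORCHECK);
      pthread_mutex_init(&Mutex, &Attr);
      pthread_mutexattr_destroy(&Attr);
      Instance = this;
    }

    virtual ~SignalConfigSketch() {
      Instance = nullptr;
      pthread_mutex_destroy(&Mutex);
    }

    // Install the internal handler for the given signal, e.g. SIGUSR1
    void ActivateSignal(int Sig) {
      Signal = Sig;
      struct sigaction Action = {};
      Action.sa_handler = Handler;
      sigemptyset(&Action.sa_mask);
      sigaction(Sig, &Action, nullptr);
    }

    // Empty by default; user classes override it
    virtual void ConfigChanged() {}

    // Block the signal first, then take the mutex (returns 0 or an error code)
    int Lock() {
      SetMask(SIG_BLOCK);
      return pthread_mutex_lock(&Mutex);
    }

    // Release the mutex, then unblock the signal; a pending signal is delivered now
    int Unlock() {
      int Ret = pthread_mutex_unlock(&Mutex);
      SetMask(SIG_UNBLOCK);
      return Ret;
    }

  private:
    static SignalConfigSketch *Instance;
    pthread_mutex_t Mutex;
    int Signal = 0;

    // Internal signal handler: dispatches to the (possibly overridden) method
    static void Handler(int) {
      if (Instance != nullptr) Instance->ConfigChanged();
    }

    void SetMask(int How) {
      if (Signal == 0) return;
      sigset_t Set;
      sigemptyset(&Set);
      sigaddset(&Set, Signal);
      pthread_sigmask(How, &Set, nullptr);
    }
};

SignalConfigSketch *SignalConfigSketch::Instance = nullptr;

// Example user class (hypothetical): counts configuration changes
class MyServer: public SignalConfigSketch {
  public:
    volatile sig_atomic_t Updates = 0;
    void ConfigChanged() override { Updates = Updates + 1; }
};
\end{lstlisting}
In general, only async-signal-safe functions can safely be called from a signal handler, which is why the override in \lstinline{MyServer} merely increments a \lstinline{volatile sig_atomic_t} counter.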
The normal execution of the main thread is interrupted at an arbitrary point by the signal, thus the programmer must employ suitable protection when accessing common data structures. To ease this, the \lstinline{EvidenceServer} class provides the pair of methods \lstinline{Lock()} and \lstinline{Unlock()} that operate on a class-internal mutex. While the mutex is held, the signal declared by \lstinline{ActivateSignal()} is also blocked, which prevents interruptions if a configuration update occurs in the locked state. The mutex type is \lstinline{PTHREAD_MUTEX_ERRORCHECK} and therefore includes error checking: double locking does not cause a dead-lock, but the program terminates with a \lstinline{FATAL} message.

\section{Implementing and compiling \E}
\label{Implementation}

\subsection{Makefile}
If the environment variable \lstinline{DIMDIR} points to the DIM installation, a makefile for a program \lstinline{MyProg} can be as follows. The DIM library is listed before \lstinline{-lpthread}, as a static library must precede the libraries it depends on.
\begin{verbatim}
CC=g++
PROG=MyProg

CPPFLAGS += -I$(DIMDIR)/dim/
LDLIBS += $(DIMDIR)/linux/libdim.a -lpthread

all: $(PROG)

$(PROG): $(PROG).o Evidence.o
\end{verbatim}
Instead of the static version, a shared library \lstinline{libdim.so} is also available. The \lstinline{EvidenceServer} class is compiled (if not done yet) and linked when \lstinline{MyProg} is built. Its source should reside in the same directory as \lstinline{MyProg.cc}; otherwise, the correct path to \lstinline{Evidence.o} must be given.

\subsection{Server}
\label{Server-Programming}
For a server, the following skeleton can be used.
\begin{lstlisting}[numbers=left,numberstyle=\tiny,stepnumber=2,numbersep=5pt]
#include "Evidence.h"

// Class declaration
class MyClass: public EvidenceServer {
  ...
};

// Constructor
MyClass::MyClass(): EvidenceServer(SERVER_NAME) {
  ...
}

// Request configuration data
std::string Data = GetConfig("config_item_name");

// Create service
static int Service = 0;
DimService *NewService = new DimService(SERVER_NAME "/ServiceName", Service);

// Remove service
delete NewService;
\end{lstlisting}
Note that a client might request the contents of the service at any time. As the service variable is passed by reference, that variable must exist for at least as long as the DIM service.

\subsection{Client}
\label{Client-Programming}
A client can use the same header file and access the static methods and the constant \lstinline{NO_LINK} defined there without instantiating the server class.
\begin{lstlisting}[numbers=left,numberstyle=\tiny,stepnumber=2,numbersep=5pt]
#include "Evidence.h"

// Class declaration
class MyClass: public DimClient {
  ...
};

// Subscribe to service using infoHandler()
DimStampedInfo *DataItem = new DimStampedInfo(ServiceName, NO_LINK, this);

void MyClass::infoHandler() {
  // Check if service became unavailable
  if (!ServiceOK(getInfo())) {
    ...
  }
  ...
}
\end{lstlisting}
The various versions of \lstinline{DimInfo} are described in the DIM manual. At subscription and at every subsequent update, the method \lstinline{infoHandler()} is executed. If \lstinline{ServiceOK()} returns \lstinline{false}, the corresponding service has become unavailable; otherwise, the service data can be evaluated within the handler. The handler applies to all service subscriptions made within this class.

\section{User interface}
\label{User-Interface}
A graphical user interface (GUI), implemented using the Qt and Qwt frameworks\footnote{Information on these frameworks is available at \url{http://qt.nokia.com/} and \url{http://qwt.sourceforge.net/}.}, is available. Its widgets are derived from standard widget classes and extended to display the contents of DIM services and history buffers. A widget for sending generic text commands is also available. Qwt is used to display graphs, as this is not supported by Qt itself.
The GUI is called \emph{Evidence Data Display} (\lstinline{EDD}). It has a single point of interface to the DIM system and distributes received service updates to its widgets using the Qt signal/slot mechanism. This is necessary since the DIM \lstinline{infoHandler()} receiving the updates runs in a separate thread, whereas GUI elements in Qt may only be manipulated by the main thread. This mechanism also guarantees that a GUI instance subscribes at most once to a particular service, even if the same data is shown by multiple widgets.

The GUI implementation is designed to be easily portable and does not use operating-system specifics. It relies only on standard C++, Qt and DIM. Qt is explicitly designed for cross-platform applications.

\section*{Acknowledgments}
\addcontentsline{toc}{section}{Acknowledgments}
The help of Clara Gaspar from CERN was indispensable for understanding the DIM system and making good use of it. Manwoo Lee from Kyungpook National University kindly checked the main programs for bugs and missing error handling.

\begin{appendix}
\section{DIM features not documented in the manual}
\label{DIMDetails}

\subsection{Format of \lstinline{DIS_DNS/SERVER_LIST} and \lstinline{xyz/SERVICE_LIST}}
\label{ServiceFormats}
The name server \lstinline{DIS_DNS} provides a service \lstinline{DIS_DNS/SERVER_LIST} containing a C string. On the first update after subscribing to it, the client receives a list of all currently existing servers in the format \lstinline{Servername@node}, with individual entries separated by the \lstinline{|} character. Further updates indicate additions (\lstinline{+}), deletions (\lstinline{-}) or error states (\lstinline{!}) of servers, followed by a list in the same format as before. The error state indicates that the server did not send its regular watchdog message. The client can treat it as a deletion, as the server will reappear with \lstinline{+} if it sends the messages again.
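
A client that keeps track of the existing servers has to apply these updates to its own list. The following minimal sketch, using the hypothetical class \lstinline{ServerListTracker}, assumes exactly the format described above and treats the error state like a deletion. Since DIM does not guarantee a terminating \lstinline{'\0'}, it reads at most the number of bytes reported for the update.
\begin{lstlisting}
#include <cassert>
#include <cstring>
#include <set>
#include <sstream>
#include <string>

// Hypothetical sketch: keeps the set of servers announced by DIS_DNS/SERVER_LIST
class ServerListTracker {
  public:
    // Data and Size as obtained from getData() and getSize() in infoHandler()
    void Update(const char *Data, int Size) {
      if (Data == nullptr || Size <= 0) return;

      // Do not assume '\0' termination: read at most Size bytes
      std::size_t Length = static_cast<std::size_t>(Size);
      const void *End = std::memchr(Data, '\0', Length);
      if (End != nullptr) Length = static_cast<const char *>(End) - Data;
      std::string Text(Data, Length);

      // Optional leading '+', '-' or '!' marks an incremental update
      char Op = '\0';
      if (!Text.empty() && (Text[0] == '+' || Text[0] == '-' || Text[0] == '!')) {
        Op = Text[0];
        Text.erase(0, 1);
      }
      if (Op == '\0') Servers.clear();  // first update: complete list

      std::istringstream Stream(Text);
      std::string Entry;
      while (std::getline(Stream, Entry, '|')) {
        if (Entry.empty()) continue;
        if (Op == '-' || Op == '!') Servers.erase(Entry);  // error state treated as deleted
        else Servers.insert(Entry);
      }
    }

    const std::set<std::string> &Get() const { return Servers; }

  private:
    std::set<std::string> Servers;  // entries of the form "Servername@node"
};
\end{lstlisting}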
Each server \lstinline{xyz} has a service \lstinline{xyz/SERVICE_LIST}, also a C string. It contains line-feed separated entries listing all services, commands and remote procedure calls the server provides. For a service, the entry is \lstinline{Name|Format|}, for a command \lstinline{Name|Format|CMD} (\lstinline{Format} may also be empty), and for a remote procedure call \lstinline{Name|Format_in,Format_out|RPC}. Additions and deletions of services are indicated as above.

Subscribing first to \lstinline{DIS_DNS/SERVER_LIST} and then to all corresponding service lists allows a client to stay up-to-date on all existing services, commands and remote procedure calls within the system. This mechanism is used by the components \lstinline{DColl}, \lstinline{History} and \lstinline{Bridge} of \lstinline{Evidence}. DIM provides the class \lstinline{DimBrowser} with similar functionality.

\subsection{Miscellaneous}
A standard service \lstinline{xyz/CLIENT_LIST} is provided by all DIM servers, with a structure similar to \lstinline{xyz/SERVICE_LIST}.

If a command cannot be delivered to the server, it is discarded and not delivered later. This prevents the reception of spurious, delayed commands. If the client used the blocking version of \lstinline{sendCommand()}, it receives the return value 0 in this case.

If the value received in the \lstinline{infoHandler()} indicates that the service is unavailable (\lstinline{NO_LINK}), then \lstinline{getFormat()} should not be used, as its result may point to an arbitrary memory location.

A service must be updated with the same version of the function that was used for its creation, otherwise the call is ignored. If a service is created using \lstinline{Service = new DimService(char *Name, char *Format, void *Data, int Size)}, then it is not possible to use \lstinline{Service->updateService(char *Newdata)}, even if the original format was \lstinline{"C"}.
DIM does not guarantee that the data received from a service or a remote procedure call complies with the format of that service (e.g. format \lstinline{"C"} does not ensure a \lstinline{'\0'}-terminated C string). The data content is up to the sender. If in doubt, the receiver must make sure not to access more than \lstinline{getSize()} bytes starting at the memory location given by \lstinline{getData()}. Functions like \lstinline{getInt()} or \lstinline{getString()} only cast the pointer returned by \lstinline{getData()}.

The main purpose of the format identifier of DIM services is to allow DIM to translate structure padding from the server architecture to the client. A format like \lstinline{"C:3;I:2;F"} is intended to correspond to a variable-sized structure in C,
\begin{verbatim}
struct A {
  char a[3];
  int b[2];
  float c[];
};
\end{verbatim}
If server and client use the same structure definition, a cast like \lstinline{struct A *Pnt = (struct A *) getData()} guarantees correct access to the elements. For example, \lstinline{Pnt->b[1]} will contain the second integer, even if client and server pad structures differently. Padding can also be disabled for a server if desired.

In general, no blocking functions should be called from within a DIM handler. Specifically, making a blocking remote procedure call in an \lstinline{infoHandler()} will dead-lock.\footnote{Non-blocking reception using an \lstinline{rpcInfoHandler()} is possible.}

If DIM is compiled without threads, it internally uses the signals SIGIO and SIGALRM for communication.

Only a single DIM server can be started per process. Even when publishing to two different DIM networks, the service names on both sides combined have to be unique (since internally a single hash table is used).

\section{Example for access control using IPtables}
\label{FirewallExample}
This is an example of an IPtables firewall configuration taken from one of the FACT computers.
Rule 4 allows \lstinline{ssh} connections, rule 5 is for the DIM servers, and rule 6 for \lstinline{X11} connections. Rule 2 is an example of how direct connections from a particular IP address can be prohibited. This can be used to force all connections to go through the bridge.

\footnotesize
\begin{verbatim}
Chain RH-Firewall-1-INPUT (2 references)
num  target  prot opt source         destination
1    ACCEPT  icmp --  0.0.0.0/0      0.0.0.0/0    icmp type 255
2    REJECT  all  --  192.33.97.201  0.0.0.0/0    reject-with icmp-port-unreachable
3    ACCEPT  all  --  0.0.0.0/0      0.0.0.0/0    state RELATED,ESTABLISHED
4    ACCEPT  tcp  --  0.0.0.0/0      0.0.0.0/0    state NEW tcp dpt:22
5    ACCEPT  tcp  --  0.0.0.0/0      0.0.0.0/0    tcp dpts:5100:6000 state NEW
6    ACCEPT  tcp  --  0.0.0.0/0      0.0.0.0/0    tcp dpts:6000:6063 state NEW
7    REJECT  all  --  0.0.0.0/0      0.0.0.0/0    reject-with icmp-host-prohibited
\end{verbatim}
\normalsize

Rule 2 rejects connections from 192.33.97.201 to servers running on this computer. If the name server runs at another IP address, a client at the rejected address can still see the existence of the local servers and also access the list of services they provide, but its connection attempts are denied.

%\section{Glossary}
%DIM service
%subversion (svn)
%thread
%widget

\end{appendix}

% ===================================================
\begin{thebibliography}{xxxx00}

\bibitem[Gas01]{Gas01} C. Gaspar, M. D\"{o}nszelmann and Ph. Charpentier, \emph{DIM, a portable, light weight package for information publishing, data transfer and inter-process communication}, Computer Physics Communications 140, 102 (2001)

\bibitem[Bra09]{Bra09} I. Braun et al., Nucl. Inst. and Methods A 610, 400 (2009)

\bibitem[And10]{And10} H. Anderhub et al., Nucl. Inst. and Methods, to be published (2010)

\end{thebibliography}

\end{document}