Connecting to the Database
Host
The database is hosted at ihp-pc45.ethz.ch
User
First, you need a user. For the moment, a user 'fact' is available with the standard password. The user 'fact' is allowed to connect from everywhere if an encrypted connection is used. All reasonably recent mysql clients usually use encrypted connections by default, so in most cases a simple call should be enough:
> mysql -C -h ihp-pc45.ethz.ch -u fact -p factdata
To enforce encryption, --ssl (older clients) or --ssl-mode=REQUIRED can be used. If you have problems with the connection, you can also try --protocol=TCP.
If you access the database from outside of ETH, it is wise to enable compression with the -C option. Inside ETH (in particular on ihp-pc45), enabling -C is certainly a performance drawback and should be avoided.
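For scripted access, the same choices (compression outside ETH, no compression inside, encryption left on) can be collected in one place. The following is only a sketch: it assumes the mysql-connector-python package, whose connect() function accepts the compress and ssl_disabled arguments, and the helper names connection_args and connect are made up for this illustration.

```python
def connection_args(password, inside_eth=False, host="ihp-pc45.ethz.ch"):
    """Keyword arguments for mysql.connector.connect() (hypothetical helper)."""
    return {
        "host": host,
        "user": "fact",
        "password": password,
        "database": "factdata",
        # Compression helps from outside ETH but is a drawback inside ETH.
        "compress": not inside_eth,
        # Leave TLS enabled: remote access is only allowed when encrypted.
        "ssl_disabled": False,
    }


def connect(password, inside_eth=False):
    # Imported lazily so that connection_args() works without the package.
    import mysql.connector
    return mysql.connector.connect(**connection_args(password, inside_eth))
```

A call such as connect("the-standard-password") would then correspond to the mysql command line shown above.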
Note that the mysql client libraries at ISDC are too old and do not allow for encrypted connections. Thus no connection from ISDC is possible without a tunnel. How to tunnel your connection is explained in the following. Note that it requires an account on ihp-pc45 (which I think should not be generally available). Thus this is mainly meant as a solution for automatic processes running at ISDC, for example, to update the database.
Forward Tunnel
If you are logged in at ISDC as 'user' and you have an account 'ethz' at ihp-pc45, you can use a tunnel. To set up a tunnel, use
ISDC> ssh -x -C -n -N -q -L 10000:localhost:3306 ethz@ihp-pc45.ethz.ch
(It is wise to enable compression of the connection with the -C option)
Note that after log-in this process seems to stall (nothing happens anymore). This is correct. The tunnel is open. It will forward the local port 10000 on the ISDC machine to port 3306 on the machine which is reachable as 'localhost' from ihp-pc45, that is, ihp-pc45 itself.
The mysql call would now look like
> mysql -h 127.0.0.1 -P 10000 -u fact -p factdata
Note that you need to use the IP address instead of localhost, otherwise the mysql client tries to use a socket connection (which will fail). You could also use --protocol=TCP.
As the mysql connection now comes via the loopback interface and not via the external IP, the connection of the mysql client is allowed to be unencrypted.
Backward Tunnel
Assume that you are already logged into ihp-pc45.ethz.ch and want to run a mysql client at ISDC which accesses ihp-pc45. In this case, a backward tunnel can be used:
ihp-pc45> ssh -x -C -n -N -q -R 10000:localhost:3306 user@isdc-nx.isdc.unige.ch
(It is wise to enable compression of the connection with the -C option)
This command will log you into isdc-nx and (in parallel) create a tunnel from port 10000 at isdc-nx to port 3306 of a machine which is called 'localhost' from where you started the ssh connection (ihp-pc45).
You can now do
ISDC> mysql -h 127.0.0.1 -P 10000 -u fact -p factdata
Note that you need to use the IP address instead of localhost, otherwise the mysql client tries to use a socket connection (which will fail). You could also use --protocol=TCP.
As the mysql connection now comes via the loopback interface and not via the external IP, the connection of the mysql client is allowed to be unencrypted.
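For automatic processes, the two tunnel variants can be started from a script. The sketch below only assembles the ssh and mysql command lines given above and starts ssh with the standard subprocess module. The function names are invented for this example, and it assumes that ssh can log in without an interactive password prompt (for example with a key).

```python
import subprocess


def tunnel_command(login, reverse=False, tunnel_port=10000, db_port=3306):
    """Build the ssh argument list for a forward (-L) or backward (-R) tunnel."""
    direction = "-R" if reverse else "-L"
    return ["ssh", "-x", "-C", "-n", "-N", "-q",
            direction, f"{tunnel_port}:localhost:{db_port}", login]


def mysql_command(tunnel_port=10000):
    # 127.0.0.1 instead of localhost, so the client uses TCP and not a socket.
    return ["mysql", "-h", "127.0.0.1", "-P", str(tunnel_port),
            "-u", "fact", "-p", "factdata"]


def open_tunnel(login, reverse=False):
    # Returns at once; call terminate() on the result to close the tunnel.
    return subprocess.Popen(tunnel_command(login, reverse))
```

The forward tunnel would be opened at ISDC with open_tunnel("ethz@ihp-pc45.ethz.ch"), the backward tunnel on ihp-pc45 with open_tunnel("user@isdc-nx.isdc.unige.ch", reverse=True).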
rootifysql
A convenient way to retrieve data is the rootifysql tool which is part of the FACT++ package. More details can be found either by calling it with the --help option or at https://www.fact-project.org/logbook/showthread.php?tid=4192. The same access rules apply as for the native mysql client.
To simplify the usage, it is wise to write a rootifysql.rc file with the following contents:
uri=fact:password@ihp-pc45.ethz.ch/factdata
The following tutorial assumes that such a file exists.
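If the file is to be created by a script, a few lines of Python are enough. This is a sketch under assumptions: the helper names rc_line and write_rc are hypothetical, the password is inserted into the line unchanged (whether rootifysql needs special characters to be escaped is not covered here), and the file mode is a suggestion because the file contains a password.

```python
import os


def rc_line(user, password, host="ihp-pc45.ethz.ch", database="factdata"):
    """Return the single uri= line expected in rootifysql.rc."""
    return f"uri={user}:{password}@{host}/{database}"


def write_rc(path, user, password):
    # A newly created file is readable and writable by its owner only.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(rc_line(user, password) + "\n")
```

Calling write_rc("rootifysql.rc", "fact", "password") in the working directory produces the file described above.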
Other alternatives
Many possibilities exist to access a mysql database, such as the C API, MySQL++, Python (MySQL.Connector), PHP and others. You are free to use whatever tool you like. In the following, an analysis is outlined using the rootifysql client because it is the most convenient.
PhpMyAdmin
To get a quick overview of the accessible databases and tables, you can log in to PhpMyAdmin at http://ihp-pc45.ethz.ch/phpmyadmin
The Analysis
Data Selection
For data selection, only run-wise information should be relevant, which is stored in the table RunInfo. The reason is that if you select data on a more fine-grained level (e.g. event-wise zenith angle), there is currently no easy method to determine the corresponding observation time. So whenever data is selected event-wise, make sure that you do not cut on a variable which removes events systematically rather than randomly. For example, an event-wise cut in zenith angle usually keeps or discards two consecutive events together because their zenith angles are correlated. For a cut in any image parameter (Width, Length, Size, ...), the result for two consecutive events is random because their image parameters are not correlated.
As an example we analyse the Crab data from our public sample (01/11/2013 - 06/11/2013).
Let's first have a look at the total observation time of all Crab data in this period:
SELECT COUNT(*), SUM(TIME_TO_SEC(TIMEDIFF(fRunStop,fRunStart))*fEffectiveOn/3600) AS EffOnTime, MIN(fZenithDistanceMin) AS MinZd, MAX(fZenithDistanceMax) AS MaxZd, MIN(fR750Cor/fR750Ref) AS MinQ, MAX(fR750Cor/fR750Ref) AS MaxQ FROM RunInfo WHERE fSourceKey=5 AND fRunTypeKey=1 AND FileId BETWEEN 131101000 AND 131107000
The result (in mysql) is
+----------+-------------+-------+-------+---------+---------+
| COUNT(*) | EffOnTime   | MinZd | MaxZd | MinQ    | MaxQ    |
+----------+-------------+-------+-------+---------+---------+
|      435 | 32.53992354 |  6.36 | 67.89 | 0.01477 | 1.10584 |
+----------+-------------+-------+-------+---------+---------+
1 row in set (0.01 sec)
So we have 435 data runs from Crab with a total effective observation time of 32.5 hours in a zenith angle range between 6° and 68° and a bad weather factor between 0.01 (really bad) and 1.1 (extremely good).
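To make the EffOnTime expression of the query explicit: for each run, the run duration in seconds is multiplied by fEffectiveOn, and the sum is converted to hours. The sketch below repeats this arithmetic in Python. The two runs are invented for illustration and are not taken from the database, and fEffectiveOn is assumed to be a fraction between 0 and 1.

```python
from datetime import datetime


def effective_on_time_hours(runs):
    """runs: iterable of (fRunStart, fRunStop, fEffectiveOn) tuples."""
    total_seconds = 0.0
    for start, stop, effective_on in runs:
        # Same as TIME_TO_SEC(TIMEDIFF(fRunStop,fRunStart))*fEffectiveOn
        total_seconds += (stop - start).total_seconds() * effective_on
    return total_seconds / 3600.0


# Two made-up five-minute runs
runs = [
    (datetime(2013, 11, 2, 1, 0, 0), datetime(2013, 11, 2, 1, 5, 0), 0.96),
    (datetime(2013, 11, 2, 1, 6, 0), datetime(2013, 11, 2, 1, 11, 0), 0.90),
]
hours = effective_on_time_hours(runs)  # (300*0.96 + 300*0.90) / 3600 = 0.155
```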
Taking only good data by adding "AND fR750Cor/fR750Ref>0.9" to the WHERE-clause gives us
+----------+-------------+-------+-------+---------+---------+
| COUNT(*) | EffOnTime   | ZdMin | ZdMax | MinQ    | MaxQ    |
+----------+-------------+-------+-------+---------+---------+
|      328 | 24.58955887 |  6.36 | 67.86 | 0.90084 | 1.10584 |
+----------+-------------+-------+-------+---------+---------+
1 row in set (0.00 sec)
But we also want to restrict ourselves to "good" zenith angles (zenith angles at which there is no significant efficiency loss). So we add "AND fZenithDistanceMax<35" to the WHERE-clause, which yields
+----------+-------------+-------+-------+---------+---------+
| COUNT(*) | EffOnTime   | ZdMin | ZdMax | MinQ    | MaxQ    |
+----------+-------------+-------+-------+---------+---------+
|      244 | 19.06608557 |  6.36 | 34.90 | 0.90084 | 1.10584 |
+----------+-------------+-------+-------+---------+---------+
1 row in set (0.00 sec)
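It can be instructive to see how much of the sample survives each cut. The following lines only divide the numbers from the three results above by the totals of the first query; the labels are shorthand chosen for this example.

```python
# (label, number of runs, effective on-time in hours), copied from the results
steps = [
    ("all Crab data runs", 435, 32.53992354),
    ("+ fR750Cor/fR750Ref>0.9", 328, 24.58955887),
    ("+ fZenithDistanceMax<35", 244, 19.06608557),
]

total_runs, total_hours = steps[0][1], steps[0][2]
fractions = [(label, runs / total_runs, hours / total_hours)
             for label, runs, hours in steps]

for label, run_fraction, time_fraction in fractions:
    print(f"{label:26s} runs: {run_fraction:6.1%}  time: {time_fraction:6.1%}")
```

Roughly three quarters of the effective observation time pass the weather cut, and a bit less than 60% remain after the additional zenith angle cut.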
Now we need to get a list of these runs with
SELECT FileId FROM RunInfo WHERE fSourceKey=5 AND fRunTypeKey=1 AND FileId BETWEEN 131101000 AND 131107000 AND fR750Cor/fR750Ref>0.9 AND fZenithDistanceMax<35
This can later be JOINed with the following queries.
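Because the same WHERE-clause is needed both for the summary and for the run list, it can help to assemble it from one list of cuts so that the two queries cannot drift apart. The sketch below does only string handling. The names BASE_CUTS, QUALITY_CUTS and run_query are made up for this example, and the weather ratio is written as fR750Cor/fR750Ref, as in the selection step above.

```python
BASE_CUTS = [
    "fSourceKey=5",
    "fRunTypeKey=1",
    "FileId BETWEEN 131101000 AND 131107000",
]
QUALITY_CUTS = [
    "fR750Cor/fR750Ref>0.9",
    "fZenithDistanceMax<35",
]


def run_query(columns, cuts):
    """Return a SELECT on RunInfo with all cuts combined by AND."""
    return f"SELECT {columns} FROM RunInfo WHERE " + " AND ".join(cuts)


# Same selection, once for counting and once for the list of runs
count_query = run_query("COUNT(*)", BASE_CUTS + QUALITY_CUTS)
run_list_query = run_query("FileId", BASE_CUTS + QUALITY_CUTS)
```

Either string can be passed to the mysql client or stored in a file for later use.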
Data retrieval
The events themselves are stored in a table named Events