Version 27 (modified by 6 years ago) ( diff ) | ,
---|
Connecting to the Database
DatabaseBasedAnalysis/Connection
Tools to access the database
rootifysql
A convenient way to retrieve data is the rootifysql tool which is part of the FACT++ package. As the name suggests, it writes the data into root files (but can also write the data into ascii files as the mysql-client). More details can be found either calling it with the --help option or at https://www.fact-project.org/logbook/showthread.php?tid=4192.
Other alternatives
Many possibilities exist to access a mysql database as a C API, MySQL++, Python (MySQL.Connector), PHP and others. You are free to use whatever tool you like. In the following, an analysis will be outlined using the rootifysql client and because it is most convenient.
PhpMyAdmin
To get a fast glimpse on the accessible databases and tables, you can log-in to PhpMyAdmin at http://iph-pc45.ethz.ch/phpmyadmin
The Analyis
Data Selection
For data selection only run-wise information should be relevant which are stored in the table RunInfo. The reason is that if you select data on a more fine grained level (e.g. event-wise zenith angle), right now there is no easy method to determine the corresponding observation time. So whenever data is selected event-wise make sure that you do not cut the data in a variable which cuts out events systematically and not randomly. For example, an event-wise cut in zenith angel usually keeps or discards two consecutive events because their zenith angle is correlated. For a cut in any image parameter (Width, Length, Size, ...), the result on two consecutive events is random because their image parameters are not correlated.
As an example we analyse the Crab data from our public sample (01/11/2013 - 06/11/2013).
Let's first have a look at the total observation time of all Crab data in this period:
SELECT COUNT(*), SUM(TIME_TO_SEC(TIMEDIFF(fRunStop,fRunStart))*fEffectiveOn/3600) AS EffOnTime, MIN(fZenithDistanceMin) AS MinZd, MAX(fZenithDistanceMax) AS MaxZd, MIN(fR750Cor/fR750Ref) AS MinQ, MAX(fR750Cor/fR750Ref) AS MaxQ FROM RunInfo WHERE fSourceKey=5 AND fRunTypeKey=1 AND FileId BETWEEN 131101000 AND 131107000
The result (in mysql) is
+----------+-------------+-------+-------+---------+---------+ | COUNT(*) | EffOnTime | MinZd | MaxZd | MinQ | MaxQ | +----------+-------------+-------+-------+---------+---------+ | 435 | 32.53992354 | 6.36 | 67.89 | 0.01477 | 1.10584 | +----------+-------------+-------+-------+---------+---------+ 1 row in set (0.01 sec)
So we have 435 data runs from Crab with a total effective observation time of 32.5 hours in a zenith angle range between 6° and 68° and a bad weather factor between 0.01 (really bad) to 1.1 (extremely good).
Taking only good data by adding "AND fR750Cor/fR750Ref>0.9
" to the WHERE-clause gives us
+----------+-------------+-------+-------+---------+---------+ | COUNT(*) | EffOnTime | ZdMin | ZdMax | MinQ | MaxQ | +----------+-------------+-------+-------+---------+---------+ | 328 | 24.58955887 | 6.36 | 67.86 | 0.90084 | 1.10584 | +----------+-------------+-------+-------+---------+---------+ 1 row in set (0.00 sec)
But we also want to restrict ourselves to "good" zenith angles (zenith angles at which there is no significnat efficiency loss). So we add "AND fZenithDistanceMax<35
" to the WHERE-clause which yields
+----------+-------------+-------+-------+---------+---------+ | COUNT(*) | EffOnTime | ZdMin | ZdMax | MinQ | MaxQ | +----------+-------------+-------+-------+---------+---------+ | 244 | 19.06608557 | 6.36 | 34.90 | 0.90084 | 1.10584 | +----------+-------------+-------+-------+---------+---------+ 1 row in set (0.00 sec)
Now we need to get a list of these runs with
SELECT FileId FROM RunInfo WHERE fSourceKey=5 AND fRunTypeKey=1 AND FileId BETWEEN 131101000 AND 131107000 AND fR750Ref/fR750Cor>0.9 AND fZenithDistanceMax<35
This can later be JOINed with the following queries.
Let's write the list of runs into a file. There are plenty of options. Here are a few (assuming the query is in a file query.sql)
mysql -N [...] < query.sql > Crab.txt cat query.sql | mysql -N [...] > Crab.txt rootifysql [...] -n -v0 -d query.sql > Crab.txt rootifysql [...] -n -w Crab.txt query.sql
The [...] is placeholder for additional options, in particular the login credentials (ideally they are kept in a file which can not be read by everyone).
An alternative is to put
#!/path/to/rootifysql --config=/path/to/resources.rc
in the first line of your query.sql. Make it executable "chmod u+x query.sql
" and put your credentials (uri=) into resources.rc. Now you can call it directly
./query.sql -n -w Crab.txt
Data statistics
Now we want to get some statistics about the Crab data between 01/11/2013 and 06/11/2013 and see if we can do simple plots. For this we write the table into a root file.
rootifysql [...] --query " \ SELECT \ * \ FROM \ RunInfo \ WHERE \ fSourceKey=5 \ AND \ fRunTypeKey=1 \ AND \ FileId BETWEEN 131101000 AND 131107000 \ AND \ fR750Ref/fR750Cor>0.9 \ AND \ fZenithDistanceMax<35 \ "
If the file already exists, either give it a different name (see --help for details) or overwrite it with --force.
The output should look similar to this
Reading global options from 'fact++.rc'. Reading default options from 'rootifysql.rc'. ------------------------ Rootify SQL ------------------------- Connecting to database... Client Version: 5.7.23 Server Version: 5.7.23-0ubuntu0.18.04.1 Requesting data... Opening file 'rootify.root' [compression=1]... Trying to setup 120 branches... Configured 115 branches. Filling branches... 317 rows fetched. 317 rows skipped due to NULL field. 0 rows filled into tree. 10 kB written to disk. File closed. Execution time: 0.0537751s (169.6 us/row) --------------------------------------------------------------
Per default rows which contain any NULL are not written to the file because all values are converted to a DOUBLE and there is no representation for a NULL-value in double. So, we need to force the output with --ignore-null
and will get something like:
Reading global options from 'fact++.rc'. Reading default options from 'rootifysql.rc'. ------------------------ Rootify SQL ------------------------- Connecting to database... Client Version: 5.7.23 Server Version: 5.7.23-0ubuntu0.18.04.1 Requesting data... Opening file 'rootify.root' [compression=1]... Trying to setup 120 branches... Configured 115 branches. Filling branches... 317 rows fetched. 317 rows filled into tree. 86 kB written to disk. File closed. Execution time: 0.072247s (227.9 us/row) --------------------------------------------------------------
Now we can open the file in root and do plots. The easiest ist to use the tree viewer:
root rootify.root root [0] Attaching file rootify.root as _file0... root [1] TTree *T = (TTree*)_file0->Get("Result"); root [2] T->StartViewer(); root [3]
Or plot the zenith angle distribution directly:
root rootify.root root [0] Attaching file rootify.root as _file0... root [1] TTree *T = (TTree*)_file0->Get("Result"); root [2] T->Draw("fZenithDistanceMean"); root [3]
or zenith angle versus time (Hint: DATETIME columns are converted to Unix-time in seconds):
root rootify.root root [0] Attaching file rootify.root as _file0... root [1] TTree *T = (TTree*)_file0->Get("Result"); root [2] T->Draw("fZenithDistanceMean:fRunStart"); root [3]
Data retrieval
The events themselves are stored in a table named Events. The position of the source in camera coordinates is stored in Position. To get them, you can run the following query
SELECT Events.*, Position.X, Position.Y FROM RunInfo LEFT JOIN Events USING (FileId) LEFT JOIN Position USING (FileId, EvtNumber) WHERE fSourceKey=5 AND fRunTypeKey=1 AND FileId BETWEEN 131101000 AND 131107000 AND fZenithDistanceMax<35 AND fR750Ref/fR750Cor>0.9
or with the list you wrote before
SELECT Events.*, Position.X, Position.Y LEFT JOIN Events USING (FileId) LEFT JOIN Position USING (FileId, EvtNumber) WHERE FileId IN ($MyList)
using --list.MyList=Crab.txt
as command-line option to rootifysql. Both queries are similar in execution time.
Let's assume the output file is crab-data-only.root (rootifysql --out=crab-data-only.root
). Requesting the data and writing the file took me about 60s.
To run an analysis on the data you can use the following root macro "ganymed.C". It produces a theta-square plot. Its execution took about 5s (root ganymed.C++
)
#include <iostream> #include <TMath.h> #include <TH1.h> #include <TChain.h> #include <TStopwatch.h> void ganymed() { // Create chain for the tree Result // This is just easier than using TFile/TTree TChain c("Result"); // Add the input file to the c.AddFile("crab-data-only.root"); // Define variables for all leaves to be accessed // By definition rootifysql writes only doubles double X, Y, MeanX, MeanY, Width, Length, CosDelta, SinDelta, M3Long, SlopeLong, Leakage1, SlopeSpreadWeighted, Size, ConcCore, ConcCOG, NumIslands, NumUsedPixels; // Connect the variables to the cordesponding leaves c.SetBranchAddress("X", &X); c.SetBranchAddress("Y", &Y); c.SetBranchAddress("MeanX", &MeanX); c.SetBranchAddress("MeanY", &MeanY); c.SetBranchAddress("Width", &Width); c.SetBranchAddress("Length", &Length); c.SetBranchAddress("CosDelta", &CosDelta); c.SetBranchAddress("SinDelta", &SinDelta); c.SetBranchAddress("M3Long", &M3Long); c.SetBranchAddress("SlopeLong", &SlopeLong); c.SetBranchAddress("Leakage1", &Leakage1); c.SetBranchAddress("NumIslands", &NumIslands); c.SetBranchAddress("NumUsedPixels", &NumUsedPixels); c.SetBranchAddress("SlopeSpreadWeighted", &SlopeSpreadWeighted); c.SetBranchAddress("Size", &Size); c.SetBranchAddress("ConcCOG", &ConcCOG); c.SetBranchAddress("ConcCore", &ConcCore); // Set some constants (they could be included in the database // in the future) double mm2deg = +0.0117193246260285378; double abberation = 1.02; // -------------------- Source dependent parameter calculation ------------------- // Create a histogram for on- and off-data TH1F hon("on", "", 55, 0, 1); TH1F hoff("off", "", 55, 0, 1); // Loop over all events TStopwatch clock; for (int i=0; i<c.GetEntries(); i++) { // read the i-th event from the file c.GetEntry(i); // First calculate all cuts to speedup the analysis double area = TMath::Pi()*Width*Length; bool cutq = NumIslands<3.5 && NumUsedPixels>5.5 && Leakage1<0.1; bool cut0 = log10(area)<log10(Size)*898-1535; if (!cutq || !cut0) continue; // Loop over all wobble positions in the camera for (int angle=0; angle<360; angle+=60) { // -------------------- Source dependent parameter calculation ------------------- double cr = cos(angle*TMath::DegToRad()); double sr = sin(angle*TMath::DegToRad()); double px = cr*X-sr*Y; double py = cr*Y+sr*X; double dx = MeanX - px/abberation; double dy = MeanY - py/abberation; double dist = sqrt(dx*dx + dy*dy); double alpha = asin((CosDelta*dy - SinDelta*dx)/dist); double sgn = TMath::Sign(1., (CosDelta*dx + SinDelta*dy)/dist); // ------------------------------- Application ---------------------------------- double m3l = M3Long*sgn*mm2deg; double slope = SlopeLong*sgn/mm2deg; // --------------------------------- Analysis ----------------------------------- double xi = 1.39252 + 0.154247*slope + 1.67972 *(1-1/(1+4.86232*Leakage1)); double sign1 = m3l+0.07; double sign2 = (dist*mm2deg-0.5)*7.2-slope; double disp = (sign1<0 || sign2<0 ? -xi : xi)*(1-Width/Length)/mm2deg; double thetasq = (disp*disp + dist*dist - 2*disp*dist*cos(alpha))*mm2deg*mm2deg; // Fill the on- and off-histograms if (angle==0) hon.Fill(thetasq); else hoff.Fill(thetasq, 1./5); } } clock.Print(); // Plot the result hon.SetMinimum(0); hon.DrawCopy(); hoff.DrawCopy("same"); }
You can of course include all the calculations into your query already
SELECT Events.*, Angle, weight, @PX := cosa*X - sina*Y, @PY := cosa*Y + sina*X, @DX := MeanX-@PX/1.02, @DY := MeanY-@PY/1.02, @Norm := SQRT(@DX*@DX + @DY*@DY), @Dist := @Norm*0.0117193246260285378 AS Dist, PI()*Width*Length*0.0117193246260285378*0.0117193246260285378 AS Area, @LX := TRUNCATE((CosDelta*@DY - SinDelta*@DX)/@Norm, 6), @LY := TRUNCATE((CosDelta*@DX + SinDelta*@DY)/@Norm, 6), @Alpha := ASIN(@LX) AS Alpha, @Sign := SIGN(@LY) AS Sign, @M3L := M3Long*@Sign*0.0117193246260285378, @Slope := SlopeLong*@Sign/0.0117193246260285378 AS Slope, @Xi := 1.39 + 0.154*@Slope + 1.679*(1-1/(1+4.86*Leakage1)), @Sign1 := @M3L+0.07, @Sign2 := (@Dist-0.5)*7.2-@Slope, @Disp := IF (SIGN(@Sign1)<0 || SIGN(@Sign2)<0, -@Xi, @Xi) * (1-Width/Length), @ThetaSq := (@Disp*@Disp + @Dist*@Dist - 2*@Disp*@Dist*SQRT(1-@LX*@LX)) AS ThetaSq FROM RunInfo LEFT JOIN Events USING (FileId) LEFT JOIN Position USING (FileId, EvtNumber) CROSS JOIN ( SELECT 0 AS Angle UNION ALL SELECT 60 AS Angle UNION ALL SELECT 120 AS Angle UNION ALL SELECT 180 AS Angle UNION ALL SELECT 240 AS Angle UNION ALL SELECT 300 AS Angle ) Wobble WHERE fSourceKey=5 AND fRunTypeKey=1 AND FileId BETWEEN 131101000 AND 131107000 AND fZenithDistanceMax<35 AND fR750Ref/fR750Cor>0.9
Or you use the existing table for the standard 60° Wobble positions and do just CROSS JOIN Wobble
.
This will give you all you need in crab.root (rootifysql --out=crab.root
), but significantly increases computing time and the output file will be about six times larger.
A simple macro just applying all the cuts would then be enough to do a theta-square plot
void ganymed3() { // Create chain for the tree Result // This is just easier than using TFile/TTree TChain c("Result"); // Add the input file to the c.AddFile("crab.root"); // Set some constants (they could be included in the database // in the future) c.SetAlias("mm2deg", "+0.0117193246260285378"); // Define the cuts c.SetAlias("CutQ", "NumIslands<3.5 && NumUsedPixels>5.5 && Leakage1<0.1"); c.SetAlias("Cut0", "log10(Area)<log10(Size)*898-1535"); // Do one plot for each wobble position c.Draw("ThetaSq", "(ThetaSq<1 && CutQ && Cut0 && Angle==0)*( weight)"); c.Draw("ThetaSq", "(ThetaSq<1 && CutQ && Cut0 && Angle!=0)*(-weight)", "same"); }
Combining everything into a single query is a bit tricky but works:
SELECT Counter.*, `Signal` - `Background`/5 AS `Excess`, LiMa(`Signal`, `Background`/5) AS `Significance` FROM ( SELECT @S := COUNT(IF(Weight>0, 1, NULL)) AS `Signal`, @B := COUNT(IF(Weight<0, 1, NULL)) AS `Background` FROM ( SELECT Size, NumUsedPixels, NumIslands, Leakage1, Weight, PI()*Width*Length AS Area, @PX := cosa*X - sina*Y, @PY := cosa*Y + sina*X, @DX := MeanX-@PX/1.02, @DY := MeanY-@PY/1.02, @Norm := SQRT(@DX*@DX + @DY*@DY), @Dist := @Norm*0.0117193246260285378 AS Dist, @LX := TRUNCATE((CosDelta*@DY - SinDelta*@DX)/@Norm, 6), @LY := TRUNCATE((CosDelta*@DX + SinDelta*@DY)/@Norm, 6), @Alpha := ASIN(@LX) AS Alpha, @Sign := SIGN(@LY) AS Sign, @M3L := M3Long*@Sign*0.0117193246260285378, @Slope := SlopeLong*@Sign/0.0117193246260285378 AS Slope, @Xi := 1.39252 + 0.154247*@Slope + 1.67972*(1-1/(1+4.86232*Leakage1)), @Sign1 := @M3L+0.07, @Sign2 := (@Dist-0.5)*7.2-@Slope, @Disp := IF (SIGN(@Sign1)<0 || SIGN(@Sign2)<0, -@Xi, @Xi) * (1-Width/Length), @ThetaSq := (@Disp*@Disp + @Dist*@Dist - 2*@Disp*@Dist*SQRT(1-@LX*@LX)) AS ThetaSq FROM RunInfo LEFT JOIN Events USING (FileId) LEFT JOIN Position USING (FileId, EvtNumber) CROSS JOIN Wobble WHERE fSourceKey=5 AND fRunTypeKey=1 AND FileId BETWEEN 131101000 AND 131107000 AND fZenithDistanceMax<35 AND fR750Cor>0.9*fR750Ref ) TableAlias WHERE ThetaSq<0.156 AND Area < LOG10(Size)*898-1535 AND NumUsedPixels>5.5 AND NumIslands<3.5 AND Leakage1<0.1 ) Counter
I am sure there is also a query which in addition prints the effective on-time