[[TOC]] == Purpose == The idea is to write all image parameters for data and Monte Carlo files into a database, so that the access to the data becomes more easy. To some extend (also not necessarily very efficient) the database server can already do the analysis. Everybody is free to use the programming language or tool of their choice to do the analysis. Writing the data to be analysed into a root file has the advantage that no additional software installation other than root itself is required. However, be aware that the files can easily become large. The original proposal can be found here: https://www.fact-project.org/logbook/showthread.php?tid=5604&pid=27027#pid27027 Why not using ROOT's MySQL interface directly. Well, older versions of ROOT had no option to retrieves the data row by row, so that huge result sets filled up your memory quite fast. With '''TSQLStatment''', now an interface is available which allows to access the data row-by-row. Generally speaking, for a pre-compiled program, there is now advantage to rely on ROOT more than necessary. == Status == Most of the data is in the database. Due to some technical reasons still about 1/5th of the data is missing. Most of it from 2016 (seems 2016 the data was more noisy). No automatic process is currently updating the tables and no process is replicating the existing data from La Palma. Data has been filled manually from ''ganymed-summary.root'' files written with the ISDC analysis. The corresponding tool is explained here [[DatabaseBasedAnalysis/table2sql]]. The source positions where calculated with `calcsource` as explained in [[DatabaseBasedAnalysis/calcsource]]. == Basics == Although this page will give some basic examples how to implement an analysis, this is neither an analysis tutorial nor a reference guide to SQL or the required tools. For this, please refer to their manuals. Here are a few examples: * MySQL Manual: https://dev.mysql.com/doc/refman/8.0/en/ (this is an excellent reference!) * MAN page of the mysql client: `man mysql` * `rootifysql --help` and https://www.fact-project.org/logbook/showthread.php?tid=4192 * PHP MySQL API: http://php.net/manual/en/book.mysql.php * MySQL++: https://tangentsoft.com/mysqlpp/home * Python: https://www.tutorialspoint.com/python/python_database_access.htm An example in Python was was kindly provided by Julian Kemp MSc and can be found at [[DatabaseBasedAnalysis/Python]]. == Installing FACT++ == If you want to install `rootifysql`, you find a description here: [[InstallingFACT++]] == Connecting to the Database == Find some basic instructions here how to access the database: DatabaseBasedAnalysis/Connection == Tools to access the database == === mysql === The mysql-client is the classical way to connect to the database and get data. Data is retrieved from a MySQl database always as pure ascii. `mysql` can print the ascii table to the console either as a nicely formatted text or as a machine readable ascii table. === rootifysql === A convenient way to retrieve data is the ''rootifysql'' tool which is part of the FACT++ package. As the name suggests, it writes the data into root files (but can also write the data into ascii files as the mysql-client). More details can be found either calling it with the `--help` option or at https://www.fact-project.org/logbook/showthread.php?tid=4192. To build only the FACT++ tools (fitsdump, root2sql, fits2sql, rootifysql) you can use `cmake -DTOOLS_ONLY=1`. You can create a debian and package and a tar file then with `make package`. You will certainly need the development packages of curses, libnova, libmysql-client, libmysql++ and libopenssl. In addition, you need help2man and groff. Some more detailed instructions can be found at [[DatabaseBasedAnalysis/rootifysql]]. === Other alternatives === Many possibilities exist to access a mysql database as a C API, MySQL++, Python (MySQL.Connector), PHP and others. You are free to use whatever tool you like to access the database, store the data in your preferred file-type locally and process the data. In the following, an analysis will be outlined using the ''rootifysql'' client and because it is most convenient. === !PhpMyAdmin === To get a fast glimpse on the accessible databases and the contents of the tables, you can log-in to !PhpMyAdmin at http://ihp-pc45.ethz.ch/phpmyadmin == Insights into the Database == Some details about the contents and the meaning of the tables in the '''factdata''' database can be found here DatabaseBasedAnalysis/Tables. == Warning == Be aware that a mysql database using the InnoDB engine is optimized to find data fast using an index. That means it can be very fast to select data and read the selected data. On the other hand, it is ''not'' optimized to read individual columns as for example the ''root'' file format. This a simple query, for example, to determine the minimum or maximum of a single column will read the whole table and is thus very resource intensive and slow! '''Think twice before you run a query!''' == The Analysis == Generally speaking, it makes sens to get familiar with the available columns in the table and their meaning. Right now, the best is to guess from the column name (see !PhPMyAdmin). More details will follow soon. === Data Selection === For data selection only run-wise information should be relevant which are stored in the table '''!RunInfo'''. The reason is that if you select data on a more fine grained level (e.g. event-wise zenith angle), right now there is no easy method to determine the corresponding observation time. So whenever data is selected event-wise make sure that you do not cut the data in a variable which cuts out events systematically and not randomly. For example, an event-wise cut in zenith angel usually keeps or discards two consecutive events because their zenith angle is correlated. For a cut in any image parameter (Width, Length, Size, ...), the result on two consecutive events is random because their image parameters are not correlated. As an example we analyse the Crab data from our public sample (01/11/2013 - 06/11/2013). Let's first have a look at the total observation time of all Crab data in this period: {{{#!sql SELECT COUNT(*), SUM(TIME_TO_SEC(TIMEDIFF(fRunStop,fRunStart))*fEffectiveOn/3600) AS EffOnTime, MIN(fZenithDistanceMin) AS MinZd, MAX(fZenithDistanceMax) AS MaxZd, MIN(fR750Cor/fR750Ref) AS MinQ, MAX(fR750Cor/fR750Ref) AS MaxQ FROM RunInfo WHERE fSourceKey=5 /* Select only Crab data */ AND fRunTypeKey=1 /* Select only data runs */ AND FileId BETWEEN 131101000 AND 131107000 }}} The result using {{{mysql}}} is {{{ +----------+-------------+-------+-------+---------+---------+ | COUNT(*) | EffOnTime | MinZd | MaxZd | MinQ | MaxQ | +----------+-------------+-------+-------+---------+---------+ | 435 | 32.53992354 | 6.36 | 67.89 | 0.01477 | 1.10584 | +----------+-------------+-------+-------+---------+---------+ 1 row in set (0.01 sec) }}} So we have 435 data runs from Crab with a total effective observation time of 32.5 hours in a zenith angle range between 6° and 68° and a bad weather factor between 0.01 (really bad) to 1.1 (extremely good). Taking only good data by adding "`AND fR750Cor/fR750Ref>0.9`" to the WHERE-clause gives us {{{ +----------+-------------+-------+-------+---------+---------+ | COUNT(*) | EffOnTime | ZdMin | ZdMax | MinQ | MaxQ | +----------+-------------+-------+-------+---------+---------+ | 328 | 24.58955887 | 6.36 | 67.86 | 0.90084 | 1.10584 | +----------+-------------+-------+-------+---------+---------+ 1 row in set (0.00 sec) }}} But we also want to restrict ourselves to "good" zenith angles (zenith angles at which there is no significant efficiency loss). So we add "`AND fZenithDistanceMax<35`" to the WHERE-clause which yields {{{ +----------+-------------+-------+-------+---------+---------+ | COUNT(*) | EffOnTime | ZdMin | ZdMax | MinQ | MaxQ | +----------+-------------+-------+-------+---------+---------+ | 244 | 19.06608557 | 6.36 | 34.90 | 0.90084 | 1.10584 | +----------+-------------+-------+-------+---------+---------+ 1 row in set (0.00 sec) }}} Now we need to get a list of these runs with {{{#!sql SELECT FileId FROM RunInfo WHERE fSourceKey=5 AND fRunTypeKey=1 AND FileId BETWEEN 131101000 AND 131107000 AND fR750Ref/fR750Cor>0.9 AND fZenithDistanceMax<35 }}} This can later be '''JOIN'''ed with the following queries. Let's write the list of runs into a file. There are plenty of options. Here are a few (assuming the query is in a file query.sql) {{{ mysql -N [...] < query.sql > Crab.txt cat query.sql | mysql -N [...] > Crab.txt rootifysql [...] -n -v0 -d query.sql > Crab.txt rootifysql [...] -n -w Crab.txt query.sql }}} The [...] is placeholder for additional options, in particular the login credentials (ideally they are kept in a file which can not be read by everyone). An alternative is to put {{{ #!/path/to/rootifysql --config=/path/to/resources.rc }}} in the first line of your query.sql. Make it executable "`chmod u+x query.sql`" and put your credentials (uri=) into resources.rc. Now you can call it directly {{{ ./query.sql -n -w Crab.txt }}} === Data statistics === Now we want to get some statistics about the Crab data between 01/11/2013 and 06/11/2013 and see if we can do simple plots. For this we write the table into a root file. {{{ rootifysql [...] --query " \ SELECT \ * \ FROM \ RunInfo \ WHERE \ fSourceKey=5 \ AND \ fRunTypeKey=1 \ AND \ FileId BETWEEN 131101000 AND 131107000 \ AND \ fR750Ref/fR750Cor>0.9 \ AND \ fZenithDistanceMax<35 \ " }}} If the file already exists, either give it a different name (see --help for details) or overwrite it with --force. The output should look similar to this {{{ ------------------------ Rootify SQL ------------------------- Connecting to database... Client Version: 5.7.23 Server Version: 5.7.23-0ubuntu0.18.04.1 Requesting data... Opening file 'rootify.root' [compression=1]... Trying to setup 120 branches... Configured 115 branches. Filling branches... 317 rows fetched. 317 rows skipped due to NULL field. 0 rows filled into tree. 10 kB written to disk. File closed. Execution time: 0.0537751s (169.6 us/row) -------------------------------------------------------------- }}} Per default rows which contain any NULL are not written to the file because all values are converted to a DOUBLE and there is no representation for a NULL-value in double. So, we need to force the output with `--ignore-null` and will get something like: {{{ ------------------------ Rootify SQL ------------------------- Connecting to database... Client Version: 5.7.23 Server Version: 5.7.23-0ubuntu0.18.04.1 Requesting data... Opening file 'rootify.root' [compression=1]... Trying to setup 120 branches... Configured 115 branches. Filling branches... 317 rows fetched. 317 rows filled into tree. 86 kB written to disk. File closed. Execution time: 0.072247s (227.9 us/row) -------------------------------------------------------------- }}} Now we can open the file in root and do plots. The easiest ist to use the tree viewer: {{{ root rootify.root root [0] Attaching file rootify.root as _file0... root [1] TTree *T = (TTree*)_file0->Get("Result"); root [2] T->StartViewer(); root [3] }}} Or plot the zenith angle distribution directly: {{{ root rootify.root root [0] Attaching file rootify.root as _file0... root [1] TTree *T = (TTree*)_file0->Get("Result"); root [2] T->Draw("fZenithDistanceMean"); root [3] }}} or zenith angle versus time (Hint: DATETIME columns are converted to Unix-time in seconds): {{{ root rootify.root root [0] Attaching file rootify.root as _file0... root [1] TTree *T = (TTree*)_file0->Get("Result"); root [2] T->Draw("fZenithDistanceMean:fRunStart"); root [3] }}} === Data retrieval === The image parameters of the events themselves are stored in a table named '''Images'''. The position of the source in camera coordinates is stored in '''Position'''. To get them, you can run the following query {{{#!sql SELECT Images.*, Position.X, Position.Y FROM RunInfo LEFT JOIN Images USING (FileId) LEFT JOIN Position USING (FileId, EvtNumber) WHERE fSourceKey=5 AND fRunTypeKey=1 AND FileId BETWEEN 131101000 AND 131107000 AND fZenithDistanceMax<35 AND fR750Ref/fR750Cor>0.9 }}} or with the list you wrote before {{{#!sql SELECT Images.*, Position.X, Position.Y LEFT JOIN Images USING (FileId) LEFT JOIN Position USING (FileId, EvtNumber) WHERE FileId IN ($MyList) }}} using `--list.MyList=Crab.txt` as command-line option to rootifysql. Both queries are similar in execution time. Let's assume the output file is ''crab-data-only.root'' (`rootifysql --out=crab-data-only.root`). Requesting the data and writing the file took me about 60s. To run an analysis on the data you can use the following root macro "''analysis.C''". It produces a theta-square plot. Its execution took about five seconds (`root analysis.C++`). The `++` behind the macro name forces root to compile the macro before execution (might not be necessary anymore with root 6) to increase performance. {{{#!cpp #include #include #include #include #include void analysis() { // Create chain for the tree Result // This is just easier than using TFile/TTree TChain c("Result"); // Add the input file to the c.AddFile("crab-data-only.root"); // Define variables for all leaves to be accessed // By definition rootifysql writes only doubles double X, Y, MeanX, MeanY, Width, Length, CosDelta, SinDelta, M3Long, SlopeLong, Leakage1, SlopeSpreadWeighted, Size, ConcCore, ConcCOG, NumIslands, NumUsedPixels; // Connect the variables to the cordesponding leaves c.SetBranchAddress("X", &X); c.SetBranchAddress("Y", &Y); c.SetBranchAddress("MeanX", &MeanX); c.SetBranchAddress("MeanY", &MeanY); c.SetBranchAddress("Width", &Width); c.SetBranchAddress("Length", &Length); c.SetBranchAddress("CosDelta", &CosDelta); c.SetBranchAddress("SinDelta", &SinDelta); c.SetBranchAddress("M3Long", &M3Long); c.SetBranchAddress("SlopeLong", &SlopeLong); c.SetBranchAddress("Leakage1", &Leakage1); c.SetBranchAddress("NumIslands", &NumIslands); c.SetBranchAddress("NumUsedPixels", &NumUsedPixels); c.SetBranchAddress("SlopeSpreadWeighted", &SlopeSpreadWeighted); c.SetBranchAddress("Size", &Size); c.SetBranchAddress("ConcCOG", &ConcCOG); c.SetBranchAddress("ConcCore", &ConcCore); // Set some constants (they could be included in the database // in the future) double mm2deg = +0.0117193246260285378; double abberation = 1.02; // -------------------- Source dependent parameter calculation ------------------- // Create a histogram for on- and off-data TH1F hon("on", "", 55, 0, 1); TH1F hoff("off", "", 55, 0, 1); // Loop over all events TStopwatch clock; for (int i=0; i5.5 && Leakage1<0.1; bool cut0 = log10(area)0.9 }}} Or you use the existing table for the standard 60° Wobble positions and do just `CROSS JOIN Wobble`. This will give you all you need in ''crab.root'' (`rootifysql --out=crab.root`), but significantly increases computing time and the output file will be about six times larger. A simple macro just applying all the cuts would then be enough to do a theta-square plot {{{#!cpp void analysis() { // Create chain for the tree Result // This is just easier than using TFile/TTree TChain c("Result"); // Add the input file to the c.AddFile("crab.root"); // Set some constants (they could be included in the database // in the future) c.SetAlias("mm2deg", "+0.0117193246260285378"); // Define the cuts c.SetAlias("CutQ", "NumIslands<3.5 && NumUsedPixels>5.5 && Leakage1<0.1"); c.SetAlias("Cut0", "log10(Area)0, 1, NULL)) AS `Signal`, COUNT(IF(Weight<0, 1, NULL))/5 AS `Background` FROM ( SELECT Size, NumUsedPixels, NumIslands, Leakage1, Weight, PI()*Width*Length AS Area, @PX := cosa*X - sina*Y, @PY := cosa*Y + sina*X, @DX := MeanX-@PX/1.02, @DY := MeanY-@PY/1.02, @Norm := SQRT(@DX*@DX + @DY*@DY), @Dist := @Norm*0.0117193246260285378 AS Dist, @LX := LEAST(GREATEST((CosDelta*@DY - SinDelta*@DX)/@Norm, -1), 1), @LY := LEAST(GREATEST((CosDelta*@DX + SinDelta*@DY)/@Norm, -1), 1), @Alpha := ASIN(@LX) AS Alpha, @Sign := SIGN(@LY) AS Sign, @M3L := M3Long*@Sign*0.0117193246260285378, @Slope := SlopeLong*@Sign/0.0117193246260285378 AS Slope, @Xi := 1.39252 + 0.154247*@Slope + 1.67972*(1-1/(1+4.86232*Leakage1)), @Sign1 := @M3L+0.07, @Sign2 := (@Dist-0.5)*7.2-@Slope, @Disp := IF (SIGN(@Sign1)<0 || SIGN(@Sign2)<0, -@Xi, @Xi) * (1-Width/Length), @ThetaSq := (@Disp*@Disp + @Dist*@Dist - 2*@Disp*@Dist*SQRT(1-@LX*@LX)) AS ThetaSq FROM RunInfo LEFT JOIN Images USING (FileId) LEFT JOIN Position USING (FileId, EvtNumber) CROSS JOIN Wobble WHERE /* Data Selection */ fSourceKey=5 AND fRunTypeKey=1 AND FileId BETWEEN 131101000 AND 131107000 AND fZenithDistanceMax<35 AND fR750Cor>0.9*fR750Ref AND /* Quality cuts */ NumUsedPixels>5.5 AND NumIslands<3.5 AND Leakage1<0.1 /* Background suppression cut */ HAVING Area < LOG10(Size)*898-1535 ) InnerTable WHERE /* Theta-square cut */ ThetaSq<0.024 ) OuterTable }}} The output in mysql looks something like: {{{ +--------+------------+----------+-------------------+ | Signal | Background | Excess | Significance | +--------+------------+----------+-------------------+ | 984 | 446.0000 | 538.0000 | 19.48657640549446 | +--------+------------+----------+-------------------+ 1 row in set (0 min 11.33 sec) }}} Or like this if you finish the query with "\G" instead of a semicolon: {{{ *************************** 1. row *************************** Signal: 984 Background: 446.0000 Excess: 538.0000 Significance: 19.48657640549446 1 row in set (0 min 11.33 sec) }}} I am sure there is also a query which in addition prints the effective on-time. Note that for performance reasons it is (like in C++) very important that cuts are done as early as possible. Therefore, the quality-cuts are already done during event selection. Whether it is faster to have the area-cut in a `HAVING` clause or in the outer table in a `WHERE` clause might depend on the individual case (and how the optimizer handles it) and need to be tested. Note that a theta-square cut can not work in the HAVING clause because here, theta-square is only evaluated '''after''' the `HAVING` clause, i.e. just before the data is ''sent''. == WARNING: SQL User Variables == The MySQL Reference manual warns explicitly {{{#!div style="background: #d88; border: 3px ridge" As a general rule, other than in SET statements, you should never assign a value to a user variable and read the value within the same statement. }}} This is explain in the manual in more details here: https://dev.mysql.com/doc/refman/8.0/en/user-variables.html So generally speaking, the above example is supposed to FAIL! (...and works only by chance...) I have kept that here for convenience (and with the current server, it works, simply because the optimizer does not shuffle the columns. Now let's have a look at how the query should look correctly (''' Don't panic! '''): {{{#!sql SELECT -- 10 Counter.*, `Signal` - `Background` AS `Excess`, LiMa(`Signal`, `Background`) AS `Significance` FROM ( -- Counter SELECT -- 9 COUNT(IF(Weight>0, 1, NULL)) AS `Signal`, COUNT(IF(Weight<0, 1, NULL))/5 AS `Background` FROM ( -- Table8 SELECT -- 8 Weight, (Disp*Disp + Dist*Dist - 2*Disp*Dist*SQRT(1-LX*LX)) AS ThetaSq FROM ( -- Table7 SELECT -- 7 Weight, Dist, LX, IF (SIGN(Sign1)<0 || SIGN(Sign2)<0, -Xi, Xi) * (1-WdivL) AS Disp FROM ( -- Table6 SELECT -- 6 Weight, WdivL, Dist, LX, 1.39252 + 0.154247*Slope + 1.67972*(1-1/(1+4.86232*Leakage1)) AS Xi, M3L+0.07 AS Sign1, (Dist-0.5)*7.2-Slope AS Sign2 FROM ( -- Table5 SELECT -- 5 Weight, Leakage1, WdivL, LX, Norm *0.0117193246260285378 AS Dist, M3Long *Sign*0.0117193246260285378 AS M3L, SlopeLong*Sign/0.0117193246260285378 AS Slope FROM ( -- Table4 SELECT -- 4 Weight, M3Long, SlopeLong, Leakage1, WdivL, LX, Norm, ASIN(LX) AS Alpha, SIGN(LY) AS Sign FROM ( -- Table 3 SELECT -- 3 Weight, M3Long, SlopeLong, Leakage1, WdivL, Norm, LEAST(GREATEST((CosDelta*DY - SinDelta*DX)/Norm, -1), 1) AS LX, LEAST(GREATEST((CosDelta*DX + SinDelta*DY)/Norm, -1), 1) AS LY FROM ( -- Table2 SELECT -- 2 Weight, CosDelta, SinDelta, DX, DY, M3Long, SlopeLong, Leakage1, WdivL, SQRT(DX*DX + DY*DY) AS Norm FROM ( -- Table1 SELECT -- 1 Weight, CosDelta, SinDelta, M3Long, SlopeLong, Leakage1, WdivL, MeanX - PX/1.02 AS DX, MeanY - PY/1.02 AS DY FROM -- Table0 ( SELECT -- 0 Weight, Size, NumUsedPixels, NumIslands, Leakage1, MeanX, MeanY, CosDelta, SinDelta, M3Long, SlopeLong, Width/Length AS WdivL, PI()*Width*Length AS Area, cosa*X - sina*Y AS PX, cosa*Y + sina*X AS PY FROM RunInfo LEFT JOIN Images USING (FileId) LEFT JOIN Position USING (FileId, EvtNumber) CROSS JOIN Wobble WHERE /* Data selection */ fSourceKey=5 AND fRunTypeKey=1 AND FileId BETWEEN 131101000 AND 131107000 AND fZenithDistanceMax<35 AND fR750Cor>0.9*fR750Ref AND /* Quality cuts */ NumUsedPixels>5.5 AND NumIslands<3.5 AND Leakage1<0.1 /* END: SELECT -- 0 */ ) Table0 WHERE /* Background suppression cut */ Area < LOG10(Size)*898-1535 /* END: SELECT -- 1 */ ) Table1 /* END: SELECT -- 2 */ ) Table2 /* END: SELECT -- 3 */ ) Table3 /* END: SELECT -- 4 */ ) Table4 /* END: SELECT -- 5 */ ) Table5 /* END: SELECT -- 6 */ ) Table6 /* END: SELECT -- 7 */ ) Table7 /* END: SELECT -- 8 */ ) Table8 WHERE /* Theta-square cut */ ThetaSq<0.024 /* END: SELECT -- 9 */ ) Counter /* END: SELECT -- 10 */ }}} This new query results in {{{ Execution time: 17.3689s }}} In this particular example, the `WHERE` clause for the background suppression cut turns out to be much (2x) faster than the `HAVING` clause, while in the previous example, it was the opposite. Note that this is the result of the analysis of a dataset which, as a whole, is more than 100GB and even reduced to Crab after quality cuts still is of the order of GB! As a comparison. The source code which does all the calculations in compiled C++ code takes about 4.5s on the same machine. {{{#!div style="background: #d88; border: 3px ridge" Generally, it is '''not recommended''' to run the full analysis on the server - as long as you want to optimize or change things. Running it on the server only makes sense if you have finished optimizing your analysis and want to process a lot of data (as in an automatic process). }}} On a side note, it seems that explaining a query and profiling a query fails with this type of nested queried because it only gives information on the outermost query which is (in this example) irrelevant. == Optimization and precision maths == A very interesting read is https://dev.mysql.com/doc/refman/5.7/en/precision-math.html (and its sub-sections! Click on ''NEXT''). In short: A value like 1.234 is an ''exact'' value and exact arithmetic calculations are performed utilizing a dedicated library. If a value is given as an approximate ''floating-point'' value (which all '''DOUBLE''' columns are), for example 1.234e0, the calculation is ''not exact''. ''Not exact'' means that it using (faster) ''floating-point'' arithmetic rather than (slower) ''exact'' arithmetic. Now, any mathematical operation which at least contains one ''floating-point'' number is evaluated as a floating-point approximation. This sounds very simple and as if it has nothing to do with our analysis, but it has: All the numbers in the previous SQL queries were given as ''exact'' numbers and thus stored by the engine as ''exact'' numbers. Whenever they take place in one of our calculations, they have to be converted to a floating-point value first (it seems this does not happen in advance but during run-time), which is an operation which is obviously not for free. So as a test, let's append a `e0` to all fractional values in the query, e.g. instead of `0.5`, let's write `0.5e0` (or `5e-1`). This essentially tells the SQL engine to store the value as an approximate ''floating-point'' value and not as an ''exact'' value. Now, let's run the query again... And voila! This is really faster: {{{ Execution time: 9.04737s }}} {{{#!div style="background: #8d8; border: 3px ridge" Conclusion: Wherever you use floating-point values in your query and you are not interested in a 100% machine independent result, write them as ''floating-point'' values by appending `e0` instead of writing them as an ''exact'' values. For integer values (numbers without fractional parts) this is not needed. They mix well with floating-point values and thus no internal conversion is required. }}} == CTE - Common Table Expressions == Common table expression are explained in details at: https://dev.mysql.com/doc/refman/8.0/en/with.html. For our query, the advantage is that the maths in is listed in the order in which we want it to be executed, not in opposite order. (Note that CTE requires MySQL 8). Here is the query using common table expressions (`WITH`). {{{#!sql WITH Table9 AS ( WITH Table8 AS ( WITH Table7 AS ( WITH Table6 AS ( WITH Table5 AS ( WITH Table4 AS ( WITH Table3 AS ( WITH Table2 AS ( WITH Table1 AS ( WITH Table0 AS ( SELECT -- 0 Weight, Size, NumUsedPixels, NumIslands, Leakage1, MeanX, MeanY, CosDelta, SinDelta, M3Long, SlopeLong, Width/Length AS WdivL, PI()*Width*Length AS Area, cosa*X - sina*Y AS PX, cosa*Y + sina*X AS PY FROM RunInfo LEFT JOIN Images USING (FileId) LEFT JOIN Position USING (FileId, EvtNumber) CROSS JOIN Wobble WHERE fSourceKey=5 AND fRunTypeKey=1 AND FileId BETWEEN 131101000 AND 131107000 AND fZenithDistanceMax<35 AND fR750Cor>0.9e0*fR750Ref AND NumUsedPixels>5.5 AND NumIslands<3.5 AND Leakage1<0.1 ) -- AS Table0 SELECT -- 1 Weight, CosDelta, SinDelta, M3Long, SlopeLong, Leakage1, WdivL, MeanX - PX/1.02e0 AS DX, MeanY - PY/1.02e0 AS DY FROM Table0 WHERE Area < LOG10(Size)*898e0-1535e0 ) -- AS Table1 SELECT -- 2 Weight, CosDelta, SinDelta, DX, DY, M3Long, SlopeLong, Leakage1, WdivL, SQRT(DX*DX + DY*DY) AS Norm FROM Table1 ) -- AS Table2 SELECT -- 3 Weight, M3Long, SlopeLong, Leakage1, WdivL, Norm, LEAST(GREATEST((CosDelta*DY - SinDelta*DX)/Norm, -1), 1) AS LX, LEAST(GREATEST((CosDelta*DX + SinDelta*DY)/Norm, -1), 1) AS LY FROM Table2 ) -- AS Table3 SELECT -- 4 Weight, M3Long, SlopeLong, Leakage1, WdivL, LX, Norm, ASIN(LX) AS Alpha, SIGN(LY) AS Sign FROM Table3 ) -- AS Table4 SELECT -- 5 Weight, Leakage1, WdivL, LX, Norm *0.0117193246260285378e0 AS Dist, M3Long *Sign*0.0117193246260285378e0 AS M3L, SlopeLong*Sign/0.0117193246260285378e0 AS Slope FROM Table4 ) -- AS Table5 SELECT -- 6 Weight, WdivL, Dist, LX, 1.39252e0 + 0.154247e0*Slope + 1.67972e0*(1-1/(1+4.86232e0*Leakage1)) AS Xi, M3L+0.07e0 AS Sign1, (Dist-0.5e0)*7.2e0-Slope AS Sign2 FROM Table5 ) -- AS Table6 SELECT -- 7 Weight, Dist, LX, IF (SIGN(Sign1)<0 || SIGN(Sign2)<0, -Xi, Xi) * (1-WdivL) AS Disp FROM Table6 ) -- AS Table7 SELECT -- 8 Weight, (Disp*Disp + Dist*Dist - 2*Disp*Dist*SQRT(1-LX*LX)) AS ThetaSq FROM Table7 ) -- AS Table8 SELECT -- 9 COUNT(IF(Weight>0, 1, NULL)) AS `Signal`, COUNT(IF(Weight<0, 1, NULL))/5 AS `Background` FROM Table8 WHERE ThetaSq<0.024 ) -- AS Table9 SELECT -- 10 *, `Signal` - `Background` AS `Excess`, LiMa(`Signal`, `Background`) AS `Significance` FROM Table9 }}} Here with floating-point numbers. The execution time is similar. {{{ Execution time: 9.09697s }}} == Optimizing the query further == By throwing out unnecessary calculations, this query can be optimized further {{{#!sql WITH Table9 AS ( WITH Table8 AS ( WITH Table7 AS ( WITH Table6 AS ( WITH Table5 AS ( WITH Table3 AS ( WITH Table2 AS ( WITH Table1 AS ( WITH Table0 AS ( SELECT -- 0 Weight, Size, NumUsedPixels, NumIslands, Leakage1, MeanX, MeanY, CosDelta, SinDelta, M3Long, SlopeLong, Width/Length AS WdivL, PI()*Width*Length AS Area, cosa*X - sina*Y AS PX, cosa*Y + sina*X AS PY FROM RunInfo LEFT JOIN Images USING (FileId) LEFT JOIN Position USING (FileId, EvtNumber) CROSS JOIN Wobble WHERE fSourceKey=5 AND fRunTypeKey=1 AND FileId BETWEEN 131101000 AND 131107000 AND fZenithDistanceMax<35 AND fR750Cor>0.9e0*fR750Ref AND NumUsedPixels>5.5 AND NumIslands<3.5 AND Leakage1<0.1 ) -- AS Table0 SELECT -- 1 Weight, CosDelta, SinDelta, M3Long, SlopeLong, Leakage1, WdivL, MeanX - PX/1.02e0 AS DX, MeanY - PY/1.02e0 AS DY FROM Table0 WHERE Area < LOG10(Size)*898e0-1535e0 ) -- AS Table1 SELECT -- 2 Weight, CosDelta, SinDelta, DX, DY, M3Long, SlopeLong, Leakage1, WdivL, SQRT(DX*DX + DY*DY) AS Norm FROM Table1 ) -- AS Table2 SELECT -- 3 Weight, M3Long, SlopeLong, Leakage1, WdivL, Norm, LEAST(GREATEST((CosDelta*DY - SinDelta*DX)/Norm, -1), 1) AS LX, SIGN(CosDelta*DX + SinDelta*DY) AS Sign FROM Table2 ) -- AS Table3 SELECT -- 5 Weight, Leakage1, WdivL, LX, Norm *0.0117193246260285378e0 AS Dist, M3Long *Sign*0.0117193246260285378e0 AS M3L, SlopeLong*Sign/0.0117193246260285378e0 AS Slope FROM Table3 ) -- AS Table5 SELECT -- 6 Weight, WdivL, Dist, LX, M3L, Slope, 1.39252e0 + 0.154247e0*Slope + 1.67972e0*(1-1/(1+4.86232e0*Leakage1)) AS Xi FROM Table5 ) -- AS Table6 SELECT -- 7 Weight, Dist, LX, IF (M3L<-0.07 || (Dist-0.5e0)*7.2e0-Slope<0, -Xi, Xi) * (1-WdivL) AS Disp FROM Table6 ) -- AS Table7 SELECT -- 8 Weight, (Disp*Disp + Dist*Dist - 2*Disp*Dist*SQRT(1-LX*LX)) AS ThetaSq FROM Table7 ) -- AS Table8 SELECT -- 9 COUNT(IF(Weight>0, 1, NULL)) AS `Signal`, COUNT(IF(Weight<0, 1, NULL))/5 AS `Background` FROM Table8 WHERE ThetaSq<0.024 ) -- AS Table9 SELECT -- 10 *, `Signal` - `Background` AS `Excess`, LiMa(`Signal`, `Background`) AS `Significance` FROM Table9 }}} With {{{ Execution time: 6.75859s }}} Another way to write the same query is {{{#!sql WITH Table0 AS ( SELECT -- 0 Weight, Size, NumUsedPixels, NumIslands, Leakage1, MeanX, MeanY, CosDelta, SinDelta, M3Long, SlopeLong, Width/Length AS WdivL, PI()*Width*Length AS Area, cosa*X - sina*Y AS PX, cosa*Y + sina*X AS PY FROM RunInfo LEFT JOIN Images USING (FileId) LEFT JOIN Position USING (FileId, EvtNumber) CROSS JOIN Wobble WHERE fSourceKey=5 AND fRunTypeKey=1 AND FileId BETWEEN 131101000 AND 131107000 AND fZenithDistanceMax<35 AND fR750Cor>0.9e0*fR750Ref AND NumUsedPixels>5.5 AND NumIslands<3.5 AND Leakage1<0.1 ), Table1 AS ( SELECT Weight, CosDelta, SinDelta, M3Long, SlopeLong, Leakage1, WdivL, MeanX - PX/1.02e0 AS DX, MeanY - PY/1.02e0 AS DY FROM Table0 WHERE Area < LOG10(Size)*898e0-1535e0 ), Table2 AS ( SELECT Weight, CosDelta, SinDelta, DX, DY, M3Long, SlopeLong, Leakage1, WdivL, SQRT(DX*DX + DY*DY) AS Norm FROM Table1 ), Table3 AS ( SELECT Weight, M3Long, SlopeLong, Leakage1, WdivL, Norm, LEAST(GREATEST((CosDelta*DY - SinDelta*DX)/Norm, -1), 1) AS LX, SIGN(CosDelta*DX + SinDelta*DY) AS Sign FROM Table2 ), Table5 AS ( SELECT Weight, Leakage1, WdivL, LX, Norm *0.0117193246260285378e0 AS Dist, M3Long *Sign*0.0117193246260285378e0 AS M3L, SlopeLong*Sign/0.0117193246260285378e0 AS Slope FROM Table3 ), Table6 AS ( SELECT Weight, WdivL, Dist, LX, M3L, Slope, 1.39252e0 + 0.154247e0*Slope + 1.67972e0*(1-1/(1+4.86232e0*Leakage1)) AS Xi FROM Table5 ), Table7 AS ( SELECT Weight, Dist, LX, IF (M3L<-0.07 || (Dist-0.5e0)*7.2e0-Slope<0, -Xi, Xi) * (1-WdivL) AS Disp FROM Table6 ), Table8 AS ( SELECT Weight, (Disp*Disp + Dist*Dist - 2*Disp*Dist*SQRT(1-LX*LX)) AS ThetaSq FROM Table7 ), Table9 AS ( SELECT -- 9 COUNT(IF(Weight>0, 1, NULL)) AS `Signal`, COUNT(IF(Weight<0, 1, NULL))/5 AS `Background` FROM Table8 WHERE ThetaSq<0.024 ) SELECT *, `Signal` - `Background` AS `Excess`, LiMa(`Signal`, `Background`) AS `Significance` FROM Table9 }}} Almost as fast as the 4.5s with C++ where the C++ compiler optimizes the maths. The difference is most likely overhead for example from the SQL optimizer. Just for curiosity, what does the optimizer make with this query? The following is the output of the `--explain` directive slightly re-formatted for readability. {{{#!sql WITH Table9 AS ( /* select#2 */ SELECT COUNT(IF((Weight>0),1,NULL)) AS `Signal`, COUNT(IF((Weight<0),1,NULL))/5 AS `Background` FROM RunInfo JOIN Images JOIN Position JOIN Wobble WHERE RunInfo.fRunTypeKey = 1 AND RunInfo.fSourceKEY = 5 AND Position.EvtNumber = Images.EvtNumber AND Images.FileId = RunInfo.FileId AND Position.FileId=RunInfo.FileId AND (((((IF(((((`M3Long` * SIGN(((`CosDelta` * (`MeanX` - (((`cosa` * `X`) - (`sina` * `Y`)) / 1.02e0))) + (`SinDelta` * (`MeanY` - (((`cosa` * `Y`) + (`sina` * `X`)) / 1.02e0)))))) * 0.0117193246260285378e0) < (-(0.07))) or (((((SQRT((((`MeanX` - (((`cosa` * `X`) - (`sina` * `Y`)) / 1.02e0)) * (`MeanX` - (((`cosa` * `X`) - (`sina` * `Y`)) / 1.02e0))) + ((`MeanY` - (((`cosa` * `Y`) + (`sina` * `X`)) / 1.02e0)) * (`MeanY` - (((`cosa` * `Y`) + (`sina` * `X`)) / 1.02e0))))) * 0.0117193246260285378e0) - 0.5e0) * 7.2e0) - ((`SlopeLong` * SIGN(((`CosDelta` * (`MeanX` - (((`cosa` * `X`) - (`sina` * `Y`)) / 1.02e0))) + (`SinDelta` * (`MeanY` - (((`cosa` * `Y`) + (`sina` * `X`)) / 1.02e0)))))) / 0.0117193246260285378e0)) < 0)),-(((1.39252e0 + (0.154247e0 * ((`SlopeLong` * SIGN(((`CosDelta` * (`MeanX` - (((`cosa` * `X`) - (`sina` * `Y`)) / 1.02e0))) + (`SinDelta` * (`MeanY` - (((`cosa` * `Y`) + (`sina` * `X`)) / 1.02e0)))))) / 0.0117193246260285378e0))) + (1.67972e0 * (1 - (1 / (1 + (4.86232e0 * `Leakage1`))))))),((1.39252e0 + (0.154247e0 * ((`SlopeLong` * SIGN(((`CosDelta` * (`MeanX` - (((`cosa` * `X`) - (`sina` * `Y`)) / 1.02e0))) + (`SinDelta` * (`MeanY` - (((`cosa` * `Y`) + (`sina` * `X`)) / 1.02e0)))))) / 0.0117193246260285378e0))) + (1.67972e0 * (1 - (1 / (1 + (4.86232e0 * `Leakage1`))))))) * (1 - (`Width` / `Length`))) * (IF(((((`M3Long` * SIGN(((`CosDelta` * (`MeanX` - (((`cosa` * `X`) - (`sina` * `Y`)) / 1.02e0))) + (`SinDelta` * (`MeanY` - (((`cosa` * `Y`) + (`sina` * `X`)) / 1.02e0)))))) * 0.0117193246260285378e0) < (-(0.07))) or (((((SQRT((((`MeanX` - (((`cosa` * `X`) - (`sina` * `Y`)) / 1.02e0)) * (`MeanX` - (((`cosa` * `X`) - (`sina` * `Y`)) / 1.02e0))) + ((`MeanY` - (((`cosa` * `Y`) + (`sina` * `X`)) / 1.02e0)) * (`MeanY` - (((`cosa` * `Y`) + (`sina` * `X`)) / 1.02e0))))) * 0.0117193246260285378e0) - 0.5e0) * 7.2e0) - ((`SlopeLong` * SIGN(((`CosDelta` * (`MeanX` - (((`cosa` * `X`) - (`sina` * `Y`)) / 1.02e0))) + (`SinDelta` * (`MeanY` - (((`cosa` * `Y`) + (`sina` * `X`)) / 1.02e0)))))) / 0.0117193246260285378e0)) < 0)),-(((1.39252e0 + (0.154247e0 * ((`SlopeLong` * SIGN(((`CosDelta` * (`MeanX` - (((`cosa` * `X`) - (`sina` * `Y`)) / 1.02e0))) + (`SinDelta` * (`MeanY` - (((`cosa` * `Y`) + (`sina` * `X`)) / 1.02e0)))))) / 0.0117193246260285378e0))) + (1.67972e0 * (1 - (1 / (1 + (4.86232e0 * `Leakage1`))))))),((1.39252e0 + (0.154247e0 * ((`SlopeLong` * SIGN(((`CosDelta` * (`MeanX` - (((`cosa` * `X`) - (`sina` * `Y`)) / 1.02e0))) + (`SinDelta` * (`MeanY` - (((`cosa` * `Y`) + (`sina` * `X`)) / 1.02e0)))))) / 0.0117193246260285378e0))) + (1.67972e0 * (1 - (1 / (1 + (4.86232e0 * `Leakage1`))))))) * (1 - (`Width` / `Length`)))) + ((SQRT((((`MeanX` - (((`cosa` * `X`) - (`sina` * `Y`)) / 1.02e0)) * (`MeanX` - (((`cosa` * `X`) - (`sina` * `Y`)) / 1.02e0))) + ((`MeanY` - (((`cosa` * `Y`) + (`sina` * `X`)) / 1.02e0)) * (`MeanY` - (((`cosa` * `Y`) + (`sina` * `X`)) / 1.02e0))))) * 0.0117193246260285378e0) * (SQRT((((`MeanX` - (((`cosa` * `X`) - (`sina` * `Y`)) / 1.02e0)) * (`MeanX` - (((`cosa` * `X`) - (`sina` * `Y`)) / 1.02e0))) + ((`MeanY` - (((`cosa` * `Y`) + (`sina` * `X`)) / 1.02e0)) * (`MeanY` - (((`cosa` * `Y`) + (`sina` * `X`)) / 1.02e0))))) * 0.0117193246260285378e0))) - (((2 * (IF(((((`M3Long` * SIGN(((`CosDelta` * (`MeanX` - (((`cosa` * `X`) - (`sina` * `Y`)) / 1.02e0))) + (`SinDelta` * (`MeanY` - (((`cosa` * `Y`) + (`sina` * `X`)) / 1.02e0)))))) * 0.0117193246260285378e0) < (-(0.07))) or (((((SQRT((((`MeanX` - (((`cosa` * `X`) - (`sina` * `Y`)) / 1.02e0)) * (`MeanX` - (((`cosa` * `X`) - (`sina` * `Y`)) / 1.02e0))) + ((`MeanY` - (((`cosa` * `Y`) + (`sina` * `X`)) / 1.02e0)) * (`MeanY` - (((`cosa` * `Y`) + (`sina` * `X`)) / 1.02e0))))) * 0.0117193246260285378e0) - 0.5e0) * 7.2e0) - ((`SlopeLong` * SIGN(((`CosDelta` * (`MeanX` - (((`cosa` * `X`) - (`sina` * `Y`)) / 1.02e0))) + (`SinDelta` * (`MeanY` - (((`cosa` * `Y`) + (`sina` * `X`)) / 1.02e0)))))) / 0.0117193246260285378e0)) < 0)),-(((1.39252e0 + (0.154247e0 * ((`SlopeLong` * SIGN(((`CosDelta` * (`MeanX` - (((`cosa` * `X`) - (`sina` * `Y`)) / 1.02e0))) + (`SinDelta` * (`MeanY` - (((`cosa` * `Y`) + (`sina` * `X`)) / 1.02e0)))))) / 0.0117193246260285378e0))) + (1.67972e0 * (1 - (1 / (1 + (4.86232e0 * `Leakage1`))))))),((1.39252e0 + (0.154247e0 * ((`SlopeLong` * SIGN(((`CosDelta` * (`MeanX` - (((`cosa` * `X`) - (`sina` * `Y`)) / 1.02e0))) + (`SinDelta` * (`MeanY` - (((`cosa` * `Y`) + (`sina` * `X`)) / 1.02e0)))))) / 0.0117193246260285378e0))) + (1.67972e0 * (1 - (1 / (1 + (4.86232e0 * `Leakage1`))))))) * (1 - (`Width` / `Length`)))) * (SQRT((((`MeanX` - (((`cosa` * `X`) - (`sina` * `Y`)) / 1.02e0)) * (`MeanX` - (((`cosa` * `X`) - (`sina` * `Y`)) / 1.02e0))) + ((`MeanY` - (((`cosa` * `Y`) + (`sina` * `X`)) / 1.02e0)) * (`MeanY` - (((`cosa` * `Y`) + (`sina` * `X`)) / 1.02e0))))) * 0.0117193246260285378e0)) * SQRT((1 - (least(greatest((((`CosDelta` * (`MeanY` - (((`cosa` * `Y`) + (`sina` * `X`)) / 1.02e0))) - (`SinDelta` * (`MeanX` - (((`cosa` * `X`) - (`sina` * `Y`)) / 1.02e0)))) / SQRT((((`MeanX` - (((`cosa` * `X`) - (`sina` * `Y`)) / 1.02e0)) * (`MeanX` - (((`cosa` * `X`) - (`sina` * `Y`)) / 1.02e0))) + ((`MeanY` - (((`cosa` * `Y`) + (`sina` * `X`)) / 1.02e0)) * (`MeanY` - (((`cosa` * `Y`) + (`sina` * `X`)) / 1.02e0)))))),(-(1))),1) * least(greatest((((`CosDelta` * (`MeanY` - (((`cosa` * `Y`) + (`sina` * `X`)) / 1.02e0))) - (`SinDelta` * (`MeanX` - (((`cosa` * `X`) - (`sina` * `Y`)) / 1.02e0)))) / SQRT((((`MeanX` - (((`cosa` * `X`) - (`sina` * `Y`)) / 1.02e0)) * (`MeanX` - (((`cosa` * `X`) - (`sina` * `Y`)) / 1.02e0))) + ((`MeanY` - (((`cosa` * `Y`) + (`sina` * `X`)) / 1.02e0)) * (`MeanY` - (((`cosa` * `Y`) + (`sina` * `X`)) / 1.02e0)))))),(-(1))),1)))))) < 0.024) AND PI()*Width*Length < log10(Size)*898e0 - 1535e0 AND RunInfo.FileId BETWEEN 131101000 AND 131107000 AND fZenithDistanceMax < 35 AND fR750Cor>0.9e0*fR750Ref AND NumUsedPixels>5.5 AND NumIslands<3.5 AND Leakage1<0.1 ) /* select#1 */ SELECT '984' AS `Signal`, '446.0000' AS `Background`, '984' - '446.0000' AS `Excess`, LiMa('984','446.0000') AS `Significance` FROM DUAL }}} An interesting side effect is that if I run this query, I gain again about 1s run-time (most likely because the optimizer has nothing to do). (Note that to run this query, you have to replace the numbers in the outer SELECT by the corresponding column names and to remove the '''''' markers)