uncompressed FITS file:

human readable ASCII header,
80-character lines, no end-of-line characters.

a line which looks like:
END <space chars fill up to 80>

marks the end of the header, but after that
I still see more space-filled lines...

funny enough, the first non-space characters
start at address 0x00004ec0 = 20160 bytes
(this was for file 20120117_016.fits.gz).
20160 = 7 x 2880, so after the 'END ' line the header is
padded with spaces up to the next multiple of the 2880-byte
FITS block size (not, as it first looks, the next full kB address).
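A minimal sketch (not part of any tool yet) of how one could locate the
end of such a header and check the block padding; the file name is just
the example from above:

    import sys

    BLOCK = 2880   # FITS logical record size
    CARD = 80      # header cards: 80 chars, no end-of-line

    with open('20120117_016.fits', 'rb') as f:
        data_start = None
        n_blocks = 0
        while data_start is None:
            block = f.read(BLOCK)
            if len(block) < BLOCK:
                sys.exit('no END card found')
            n_blocks += 1
            for i in range(0, BLOCK, CARD):
                if block[i:i + 4] == b'END ':
                    # header is space-padded to the end of this block
                    data_start = n_blocks * BLOCK
                    break

    print('data starts at', hex(data_start), '=', data_start, 'bytes')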
an event with ROI 1024 has the size:
2952510 bytes
this is made up of the following fields:
event_num          4 byte
Trigger_num        4 byte
trigger_type       2 byte
numboards          4 byte
errors         4 x 1 byte
softtrig           4 byte
unixtime       2 x 4 byte
boardtime     40 x 4 byte
startcells  1440 x 2 byte
startcellTIM 160 x 2 byte
data     1474560 x 2 byte = 1440 x 1024 x 2 byte
timeMarker     0 x 2 byte ????

the sum is 2952510 bytes, so the timeMarker field really is zero bytes.
the per-event header is 3390 bytes, while the data is 2880 kB,
so the header is about 1.15 permille of the data.

in case of ROI = 300 this ratio is about 4 permille.
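
A quick check of this arithmetic (sizes in bytes, field list from above):

    fields = {
        'event_num':        1 * 4,
        'Trigger_num':      1 * 4,
        'trigger_type':     1 * 2,
        'numboards':        1 * 4,
        'errors':           4 * 1,
        'softtrig':         1 * 4,
        'unixtime':         2 * 4,
        'boardtime':       40 * 4,
        'startcells':    1440 * 2,
        'startcellTIM':   160 * 2,
        'data':   1440 * 1024 * 2,
        'timeMarker':       0 * 2,
    }
    total = sum(fields.values())
    header = total - fields['data']
    print(total)                               # 2952510
    print(header)                              # 3390
    print(1000.0 * header / fields['data'])    # ~1.15 permille
    # for ROI = 300 the data shrinks while the header stays the same:
    print(1000.0 * header / (1440 * 300 * 2))  # ~3.9 permille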

For the example of a pedestal file: all the uncompressed
header information amounts to about 3329.5 kB for
a file size of 2.75 GB.

all the header information should be copied uncompressed
into the output file.
**BUT** since the compression algorithm delivers
events of unequal size, the size of each event needs to be stored
as well, so that we can still jump around in the file.

We have two choices:
1)
we store a table of event addresses in the file header
(see the sketch below).
A file might store something like 1000000 events
and a jump address is 4 bytes long, so this table
might be as large as 4 MB or so.
2)
we prepend each event with the address of its successor.
This means in order to find event 10045 one has to read 10044 events first :-(
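
A sketch of choice 1 (function and variable names are made up, the
layout is not decided); addresses are 4-byte little-endian as assumed
above:

    import struct

    def write_events(f, header_bytes, events):
        """events: list of compressed event blobs of unequal size."""
        table_pos = len(header_bytes)
        f.write(header_bytes)
        f.write(b'\x00' * 4 * len(events))   # reserve room for the table
        addresses = []
        for blob in events:
            addresses.append(f.tell())
            f.write(blob)
        f.seek(table_pos)                    # go back and fill the table
        f.write(struct.pack('<%dI' % len(addresses), *addresses))

    def seek_event(f, table_pos, index):
        """Jump straight to event `index`; one seek, no scanning."""
        f.seek(table_pos + 4 * index)
        (address,) = struct.unpack('<I', f.read(4))
        f.seek(address)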


Now the compression algorithm should work like this:
open a calibration file and retrieve only the mean offset
data; convert it back to ADC counts.
size of offset = 1440 x 1024 x 2 byte = 2880 kB

subtract the offset from the raw data.

calculate (signed) diffs of subsequent slices.
analyse the diffs:
 find groups of |diff| < 127
 find groups of |diff| < 31
define which groups are ideal :-) (a first sketch follows below)
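
A rough sketch of the diff grouping for one pixel trace; greedy, and
surely not the final ("smart") algorithm yet. A group keeps its bit
width while the next diff still fits; the diff across a group boundary
is never stored, since every group begins with an absolute start value:

    def bits_needed(diff):
        """Smallest allowed signed width (4, 6, 8 or 16 bit)."""
        for bits, limit in ((4, 7), (6, 31), (8, 127)):
            if -limit <= diff <= limit:
                return bits
        return 16

    def group_trace(trace):
        """trace: offset-subtracted ADC counts of one pixel (len = ROI).
        Returns a list of (start_value, bits, follower_diffs) tuples."""
        diffs = [trace[i + 1] - trace[i] for i in range(len(trace) - 1)]
        groups = []
        s = 0                              # first sample of the group
        while s < len(trace):
            e = s                          # last sample in the group
            bits = bits_needed(diffs[e]) if e < len(diffs) else 16
            while e < len(diffs):
                b = bits_needed(diffs[e])
                # spikes (16 bit) stay in their own group, so they do
                # not drag the rest of the trace into 16-bit encoding
                if b > bits or (bits == 16 and b < 16):
                    break
                e += 1
            groups.append((trace[s], bits, diffs[s:e]))
            s = e + 1                      # next start value is absolute
        return groups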

store a compression group header:
start value: 2 byte
size of followers: 4 bit, 6 bit, 8 bit, or 16 bit --> 1 byte
number of followers: 0...1023 --> 2 byte
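
One possible packing of this 5-byte header (little-endian; the exact
layout is still open, this is just to fix the idea):

    import struct

    def pack_group_header(start_value, bits, n_followers):
        # 'h': signed 16-bit start value, 'B': bit width,
        # 'H': number of followers (0...1023)
        return struct.pack('<hBH', start_value, bits, n_followers)

    assert len(pack_group_header(512, 6, 600)) == 5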

The last follower could maybe be the start value
of the next group, so we could check for sanity.
But on the other hand, this is just stupid.

shift the diff bits together (sketched below).

copy the diff bits of the followers right behind the group header.
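
A sketch of the bit shifting (MSB-first within the packed stream is an
arbitrary choice here):

    def pack_diffs(diffs, bits):
        """Pack signed diffs into `bits`-wide two's-complement fields,
        zero-padded to whole bytes."""
        acc, n = 0, 0
        out = bytearray()
        for d in diffs:
            acc = (acc << bits) | (d & ((1 << bits) - 1))
            n += bits
            while n >= 8:
                n -= 8
                out.append((acc >> n) & 0xFF)
                acc &= (1 << n) - 1        # drop the emitted bits
        if n:                              # pad the last partial byte
            out.append((acc << (8 - n)) & 0xFF)
        return bytes(out)

    # 600 six-bit followers -> 450 bytes, as in the example below
    assert len(pack_diffs([1] * 600, 6)) == 450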

The grouping algo should be smart, so that there are not
a lot of small groups. There will always be some
spikes or jumps in the data, so we can't expect to get away
with only one group.

I guess it will look like this:
5-byte header: 16-bit group,   5 followers ->  10 bytes following
5-byte header:  6-bit group, 600 followers -> 450 bytes following
5-byte header: 16-bit group,   3 followers ->   6 bytes following  -- spike
5-byte header:  8-bit group,  80 followers ->  80 bytes following
5-byte header: 16-bit group,   7 followers ->  14 bytes following  -- jump
5-byte header:  6-bit group, 323 followers -> 243 bytes following

total size for this pixel trace: 833 bytes
this is 41% of the uncompressed trace (1024 x 2 byte = 2048 bytes).
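
Checking that example (payload of each group rounded up to whole
bytes, plus one 5-byte header per group):

    groups = [(16, 5), (6, 600), (16, 3), (8, 80), (16, 7), (6, 323)]
    payload = sum(-(-bits * n // 8) for bits, n in groups)  # ceil division
    total = payload + 5 * len(groups)
    print(total)            # 833
    print(total / 2048.0)   # ~0.41 of the raw 1024 x 2-byte trace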

thus one could end up at 1.2 GB compared to 1.7 GB,

but this has to be checked of course, and therefore
the grouping algorithm needs to be implemented.

shit