<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Culverson Software-Custom DAQ Software LabVIEW &#187; Data Handling</title>
	<atom:link href="http://culverson.com/category/data-handling/feed/" rel="self" type="application/rss+xml" />
	<link>http://culverson.com</link>
	<description>Custom Labview Data Acquisition Software Maine</description>
	<lastBuildDate>Sat, 05 Mar 2011 21:12:21 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Needle in the Haystack</title>
		<link>http://culverson.com/needle-haystack/</link>
		<comments>http://culverson.com/needle-haystack/#comments</comments>
		<pubDate>Sat, 05 Mar 2011 21:12:21 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
				<category><![CDATA[Beginners]]></category>
		<category><![CDATA[Data Handling]]></category>
		<category><![CDATA[LabVIEW]]></category>

		<guid isPermaLink="false">http://culverson.com/?p=305</guid>
		<description><![CDATA[Finding the best answer is not always straightforward. Scientists are not programmers. Repeat that after me: scientists are not programmers. It&#8217;s not their fault; it&#8217;s just a lack of proper training.  If you are implementing some algorithm given you by a scientist, it&#8217;s important to know this and account for it. Certain algorithms are not [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: center;"><em><strong>Finding the best answer is not always straightforward.</strong></em></p>
<p>Scientists are not programmers. Repeat that after me: <em>scientists are not programmers</em>. It&#8217;s not their fault; it&#8217;s just a lack of proper training.  If you are implementing some algorithm given you by a scientist, it&#8217;s important to know this and account for it.</p>
<p>Certain algorithms are not direct &#8211; most often for some process which is not easily reversible.  For example, I was given the task of implementing a way of finding the Wet-Bulb temperature, given the Dewpoint temperature, the Dry-Bulb temperature, and the Barometric Pressure.  Accompanying this task was some code, written by a scientist, in some form of BASIC.</p>
<p>To accomplish this, they started with an estimate (the DewPoint Temp) and worked forward, using the known equations to convert wet-bulb temp into dewpoint temp, then compared that result to the known dewpoint (Tdew).  If the result was less than the known dewpoint, they added a constant 0.05 degrees to the estimate, and tried again. When the result exceeded the dewpoint, they called it good and returned the latest estimate as the final answer.</p>
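<p>For readers who prefer text to prose, here is a minimal Python sketch of that fixed-step search. It is not the scientist&#8217;s BASIC code: <em>fixed_step_search</em> and <em>demo_forward</em> are names invented for this illustration, and <em>demo_forward</em> is only a stand-in for the real wet-bulb-to-dewpoint equations (any increasing function will do).</p>

```python
def fixed_step_search(forward, target, start, step=0.05, max_iter=100_000):
    """Raise the estimate by a constant step until forward(estimate)
    reaches the target. Returns (estimate, iterations)."""
    estimate = start
    for iterations in range(1, max_iter + 1):
        if forward(estimate) >= target:
            return estimate, iterations      # "call it good"
        estimate += step
    raise RuntimeError("no crossing found within max_iter steps")


def demo_forward(x):
    # Hypothetical stand-in for the real psychrometric equations.
    return 0.8 * x + 2.0


# The exact answer for this stand-in is x = 20.0.
estimate, iterations = fixed_step_search(demo_forward, target=18.0, start=10.0)
print(estimate, iterations)
```

<p>With a start 10 units below the answer and a step of 0.05, the loop needs roughly 200 passes, and the iteration count grows in direct proportion to the distance it has to cover.</p>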
<p><em>Scientists are not programmers.</em> If you ask them about this, they will say that it gets the right answer.  If you ask them how they came to choose 0.05 as the step size, after the blank stare (while they think about it), you will get an answer something like &#8220;Well, that&#8217;s the tolerance I want&#8221;. If you really press them, they will come up with &#8220;Well, any smaller and it&#8217;ll take too long &#8211; any larger and it&#8217;ll not be correct enough&#8221;, which is exactly true. That step size is somebody&#8217;s wild guess.</p>
<p>Being the obsessive speed freak that I am, I figured out a better way.  What the scientist didn&#8217;t realize is that you don&#8217;t have to have a constant step size.  With a modicum of further effort, you can adjust the step size dynamically, and get to the final answer much more quickly.</p>
<p>Simply start with a relatively large positive step, and do your estimates as before.  Afterward, make a decision &#8211; if you haven&#8217;t exceeded your target, step again in the same direction.  If you exceeded the target, don&#8217;t simply quit and call it good, REDUCE and REVERSE your step size and go again.  Now you&#8217;re heading negative. When you go BELOW your target, REDUCE and REVERSE your step size. Repeat this until the absolute value of your step size is below your tolerance.</p>
<p>In certain cases, this will take LONGER, but in the vast majority of cases where a fine tolerance is needed, this will get a more accurate answer in FEWER iterations.</p>
<p>You of course need to check things out and match your particular case. Use an iteration counter. You always want to reduce your step size when you reverse it: a factor of -1 would never converge and a factor of near -1 would converge slowly. But a factor as small as -0.01 shrinks the step too abruptly: it can drop below your tolerance after a single reversal, before the estimate is actually that close.  Best to use a factor of -0.1 to -0.4.</p>
<p>I have seen reductions of 100:1 in iteration counts between the original method and this improved search.  In the cases where it was worse, it was around 15 vs. 10 iterations; in the cases where it was better, it was around 30 vs. 500 iterations.</p>
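<p>Here is one way the reduce-and-reverse search might look in Python. Treat it as a sketch under assumptions, not my production code: the function names are invented, <em>demo_forward</em> is a stand-in for the real equations and is assumed to be increasing, and the loop stops when the step <em>just taken</em> is smaller than the tolerance, so the crossing is known to lie within that last step.</p>

```python
def reversing_step_search(forward, target, start, step=1.0, factor=-0.25,
                          tolerance=0.05, max_iter=1000):
    """Walk toward the point where forward(estimate) crosses target,
    reversing and shrinking the step each time the walk passes it.
    Assumes forward() is increasing. Returns (estimate, iterations)."""
    if abs(step) < tolerance:
        raise ValueError("initial step must be at least the tolerance")
    estimate = start
    for iterations in range(1, max_iter + 1):   # the iteration counter
        error = forward(estimate) - target
        # Passed the target: moving up and now above, or moving down and now below.
        passed = (step > 0 and error > 0) or (step < 0 and error < 0)
        if passed:
            if abs(step) < tolerance:
                return estimate, iterations      # crossing is within one small step
            step *= factor                       # REDUCE and REVERSE
        estimate += step
    raise RuntimeError("did not converge within max_iter steps")


def demo_forward(x):
    # Hypothetical stand-in for the real psychrometric equations.
    return 0.8 * x + 2.0


# The exact answer for this stand-in is x = 20.375.
estimate, iterations = reversing_step_search(demo_forward, target=18.3, start=10.0)
print(estimate, iterations)
```

<p>The factor of -0.25 sits inside the -0.1 to -0.4 range suggested above; try other values and watch the iteration counter to see what suits your own function.</p>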
<p>Use your common sense, and don&#8217;t hold it against them.  <em>Scientists are not programmers.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://culverson.com/needle-haystack/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Beware Simplicity</title>
		<link>http://culverson.com/beware-simplicity/</link>
		<comments>http://culverson.com/beware-simplicity/#comments</comments>
		<pubDate>Fri, 25 Feb 2011 13:49:44 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
				<category><![CDATA[Beginners]]></category>
		<category><![CDATA[Data Handling]]></category>
		<category><![CDATA[LabVIEW]]></category>
		<category><![CDATA[Timing]]></category>

		<guid isPermaLink="false">http://culverson.com/?p=295</guid>
		<description><![CDATA[Simpler ≠ faster : you still have to know what happens &#8220;under the hood&#8221;. If you read the post about en masse operations, you might remember that I pointed out that you should know what is happening behind the scenes. Here is a particular case where what looks like simpler code actually takes longer to [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: center;"><strong><em>Simpler ≠ faster : you still have to know what happens &#8220;under the hood&#8221;.</em></strong></p>
<p>If you read the post about <a href="http://culverson.com/operations-en-masse/" target="_blank">en masse operations</a>, you might remember that I pointed out that you should know what is happening behind the scenes. Here is a particular case where what looks like simpler code actually takes longer to execute.  If you don&#8217;t take the time to think about what is actually going on, then you might be fooled.</p>
<p>Consider a pair of signals, each around 12000 samples. Regulations state that I am allowed to drop (delete) certain samples from those signals before performing statistical operations on them.  The number of points to be dropped might be 2-10%, or up  to 1200 of the points. I have the indexes to be dropped in a third array. For graphing purposes, I need to keep the dropped points in separate arrays.</p>
<p>Now every programmer worth his salt has fallen into the trap of deleting elements 3, 5, and 8 from an array: if you try the straightforward way, you find out that after you delete element 3, element 5 is not in the same place it was before!  So you either have to delete element 8 BEFORE you delete element 5 and then 3, or you have to delete element (3-0), then element (5-1), and then element (8-2).</p>
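<p>For those who think in text, both ways around the pothole can be sketched in Python. The function names are invented for this illustration, and Python lists stand in for LabVIEW arrays.</p>

```python
def delete_descending(values, ascending_indexes):
    """Delete from the highest index down, so no deletion disturbs
    the positions still waiting to be deleted."""
    kept = list(values)
    dropped = []
    for index in reversed(ascending_indexes):
        dropped.append(kept.pop(index))
    dropped.reverse()            # put the dropped points back in original order
    return kept, dropped


def delete_with_offset(values, ascending_indexes):
    """Delete in ascending order, subtracting the number of elements
    already removed: (3-0), then (5-1), then (8-2)."""
    kept = list(values)
    dropped = []
    for already_removed, index in enumerate(ascending_indexes):
        dropped.append(kept.pop(index - already_removed))
    return kept, dropped


signal = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
print(delete_descending(signal, [3, 5, 8]))
print(delete_with_offset(signal, [3, 5, 8]))
```

<p>Both calls keep the same seven points and return the same three dropped points, in their original order.</p>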
<p>Having fallen into that pothole my share of times many years ago, I avoided it this time by doing the reversing trick: My list of points to drop was known to be in ascending order, so I reversed it, and then did the deletions.  Because I needed the deleted points in proper order, I had to reverse those after the deletion.  Here&#8217;s the code:</p>
<p><a href="http://culverson.com/site09/wp-content/uploads/2011/02/DropPoints1.png"><img class="alignnone size-full wp-image-299" title="DropPoints1" src="http://culverson.com/site09/wp-content/uploads/2011/02/DropPoints1.png" alt="Deletions with reversal" width="606" height="233" /></a></p>
<p>That worked fine for some time, but while revisiting this code, it occurred to me that it might be faster to manipulate the index while deleting, and avoid the reversals and speed things up.  Here&#8217;s the code:</p>
<p><a href="http://culverson.com/site09/wp-content/uploads/2011/02/DropPoints2.png"><img class="alignnone size-full wp-image-300" title="DropPoints2" src="http://culverson.com/site09/wp-content/uploads/2011/02/DropPoints2.png" alt="" width="648" height="272" /></a></p>
<p>That&#8217;s certainly simpler, right?  As one should always do, I applied a <a href="http://culverson.com/what-time-is-it/" target="_blank">Timing Measurement</a> to it. And I was surprised.  I created two arrays of 12000 numbers and an array of 1200 random (0..11999) indexes.  The simpler way consistently took 5-6% MORE TIME.</p>
<p>But if you stop and think about what&#8217;s going on, the reason is clear.  Suppose your signal array contains [0, 1, 2, 3, 4, 5] and you want to delete elements [1, 3, 4 ]</p>
<p>Using method A you reverse the list to get [4, 3, 1 ].<br />
You delete element 4.  {that moves element 5 down &#8211; 1 move}<br />
You delete element 3.  {that moves element 5 down &#8211; 1 more move}<br />
You delete element 1. {that moves elements 2, 5 down &#8211; 2 more moves}<br />
That&#8217;s 4 moves that were made in the shuffling process.</p>
<p>Now consider the &#8220;simpler&#8221; method:<br />
You delete element [1-0].  {that moves elements 2,3,4,5 down = 4 moves }<br />
You delete element [3-1].  {that moves elements 4,5 down = 2 moves}<br />
You delete element [4-2].  {that moves element 5 down = 1 move }<br />
That&#8217;s a total of SEVEN moves that were made.</p>
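<p>The same bookkeeping can be done in a few lines of Python. This sketch only counts element moves under the assumption used above, namely that deleting an element shifts everything after it down by one; it does not measure what LabVIEW really does internally, and the function names are invented.</p>

```python
def moves_descending(length, ascending_indexes):
    """Moves needed when deleting from the highest index down (method A)."""
    moves = 0
    for index in reversed(ascending_indexes):
        moves += length - 1 - index      # elements sitting after the deleted one
        length -= 1
    return moves


def moves_ascending(length, ascending_indexes):
    """Moves needed when deleting in ascending order with adjusted indexes."""
    moves = 0
    for already_removed, index in enumerate(ascending_indexes):
        moves += length - 1 - (index - already_removed)
        length -= 1
    return moves


print(moves_descending(6, [1, 3, 4]))   # 4
print(moves_ascending(6, [1, 3, 4]))    # 7
```

<p>Deleting from the top down means each deletion only shifts elements that are going to survive, so nothing is moved that will be thrown away a moment later.</p>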
<p>So even though we eliminated three REVERSAL operations, we actually take LONGER because we are doing more work.  The increased amount of data-shuffling was enough to overcome the benefit of removing the reversals.</p>
<p>This was done using a random list of indexes to drop; I&#8217;d bet that there are possible scenarios where this wouldn&#8217;t hold true (for example if the points to drop were few, and confined to the end of the signal), but given that neither of those will be true in my case, I&#8217;m sticking with the original plan &#8211; on average it will be faster.</p>
<p>But don&#8217;t assume that fewer operations on the diagram means less work !</p>
]]></content:encoded>
			<wfw:commentRss>http://culverson.com/beware-simplicity/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Operations en Masse</title>
		<link>http://culverson.com/operations-en-masse/</link>
		<comments>http://culverson.com/operations-en-masse/#comments</comments>
		<pubDate>Wed, 19 Aug 2009 14:06:47 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
				<category><![CDATA[Beginners]]></category>
		<category><![CDATA[Data Handling]]></category>
		<category><![CDATA[Easier Programming]]></category>
		<category><![CDATA[LabVIEW]]></category>

		<guid isPermaLink="false">http://jimdugan.com/culverson/?p=56</guid>
		<description><![CDATA[The things that I used to do… En masse is a French term meaning “as a whole” or “all together”; treating a group of something as a single unit.   LabVIEW has the ability to treat arrays this way, which can greatly reduce your workload. If you come to LabVIEW from a text-based language, it’s [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: center;"><strong><em>The things that I used to do…</em></strong></p>
<p style="text-align: left;"><em>En masse</em> is a French term meaning “as a whole” or “all together”; treating a group of something as a single unit.   LabVIEW has the ability to treat arrays this way, which can greatly reduce your workload. If you come to LabVIEW from a text-based language, it’s easy to miss the capabilities that are right at your fingertips.</p>
<p style="text-align: left;">For example, if you need to scale a series of readings into percent of the total (a procedure called normalizing),  then you tend to think:</p>
<p>I need to find the total:</p>
<ul>
<li>I need to start with a zero sum     <em>sum = 0.0;</em></li>
<li>I need to loop over every element  <em>for (int i = 0; i &lt; nElements; i++)</em></li>
<li>I need to add this element to the sum   <em>sum += array[i]</em></li>
</ul>
<p style="text-align: left;">Now I need to divide each entry by the total, to get the fraction of the total:</p>
<ul>
<li><em>for (int i = 0; i &lt; nElements; i++)</em></li>
<li><em>array[i] /= sum;</em></li>
</ul>
<p>Now I need to multiply by 100 to get percentages:</p>
<ul>
<li><em>for (int i = 0; i &lt; nElements; i++)</em></li>
<li><em>array[i] *= 100.0;</em></li>
</ul>
<p>That’s all well and good, and you could translate that literally into LabVIEW and it will get you the answer you want to see.  But that’s not the LabVIEW way of thinking.</p>
<p>What newcomers often fail to realize is that most primitive numeric functions (the ones with yellowish icons) will accept an array of numbers directly. This goes for basic arithmetic (add, subtract, multiply, divide), comparisons (greater than, less than, MAX/MIN), and many other operations.  LabVIEW will happily multiply an array of numbers by a single scalar number, to produce an array of numbers.</p>
<p>This has great power to reduce the work that you do as the programmer. Consider the literal translation of the above code:</p>
<p><a href="http://culverson.com/site09/wp-content/uploads/2009/08/EnMasse-11.png"><img class="aligncenter size-full wp-image-166" title="EnMasse-11" src="http://culverson.com/site09/wp-content/uploads/2009/08/EnMasse-11.png" alt="EnMasse-11" width="498" height="141" /></a></p>
<p>If that’s as good as it gets then why should I go with LabVIEW?</p>
<p>Well, it does get better.  There is a function in the numeric palette called ADD ARRAY ELEMENTS.  If we replace the entire first loop with this function, then we get to this:</p>
<p><a href="http://culverson.com/site09/wp-content/uploads/2009/08/EnMasse-21.png"><img class="aligncenter size-full wp-image-167" title="EnMasse-21" src="http://culverson.com/site09/wp-content/uploads/2009/08/EnMasse-21.png" alt="EnMasse-21" width="476" height="145" /></a></p>
<p>Now for the <em>en masse</em> parts: You can replace the entire second loop with a single operation as well:</p>
<p><a href="http://culverson.com/site09/wp-content/uploads/2009/08/EnMasse-31.png"><img class="aligncenter size-full wp-image-168" title="EnMasse-31" src="http://culverson.com/site09/wp-content/uploads/2009/08/EnMasse-31.png" alt="EnMasse-31" width="475" height="144" /></a></p>
<p>Any guesses what we can do with the third loop?    Yes, that’s right:</p>
<p><a href="http://culverson.com/site09/wp-content/uploads/2009/08/EnMasse-41.png"><img class="aligncenter size-full wp-image-169" title="EnMasse-41" src="http://culverson.com/site09/wp-content/uploads/2009/08/EnMasse-41.png" alt="EnMasse-41" width="474" height="122" /></a></p>
<p>Now you have SO much more room to add comments about what you’re doing!</p>
<p>Now THIS is what makes you more productive in LabVIEW than in C; your chances for error are far less when you let the <em>en masse </em>operators handle the details, and you don’t even have to think about the details.</p>
<p>But be aware of what’s going on, however; there is no magic here.  Under the hood there is still a loop somewhere.  It’s now hidden somewhat; it’s not as obvious, but the work is still being done.  Don’t let the simplicity obscure the real processing that’s going on.</p>
<p>Here is an example of the normalizing function in use, from the real LabVIEW example examples\general\graphs\charts.llb\Draw Stacked Graph.vi (in LabVIEW 8.6, anyway).</p>
<p><a href="http://culverson.com/site09/wp-content/uploads/2009/08/EnMasse-5.PNG"><img class="aligncenter size-full wp-image-170" title="EnMasse-5" src="http://culverson.com/site09/wp-content/uploads/2009/08/EnMasse-5.PNG" alt="EnMasse-5" width="194" height="203" /></a></p>
<p>This amounts to the same as our last part above.  In the example, the array given contains five elements and this is executed only once, so efficiency is not a concern.</p>
<p>But consider if the array was 10,000 elements. Don’t forget that the first operation is doing 10,000 divide operations, and the second is doing 10,000 multiplications.  Can you improve things?</p>
<p>Well, certainly! What you have to realize is that, because multiplication is commutative and associative, (X / sum) * 100 is equal to (100 / sum) * X.  You also have to realize that 100 / sum, in this context, is a constant, and therefore needs to be calculated only once.  In effect, you are dividing by sum and multiplying by 100, but you are doing it 10,000 times!</p>
<p>With any luck at all, you get the same answer every time, so you only need to do it once:</p>
<p><a href="http://culverson.com/site09/wp-content/uploads/2009/08/EnMasse-6.png"><img class="aligncenter size-full wp-image-172" title="EnMasse-6" src="http://culverson.com/site09/wp-content/uploads/2009/08/EnMasse-6.png" alt="EnMasse-6" width="443" height="119" /></a></p>
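<p>If you want to try the idea outside LabVIEW, here is a rough Python equivalent of the two versions. Plain Python lists stand in for LabVIEW arrays and the function names are invented; the point is only that the second version computes the scale factor once and then makes a single pass.</p>

```python
import math


def normalize_three_passes(readings):
    """Literal translation: sum, then divide every element, then multiply."""
    total = 0.0
    for value in readings:
        total += value
    fractions = [value / total for value in readings]
    return [fraction * 100.0 for fraction in fractions]


def normalize_one_factor(readings):
    """Compute 100 / sum once, then do a single multiply per element."""
    scale = 100.0 / sum(readings)
    return [value * scale for value in readings]


readings = [2.0, 3.0, 5.0]
slow = normalize_three_passes(readings)
fast = normalize_one_factor(readings)
print(slow, fast)
print(all(math.isclose(a, b) for a, b in zip(slow, fast)))
```

<p>The two results agree to within floating-point rounding; the last digit can differ because the operations are carried out in a different order.</p>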
<p>THIS is why we use LabVIEW!</p>
<p><strong>NOTE</strong>:  <em>En masse </em>is my term for this feature, it is not an official LabVIEW term.</p>
]]></content:encoded>
			<wfw:commentRss>http://culverson.com/operations-en-masse/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Delays, delays, delays</title>
		<link>http://culverson.com/delays-delays-delays/</link>
		<comments>http://culverson.com/delays-delays-delays/#comments</comments>
		<pubDate>Sun, 05 Jul 2009 15:33:44 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
				<category><![CDATA[Data Handling]]></category>
		<category><![CDATA[LabVIEW]]></category>
		<category><![CDATA[Timing]]></category>

		<guid isPermaLink="false">http://jimdugan.com/culverson/?p=66</guid>
		<description><![CDATA[Can’t you signals just work together? Usually, in a data acquisition program,  all the signals you measure are “live”, meaning they represent the current conditions at the time they are sampled. However, in some cases you might have some signals which are not live, but delayed. For example, suppose you’re measuring engine operation, and you [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: center;"><strong><em>Can’t you signals just work together?</em></strong></p>
<p style="text-align: left;">Usually, in a data acquisition program,  all the signals you measure are “live”, meaning they represent the current conditions at the time they are sampled. However, in some cases you might have some signals which are not live, but delayed. For example, suppose you’re measuring engine operation, and you have gas analyzers sampling the exhaust airstream. These analyzers conduct the gas from the measurement point in the airflow system around the engine to the analyzer mechanism itself. This gas flows through the sampling tubing at a specific rate, and therefore arrives at the sample point at a specific time AFTER it left the main airstream. These delays might be multiple seconds in duration.  There could be other reasons for this delay, such as an echo-measuring device, or the mechanical response time of some piece of hardware.</p>
<p style="text-align: left;">For analysis purposes, you want to look at the engine holistically and see causes and effects. If the speed changed at a particular point in time, you want to see the CO concentration change as a consequence. But unless you do something to compensate, the change in gas concentration will lag far behind the change in speed that caused it in graphs and tables, making it difficult to judge consequences.</p>
<p style="text-align: left;">To arrange the signals back into time-synchronous alignment, the solution is to think of ALL signals as having TWO delay lines: one physical, and one logical.  In our example case, the physical part is the gas piping conducting the gas. The logical part is completely within the DAS software. If EVERY channel has a physical delay time (even if it’s zero), and if EVERY channel also has a logical delay time (even if it’s zero), and if EVERY channel has the sum of the physical delay time and the logical delay time equal to a constant, then the signals coming out of the logical delay lines will be time-aligned.</p>
<h4>Example</h4>
<p style="text-align: left;">For example, consider three channels: Channel A records data “live”, i.e. no delay. Channel B has a delay of 3 seconds. Channel C has a delay of 5 seconds.</p>
<p style="text-align: left;">We want to make the entire chain (physical + logical) the same length for all channels, so we pick the longest delay (5 sec) and make that our total delay time. For channel A, which has no physical delay, we have to have a logical delay of 5 sec. For channel B, which has a physical delay of 3 sec, we add a logical delay of 2 sec for a total of 5. For channel C, which has a physical delay of 5 seconds, we have a logical delay of zero. 5+0 = 3+2 = 0+5, so every channel has the same delay, when we count both physical and logical.</p>
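<p>The arithmetic is small enough to write down. Here is a minimal Python sketch of it; the function name and the dictionary layout are assumptions made for this illustration.</p>

```python
def logical_delays(physical_delays):
    """Give every channel the logical delay that brings its total
    (physical + logical) up to the longest physical delay."""
    total = max(physical_delays.values())
    return {name: total - delay for name, delay in physical_delays.items()}


physical = {"A": 0.0, "B": 3.0, "C": 5.0}    # seconds, from the example
print(logical_delays(physical))              # {'A': 5.0, 'B': 2.0, 'C': 0.0}
```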
<p style="text-align: left;">The logical delay is implemented in queues with every channel having its own queue. Every sample received goes into a queue belonging to that channel; and we pull out one sample from the end of the queue to be recorded. Since one goes in and one comes out, the length of the queue never changes. The initial length of the queue corresponds to the delay time we need for a particular channel. If a queue is empty (of zero length), then the sample we put in and the sample we take out are the same sample. If the queue is of length 10, then the sample we get out was taken 10 sample times ago.</p>
<p>At configuration time, the queues are set up, and populated with zero values to set their length according to how much delay is required. After that, the sample process means putting a sample into a queue and taking one out for use.  Every channel operates the same when actually sampling, it’s only the configuration where they differ.</p>
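<p>A fixed-length queue like this takes only a few lines in a text language. The Python sketch below is an illustration, not the LabVIEW implementation: the class name <em>DelayLine</em> is invented, and the delay is expressed in samples (you would convert from seconds using your own sample rate).</p>

```python
from collections import deque


class DelayLine:
    """A queue pre-filled with `delay_samples` values, so every sample
    pushed in comes back out that many pushes later."""

    def __init__(self, delay_samples, fill=0.0):
        self._queue = deque([fill] * delay_samples)   # configuration time

    def push(self, sample):
        """Put one sample in, take one out; the length never changes."""
        self._queue.append(sample)
        return self._queue.popleft()


line = DelayLine(2)
print([line.push(x) for x in [1.0, 2.0, 3.0, 4.0]])   # [0.0, 0.0, 1.0, 2.0]
print(DelayLine(0).push(7.5))                         # 7.5: zero delay, same sample
```

<p>The first outputs are the zero values loaded at configuration time, which is why the early data is not usable.</p>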
<p>Note that if you are recording a TIMESTAMP value, then the TIMESTAMP signal requires its own queue as well, so that it comes out in alignment with the signals. Of course this means that the data does not become usable until all the zeroes have been flushed out of the queues. That will happen when the longest delay time has expired. We can judge this by putting a RECORD signal into its own queue. We don’t actually record data coming out of the queues until the RECORD signal coming out is true.</p>
<p>You can keep the data flowing into (and out of) these queues all the time.  When you want to start recording, set the RECORD flag TRUE. When you want to stop recording, set the RECORD flag FALSE.  Pay attention to the value out of the RECORD flag’s queue, and act appropriately. (Don’t act on the RECORD flag itself, act on the delayed RECORD flag).</p>
<p>This also means that when the test is over, there is still data remaining in the queues. We have to keep recording until the valuable data has worked its way through the queues, meaning 5 seconds after the last sample (in the example case).</p>
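<p>To see the delayed RECORD flag and TIMESTAMP working together, here is a small Python simulation. Everything in it is hypothetical: two channels, delays counted in samples rather than seconds, and each channel simply reports the number of the tick it is actually seeing.</p>

```python
from collections import deque


def make_delay(length, fill):
    """Return a push(sample) function backed by a queue pre-filled with `fill`."""
    queue = deque([fill] * length)

    def push(sample):
        queue.append(sample)
        return queue.popleft()

    return push


TOTAL_DELAY = 5                      # samples; the longest delay we allow for
PHYSICAL_DELAY = {"A": 0, "B": 3}    # hypothetical channels, in samples

signal_delay = {name: make_delay(TOTAL_DELAY - lag, 0)
                for name, lag in PHYSICAL_DELAY.items()}
time_delay = make_delay(TOTAL_DELAY, 0)        # TIMESTAMP gets its own queue
record_delay = make_delay(TOTAL_DELAY, False)  # so does the RECORD flag

recorded = []
for tick in range(20):                         # keep running after the request ends
    record_request = tick in range(2, 10)      # operator wants ticks 2 through 9
    # Each channel reports the event that happened `lag` ticks ago.
    row = {name: signal_delay[name](tick - lag)
           for name, lag in PHYSICAL_DELAY.items()}
    stamp = time_delay(tick)
    if record_delay(record_request):           # act on the DELAYED flag
        recorded.append((stamp, row["A"], row["B"]))

print(recorded)
```

<p>Every recorded row carries a timestamp and two channel values that all refer to the same tick, and the rows cover exactly the ticks the operator asked for, even though they were written five ticks later.</p>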
]]></content:encoded>
			<wfw:commentRss>http://culverson.com/delays-delays-delays/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Hybrid Data Files</title>
		<link>http://culverson.com/hybrid-data-files/</link>
		<comments>http://culverson.com/hybrid-data-files/#comments</comments>
		<pubDate>Sat, 11 Apr 2009 15:55:27 +0000</pubDate>
		<dc:creator>Jim</dc:creator>
				<category><![CDATA[Data Handling]]></category>
		<category><![CDATA[Files]]></category>
		<category><![CDATA[LabVIEW]]></category>

		<guid isPermaLink="false">http://jimdugan.com/culverson/?p=81</guid>
		<description><![CDATA[Combine BINARY and DATALOG files for the best of both worlds. In LabVIEW, there are three kinds of files: TEXT files. Ordinary text, stored in human-readable form, with spaces and line feeds, etc. BINARY files. Raw information stored as machine-readable information. A 32-bit integer is stored in 4 bytes. A double-precision number is stored in [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: center;"><em><strong>Combine BINARY and DATALOG files for the best of both worlds.</strong></em></p>
<p>In LabVIEW, there are three kinds of files:</p>
<ul>
<li>TEXT files. Ordinary text, stored in human-readable form, with spaces and line feeds, etc.</li>
<li>BINARY files. Raw information stored as machine-readable information. A 32-bit integer is stored in 4 bytes. A double-precision number is stored in 8 bytes.</li>
<li>DATALOG files. Structured data, suitable for quick transfer between a memory structure and the disk file.</li>
</ul>
<p>I have long used Datalog files for configuration data.  They offer several advantages:</p>
<ul>
<li>EXTREMELY simple reading/writing. No matter how complicated the data structure, you just open the file, write the structure to it, and close the file. No muss, no fuss. I have a project where the datalog file is a single record of maybe 200 k Bytes. It’s still all done with a single WRITE FILE call.</li>
<li>Being binary, users are reluctant to open it in their text editors and twiddle about. You won’t get a service call to figure out that the user set the serial port to COM1.76 and the alarm level to 0 ( <span><em>But it was working perfectly yesterday!</em></span> ).</li>
<li>Compactness. A DBL is always 8 bytes. Written as text, the value -12.3456 is already 8 bytes, not even counting the delimiters separating it from its neighbors. If space is truly an issue, use SGL (4 bytes) instead of DBL.</li>
<li>If your panel is set to allow a number to vary between 10 and 20, then that’s what gets stored. You don’t have to check every number in the file to see if it’s in range. Since you wrote it there, it’s correct.</li>
<li>Format checking. Even if you use the standard file dialogs for choosing files (rather than a custom one), you can set it so that it will show only files of the correct type. The user has fewer wrong files to choose from, therefore the odds of a mistake are lessened.</li>
</ul>
<p>There are a couple of disadvantages, though:</p>
<ul>
<li>Rigid formatting. The thing that makes it so easy to read and write, turns around and complicates things when it’s time to revise the format. If you add or remove so much as a single item, or re-arrange the order of things, then the old files are not readable anymore. You can attempt to compensate for this by adding spare fields at the start, but if you make a change to the format, you will have to make an updater which reads the old files and writes new ones in their place, or else all the old files are worthless.</li>
<li>Non-portability. The tamper-resistance feature can be a disadvantage if the file must be available in other (non-LabVIEW) applications. For this reason, datalog format is best suited to files that have limited, well-defined uses.</li>
</ul>
<p>Typically, a data-acquisition program, when used over a reasonable period of time, needs a configuration file to define which channels to use, what their scale factors are, their names, and units, etc., etc. The data recorded in “Run 107” was recorded using “JOEs setup”, but the data recorded in “Run 108” was recorded with “JOEs Other Setup”. So how do you keep the files paired? You don’t. You can try various naming schemes, but sooner or later, some mistake will leave the user with a missing or mismatched CONFIG and DATA file pair.</p>
<p>I avoid that whole scenario by including the config data structure inside the data file. Every data file contains the config used to record it. You just put the config cluster inside a large cluster that includes your data, and record that. There is no question about which scale factor was used on the flatistrat channel, because it’s recorded right there. It’s a bit wasteful in terms of disk space, but not terribly so.</p>
<h3>Datalog + Binary = Hybrid</h3>
<p>So, given all that, suppose you have a LOT of data to record. The config data is a small portion, and the data is huge. There’s a problem with the idea of a data file being a cluster containing the config structure and the data. And that problem is memory size. To write a cluster to a datalog file, you have to have the data all in one place. If your data is stored in some other place as it’s acquired, then writing the file means making a COPY of your huge data and putting it into the file cluster before writing the file. That’s wasteful. And back in the days when 8 Megabytes was all the RAM a machine could hold, and my clients needed to record more, it wasn’t even POSSIBLE.</p>
<p>To solve those issues, I invented a hybrid file. That term is my own label; it is not an official LabVIEW term. The idea is that you write the config data as a DATALOG file, and close it. Then you open the SAME file as a binary file, skip past the datalog portion, and write binary data. You get the benefit of both worlds: it’s easy to read / write the config header with ordinary datalog operations, and it’s easy to read the binary data with binary operations. You can write the data as you need to; you don’t need to make copies. You can write more data than you have RAM for. You just have to remember that the binary data doesn’t start at offset zero. It’s perfect, right?</p>
<p>Almost.</p>
<p>You have to figure out where the start of binary data should be, and that’s not trivial. LabVIEW’s DATALOG files include their own header, and the structure and size of that header is not public information. However, you can make some deductions. Since the FILE DIALOG can discriminate between datalog files of different structure, the structure format has to be embedded in the file itself. So what I do is flatten the datalog portion to a string and get its size. I take the TYPE STRING (which is not really a string) that comes out of the FLATTEN function, and flatten that and get another size. I add those two sizes together, and round the total UP to the next higher multiple of 4096 or so. That works for any size structure, as it includes an estimate for the Datalog header, as well as our own header.</p>
<p>When you read the file, you do the same thing, and compute the offset where the binary data starts.</p>
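<p>The layout is easier to see in a text language. The Python sketch below only imitates the idea: it does NOT read or write LabVIEW datalog files. A length-prefixed JSON header stands in for the datalog portion, and the names <em>data_offset</em>, <em>write_hybrid</em> and <em>read_hybrid</em> are invented for this illustration. Rounding to the next multiple strictly above the header size is a choice made here, to leave some slack after the header.</p>

```python
import json
import os
import struct
import tempfile

BLOCK = 4096   # "4096 or so"


def data_offset(header_size, block=BLOCK):
    """Next multiple of `block` strictly above header_size."""
    return (header_size // block + 1) * block


def write_hybrid(path, config, samples):
    payload = json.dumps(config).encode("utf-8")
    header = struct.pack(">I", len(payload)) + payload
    with open(path, "wb") as handle:          # first pass: header only, then close
        handle.write(header)
    with open(path, "r+b") as handle:         # second pass: SAME file, raw data
        handle.seek(data_offset(len(header)))
        for sample in samples:                # written one at a time, no big copy
            handle.write(struct.pack(">d", sample))


def read_hybrid(path):
    with open(path, "rb") as handle:
        (size,) = struct.unpack(">I", handle.read(4))
        config = json.loads(handle.read(size).decode("utf-8"))
        handle.seek(data_offset(4 + size))    # same computation as the writer
        raw = handle.read()
    samples = [value for (value,) in struct.iter_unpack(">d", raw)]
    return config, samples


path = os.path.join(tempfile.mkdtemp(), "run107.dat")
write_hybrid(path, {"setup": "JOEs setup", "channels": 3}, [1.5, -12.3456, 42.0])
print(read_hybrid(path))
```

<p>The reader and the writer must compute the offset the same way, which is the whole trick; if the header format changes, so does the offset.</p>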
<p>One more gotcha. Occasionally, National Instruments changes the format used in Datalog files. Usually it’s a minor change, and usually LabVIEW handles it automatically. You may have seen the message “This file was recorded using an older version of LabVIEW, it must be updated to be read. Do you want to update it?”. All that is well and good if it’s a plain Datalog file, but if it’s a hybrid file, LabVIEW doesn’t know anything about the binary data you’ve stuck on the end. So it will open the datalog portion, and re-write a new datalog portion, truncating everything after that, including your data. Beware.</p>
<p>Still, this method brings more benefits to the table than it brings problems, so consider it for your own projects.</p>
<p>Enjoy.</p>
]]></content:encoded>
			<wfw:commentRss>http://culverson.com/hybrid-data-files/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

