Hybrid Data Files

Combine BINARY and DATALOG files for the best of both worlds.

In LabVIEW, there are three kinds of files:

TEXT files. Ordinary text, stored in human-readable form, with spaces and line feeds, etc.
BINARY files. Raw information stored as machine-readable information. A 32-bit integer is stored in 4 bytes. A double-precision number is stored in 8 bytes.
DATALOG files. Structured data, suitable for quick transfer between a memory structure and the disk file.
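
The byte counts quoted for BINARY files can be checked outside LabVIEW. The following is a minimal Python sketch, not LabVIEW code; big-endian byte order is assumed here, and the values are arbitrary.

```python
import struct

# ">" selects big-endian byte order (an assumption for this sketch).
i32_bytes = struct.pack(">i", 1234)       # 32-bit signed integer
dbl_bytes = struct.pack(">d", -12.3456)   # double-precision (DBL)
sgl_bytes = struct.pack(">f", -12.3456)   # single-precision (SGL)

print(len(i32_bytes), len(dbl_bytes), len(sgl_bytes))  # prints: 4 8 4
```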

I have long used Datalog files for configuration data. They offer several advantages:

EXTREMELY simple reading/writing. No matter how complicated the data structure, you just open the file, write the structure to it, and close the file. No muss, no fuss. I have a project where the datalog file is a single record of maybe 200 kilobytes. It’s still all done with a single WRITE FILE call.
Tamper resistance. Being binary, users are reluctant to open it in their text editors and twiddle about. You won’t get a service call to figure out that the user set the serial port to COM1.76 and the alarm level to 0 (“But it was working perfectly yesterday!”).
Compactness. A DBL is 8 bytes. Written as text, the value -12.3456 already takes 8 bytes, not even counting the delimiters separating it from its neighbors. If space is truly an issue, use SGL (4 bytes) instead of DBL.
If your panel is set to allow a number to vary between 10 and 20, then that’s what gets stored. You don’t have to check every number in the file to see if it’s in range. Since you wrote it there, it’s correct.
Format checking. Even if you use the standard file dialogs for choosing files (rather than a custom one), you can set it so that it will show only files of the correct type. The user has fewer wrong files to choose from, therefore the odds of a mistake are lessened.

There are a couple of disadvantages, though:

Rigid formatting. The thing that makes it so easy to read and write, turns around and complicates things when it’s time to revise the format. If you add or remove so much as a single item, or re-arrange the order of things, then the old files are not readable anymore. You can attempt to compensate for this by adding spare fields at the start, but if you make a change to the format, you will have to make an updater which reads the old files and writes new ones in their place, or else all the old files are worthless.
Non-portability. The tamper-resistance feature can be a disadvantage if the file must be available in other (non-LabVIEW) applications. For this reason, datalog format is best suited to files that have limited, well-defined uses.
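
The rigid-formatting problem, and the updater that works around it, can be illustrated with a fixed binary layout in Python. This is only an analogy: the two record layouts and the function name are hypothetical, and LabVIEW's datalog format is not reproduced here.

```python
import struct

OLD_FORMAT = ">id"    # hypothetical v1 record: channel number, scale factor
NEW_FORMAT = ">idd"   # hypothetical v2 record: adds an offset field

def upgrade_record(old_bytes, default_offset=0.0):
    """Read a v1 record and return the equivalent v2 record."""
    channel, scale = struct.unpack(OLD_FORMAT, old_bytes)
    return struct.pack(NEW_FORMAT, channel, scale, default_offset)

old_record = struct.pack(OLD_FORMAT, 3, 2.5)

# The new layout cannot read the old bytes directly: the sizes no longer match.
try:
    struct.unpack(NEW_FORMAT, old_record)
    readable = True
except struct.error:
    readable = False

new_record = upgrade_record(old_record)
print(readable, struct.unpack(NEW_FORMAT, new_record))  # False (3, 2.5, 0.0)
```

Adding one field is enough to make the old record unreadable, which is why every format change needs its own updater.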

Typically, a data-acquisition program, when used over a reasonable period of time, needs a configuration file to define which channels to use, what their scale factors are, their names, and units, etc., etc. The data recorded in “Run 107” was recorded using “JOEs setup”, but the data recorded in “Run 108” was recorded with “JOEs Other Setup”. So how do you keep the files paired? You don’t. You can try various naming schemes, but sooner or later, some mistake will leave the user with a missing or mismatched CONFIG and DATA file pair.

I avoid that whole scenario by including the config data structure inside the data file. Every data file contains the config used to record it. You just put the config cluster inside a large cluster that includes your data, and record that. There is no question about which scale factor was used on the flatistrat channel, because it’s recorded right there. It’s a bit wasteful in terms of disk space, but not terribly so.
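
As a rough illustration of keeping the config inside the data record, here is a Python sketch that uses a nested dictionary in place of a LabVIEW cluster and JSON in place of a datalog file. The field names, values, and file name are made up for the example.

```python
import json
import os
import tempfile

# Hypothetical run record: the config that produced the data travels with it.
run = {
    "config": {
        "name": "JOEs Other Setup",
        "channels": [{"name": "flatistrat", "scale": 2.5, "units": "V"}],
    },
    "data": [[1.0, 2.0, 4.0]],   # one row of raw readings per channel
}

path = os.path.join(tempfile.mkdtemp(), "run108.json")
with open(path, "w") as f:
    json.dump(run, f)

# Reading needs no second file: the scale factor is in the same record.
with open(path) as f:
    loaded = json.load(f)
scale = loaded["config"]["channels"][0]["scale"]
scaled = [raw * scale for raw in loaded["data"][0]]
print(scaled)  # [2.5, 5.0, 10.0]
```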

Datalog + Binary = Hybrid

So, given all that, suppose you have a LOT of data to record. The config data is a small portion, and the data is huge. There’s a problem with the idea of a data file being a cluster containing the config structure and the data. And that problem is memory size. To write a cluster to a datalog file, you have to have the data all in one place. If your data is stored in some other place as it’s acquired, then writing the file means making a COPY of your huge data and putting it into the file cluster before writing the file. That’s wasteful. And back in the days when 8 Megabytes was all the RAM a machine could hold, and my clients needed to record more, it wasn’t even POSSIBLE.

To solve those issues, I invented a hybrid file. That term is my own label; it is not an official LabVIEW term. The idea is that you write the config data as a DATALOG file, and close it. Then you open the SAME file as a binary file, skip past the datalog portion, and write binary data. You get the benefit of both worlds: it’s easy to read/write the config header with ordinary datalog operations, and it’s easy to read the binary data with binary operations. You can write the data as you need to; you don’t need to make copies. You can write more data than you have RAM for. You just have to remember that the file doesn’t start at offset zero. It’s perfect, right?

Almost.

You have to figure out where the start of binary data should be, and that’s not trivial. LabVIEW’s DATALOG files include their own header, and the structure and size of that header is not public information. However, you can make some deductions. Since the FILE DIALOG can discriminate between datalog files of different structure, the structure format has to be embedded in the file itself. So what I do is flatten the datalog portion to a string and get its size. I take the TYPE STRING (which is not really a string) that comes out of the FLATTEN function, and flatten that and get another size. I add those two sizes together, and round the sum UP to the next higher multiple of 4096 or so. That works for any size structure, as it includes an estimate for the Datalog header, as well as our own header.
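
The rounding step is plain integer arithmetic. Here is a small Python sketch of it, with the two sizes passed in as numbers; in LabVIEW they would come from the two flatten operations. The function name and the sample sizes are invented for illustration, and the sketch always moves to a strictly higher multiple so that some slack remains past the estimate.

```python
BLOCK = 4096  # rounding granularity; the text says "4096 or so"

def data_start_offset(flattened_size, type_descriptor_size, block=BLOCK):
    """Offset of the first binary byte, following the sizing rule above."""
    estimate = flattened_size + type_descriptor_size
    # Strictly the next multiple, so there is always some room past the estimate.
    return (estimate // block + 1) * block

print(data_start_offset(200_000, 1_500))  # 204800
print(data_start_offset(4096, 0))         # 8192
```

Whether the leftover slack actually covers the undocumented datalog header is an assumption carried over from the description above, not something this arithmetic can verify.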

When you read the file, you do the same thing, and compute the offset where the binary data starts.
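
The whole write-then-read cycle can be sketched in Python. This is an analogy only: a small fixed-layout header stands in for the datalog portion (it is NOT LabVIEW's datalog format), and the layout, function names, and file name are hypothetical. The point it shows is that writer and reader derive the same data offset from the header structure, and that data can be streamed in chunks.

```python
import os
import struct
import tempfile

HEADER_FORMAT = ">Id"   # hypothetical config: channel count, scale factor
BLOCK = 4096

def data_offset():
    # Same computation on write and read: header size rounded up to the
    # next multiple of BLOCK.
    return (struct.calcsize(HEADER_FORMAT) // BLOCK + 1) * BLOCK

def write_hybrid(path, channels, scale, chunks):
    with open(path, "wb") as f:
        f.write(struct.pack(HEADER_FORMAT, channels, scale))  # header portion
        f.seek(data_offset())              # skip ahead to the data area
        for chunk in chunks:               # stream chunk by chunk, no big copy
            f.write(struct.pack(f">{len(chunk)}d", *chunk))

def read_hybrid(path):
    with open(path, "rb") as f:
        header = f.read(struct.calcsize(HEADER_FORMAT))
        channels, scale = struct.unpack(HEADER_FORMAT, header)
        f.seek(data_offset())              # binary data does not start at zero
        raw = f.read()
    values = struct.unpack(f">{len(raw) // 8}d", raw)
    return channels, scale, list(values)

path = os.path.join(tempfile.mkdtemp(), "run.dat")
write_hybrid(path, 2, 0.5, [[1.0, 2.0], [3.0, 4.0]])
result = read_hybrid(path)
print(result)                   # (2, 0.5, [1.0, 2.0, 3.0, 4.0])
print(os.path.getsize(path))    # 4128
```

The gap between the end of the header and the data offset is simply unused padding, which is the price paid for not knowing the exact header size.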

One more gotcha. Occasionally, National Instruments changes the format used in Datalog files. Usually it’s a minor change, and usually LabVIEW handles it automatically. You may have seen the message “This file was recorded using an older version of LabVIEW, it must be updated to be read. Do you want to update it?”. All that is well and good if it’s a plain Datalog file, but if it’s a hybrid file, LabVIEW doesn’t know anything about the binary data you’ve stuck on the end. So it will open the datalog portion, and re-write a new datalog portion, truncating everything after that, including your data. Beware.
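
One possible precaution, offered as a suggestion rather than part of the original method, is to copy the file aside before letting a newer LabVIEW version update it, then compare sizes afterwards. The Python sketch below simulates the truncation; the helper names and the size-comparison heuristic are assumptions for illustration.

```python
import os
import shutil
import tempfile

def backup_before_upgrade(path):
    """Copy a hybrid file aside before anything is allowed to rewrite it."""
    backup = path + ".bak"
    shutil.copy2(path, backup)
    return backup

def tail_was_truncated(path, backup):
    # Heuristic: a file shorter than its backup has probably lost its binary tail.
    return os.path.getsize(path) < os.path.getsize(backup)

path = os.path.join(tempfile.mkdtemp(), "run.dat")
with open(path, "wb") as f:
    f.write(b"\x00" * 5000)      # pretend: header area plus binary data

backup = backup_before_upgrade(path)
with open(path, "r+b") as f:
    f.truncate(4096)             # simulate an update that drops the tail

print(tail_was_truncated(path, backup))  # True
```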

Still, this method brings more benefits to the table than problems, so consider it for your own projects.

Enjoy.

