The few things to know about BIN/BINX files and their handling in R
by Sebastian Kreutzer (June 6, 2021)
I’ve realised over some time now that many users seem to struggle to process BIN/BINX files efficiently with the package Luminescence. The function to import BIN/BINX files existed even before the package was first released and became a community project. So if there is any ambiguity, I feel somewhat obliged to lift the fog.
Having said that, with this tutorial, I will try to shed a little light on BIN/BINX file handling using Luminescence, hopefully making it a more joyful experience.
1 What is meant by BIN/BINX files?
When I talk about BIN/BINX files, I refer to files with the ending *.bin or *.binx, mainly produced by the commercially available Risø luminescence readers. These files contain the measurement data (typically everything the photomultiplier detects) produced by the company's TL/OSL readers, in a binary file format. Over the many years these machines have been around, the file format design has changed slightly, leading to at least six different versions. Versions 3 to 8 are all supported by Luminescence. Perhaps, while I am writing this tutorial, a new version is already around that I am just not (yet) aware of. If so, please notify me.
The important thing to know about these different versions is that they are not really compatible. The part of the file that includes the metadata (we will talk about it later) differs in length and partly in byte order. I first realised this when I, proud of my first R functions, could no longer import new files after we had updated the system software of our reader. The good news is that the Risø guys have always been very supportive and shared the format documentation, which allowed me to provide timely and good format support in Luminescence. A few more details about the format can be found by typing ?`Risoe.BINfileData-class` in the R terminal.
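Why a different byte order matters can be illustrated in a few lines of base R. The sketch below is generic and does not reproduce the actual BIN/BINX layout: the same two bytes decode to different integers depending on the byte order assumed when reading them.

```r
# Encode the integer 258 as a 2-byte value in little-endian order.
# With a raw vector as connection, writeBin() returns the bytes (here: 02 01).
bytes <- writeBin(258L, raw(), size = 2, endian = "little")

# Reading the bytes back with the matching byte order recovers the value ...
value_little <- readBin(bytes, what = "integer", n = 1, size = 2, endian = "little")

# ... whereas assuming the wrong byte order yields another number (0x0201 = 513)
value_big <- readBin(bytes, what = "integer", n = 1, size = 2, endian = "big")

print(c(little = value_little, big = value_big))
```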
2 The structure of BIN/BINX files
The easiest way to import BIN/BINX files is to call the function read_BIN2R(). The function automatically determines the format version. The file name extension does not matter; both endings *.bin and *.binx (rule of thumb: everything >= V4 has the ending *.binx) are supported. So, the most straightforward code snippet reads:
library(Luminescence)
file <- "20101027_BT707_MAIN_CGQ.BIN"
bin_data <- read_BIN2R(file, txtProgressBar = FALSE)
##
## [read_BIN2R()]
## >> 20101027_BT707_MAIN_CGQ.BIN
## >> 792 records have been read successfully!
Here, file is a character string with the path to your BIN/BINX file. For this tutorial, I will use a dataset I measured during my PhD: a coarse-grain quartz sample from the loess section Seilitz in Saxony, Germany (Meszner et al. 2013). The parameter txtProgressBar = FALSE suppresses the import progress bar shown in the terminal, which is not of relevance here.
The output of the function is an R object of class Risoe.BINfileData (documented under Risoe.BINfileData-class). I cannot recall why I decided to make the name so long; bad habit, I guess. When you call the object, it prints a summary instead of flooding the terminal with data.
bin_data
##
## [Risoe.BINfileData object]
##
## BIN/BINX version 3
## Object date: 271020, 281020, 291020
## User: Default
## System ID: 150
## Overall records: 792
## Records type: IRSL (n = 36)
## OSL (n = 504)
## TL (n = 252)
## Position range: 1 : 36
## Grain range: 0 : 0
## Run range: 1 : 8
## Set range: 3 : 6
We learn that the file (here version 3) was produced at the end of October 20XX (the format dates back to a time when it was apparently hard to imagine that we would make the millennium transition) by a user sensibly called Default on a system with serial number 150. The remaining lines show the total number of records (luminescence curves of different types) and the range of positions. Run and set range both refer to the design of the measurement sequence.
The object itself follows the so-called S4 definition. Nothing about this is of further relevance, except for the magic operator to access the elements (slots) of the object: the at symbol @ (see ?`@`). Alternatively, you can try str(bin_data), but personally, I have never found this function really helpful, in particular not for large objects.
2.1 @METADATA
Once imported, the object (here bin_data) contains elements called slots. One of them is METADATA, a data.frame that includes all metadata of the measurements as a kind of big spreadsheet, which you may have already seen in the central window of the software Analyst (Duller 2015).
head(bin_data@METADATA)
ID | SEL | VERSION | LENGTH | PREVIOUS | NPOINTS | RECTYPE | RUN | SET | POSITION | GRAIN | GRAINNUMBER | CURVENO | XCOORD | YCOORD | SAMPLE | COMMENT | SYSTEMID | FNAME | USER | TIME | DATE | DTYPE | BL_TIME | BL_UNIT | NORM1 | NORM2 | NORM3 | BG | SHIFT | TAG | LTYPE | LIGHTSOURCE | LPOWER | LIGHTPOWER | LOW | HIGH | RATE | TEMPERATURE | MEASTEMP | AN_TEMP | AN_TIME | TOLDELAY | TOLON | TOLOFF | IRR_TIME | IRR_TYPE | IRR_UNIT | IRR_DOSERATE | IRR_DOSERATEERR | TIMESINCEIRR | TIMETICK | ONTIME | OFFTIME | STIMPERIOD | GATE_ENABLED | ENABLE_FLAGS | GATE_START | GATE_STOP | PTENABLED | DTENABLED | DEADTIME | MAXLPOWER | XRF_ACQTIME | XRF_HV | XRF_CURR | XRF_DEADTIMEF | DETECTOR_ID | LOWERFILTER_ID | UPPERFILTER_ID | ENOISEFACTOR | MARKPOS_X1 | MARKPOS_Y1 | MARKPOS_X2 | MARKPOS_Y2 | MARKPOS_X3 | MARKPOS_Y3 | EXTR_START | EXTR_END | SEQUENCE |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | TRUE | 3 | 8272 | 0 | 2000 | 0 | 1 | 3 | 1 | 0 | 0 | NA | 0 | 0 | BT 707 CGQ | Natural | 150 | 20101027_BT707_MAIN_CGQ | Default | 13:39:24 | 271020 | Natural | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | OSL | Blue Diodes | 90 | 90 | 0 | 40 | 5 | 0 | NA | 125 | 10 | 0 | 0 | 0 | 0 | 0 | 0 | NA | NA | NA | NA | 0 | 0 | NA | 0 | 0 | 0 | 0 | 0 | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0 | 0 | 20101027 |
2 | TRUE | 3 | 8272 | 8272 | 2000 | 0 | 1 | 3 | 2 | 0 | 0 | NA | 0 | 0 | BT 707 CGQ | Natural | 150 | 20101027_BT707_MAIN_CGQ | Default | 13:41:09 | 271020 | Natural | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | OSL | Blue Diodes | 90 | 90 | 0 | 40 | 5 | 0 | NA | 125 | 10 | 0 | 0 | 0 | 0 | 0 | 0 | NA | NA | NA | NA | 0 | 0 | NA | 0 | 0 | 0 | 0 | 0 | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0 | 0 | 20101027 |
3 | TRUE | 3 | 8272 | 8272 | 2000 | 0 | 1 | 3 | 3 | 0 | 0 | NA | 0 | 0 | BT 707 CGQ | Natural | 150 | 20101027_BT707_MAIN_CGQ | Default | 13:42:56 | 271020 | Natural | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | OSL | Blue Diodes | 90 | 90 | 0 | 40 | 5 | 0 | NA | 125 | 10 | 0 | 0 | 0 | 0 | 0 | 0 | NA | NA | NA | NA | 0 | 0 | NA | 0 | 0 | 0 | 0 | 0 | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0 | 0 | 20101027 |
4 | TRUE | 3 | 8272 | 8272 | 2000 | 0 | 1 | 3 | 4 | 0 | 0 | NA | 0 | 0 | BT 707 CGQ | Natural | 150 | 20101027_BT707_MAIN_CGQ | Default | 13:44:43 | 271020 | Natural | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | OSL | Blue Diodes | 90 | 90 | 0 | 40 | 5 | 0 | NA | 125 | 10 | 0 | 0 | 0 | 0 | 0 | 0 | NA | NA | NA | NA | 0 | 0 | NA | 0 | 0 | 0 | 0 | 0 | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0 | 0 | 20101027 |
5 | TRUE | 3 | 8272 | 8272 | 2000 | 0 | 1 | 3 | 5 | 0 | 0 | NA | 0 | 0 | BT 707 CGQ | Natural | 150 | 20101027_BT707_MAIN_CGQ | Default | 13:46:29 | 271020 | Natural | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | OSL | Blue Diodes | 90 | 90 | 0 | 40 | 5 | 0 | NA | 125 | 10 | 0 | 0 | 0 | 0 | 0 | 0 | NA | NA | NA | NA | 0 | 0 | NA | 0 | 0 | 0 | 0 | 0 | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0 | 0 | 20101027 |
6 | TRUE | 3 | 8272 | 8272 | 2000 | 0 | 1 | 3 | 6 | 0 | 0 | NA | 0 | 0 | BT 707 CGQ | Natural | 150 | 20101027_BT707_MAIN_CGQ | Default | 13:48:15 | 271020 | Natural | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | OSL | Blue Diodes | 90 | 90 | 0 | 40 | 5 | 0 | NA | 125 | 10 | 0 | 0 | 0 | 0 | 0 | 0 | NA | NA | NA | NA | 0 | 0 | NA | 0 | 0 | 0 | 0 | 0 | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0 | 0 | 20101027 |
2.2 @DATA
The second slot, DATA, is of type list. Its elements contain the actual measurement data in the order they were recorded. In R, they are represented as numeric vectors.
## show first 20 data points
## of the first record
bin_data@DATA[[1]][1:20]
## [1] 1057 952 876 798 835 730 726 646 650 553 526 562 507 443 398
## [16] 388 367 369 336 304
Every single row in METADATA refers to one record in DATA. The link between the two is the column ID in METADATA.
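A minimal base-R sketch of this link, using a hypothetical toy data.frame and list in place of a real Risoe.BINfileData object. It assumes, as in this toy, that the ID of a row equals the position of its record in the list:

```r
# Toy stand-ins (hypothetical values) for the two slots:
# 'metadata' mimics @METADATA (one row per record), 'records' mimics @DATA
metadata <- data.frame(
  ID       = 1:4,
  POSITION = c(1, 1, 2, 2),
  LTYPE    = c("OSL", "TL", "OSL", "TL"),
  stringsAsFactors = FALSE
)
records <- list(c(120, 80, 50), c(5, 40, 90, 30), c(200, 110, 60), c(8, 35, 70, 25))

# Look up the ID of the TL record measured on position 2 ...
id <- metadata$ID[metadata$POSITION == 2 & metadata$LTYPE == "TL"]

# ... and use it to pick the matching count vector from the list
counts <- records[[id]]
print(counts)
```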
It is essential to understand that DATA only contains the count data of the measurement, i.e., y values only. It looks odd, but this is how BIN/BINX files store the data, for reasons of memory efficiency. The x values (time or temperature values) are calculated on the fly when needed, using the METADATA information.
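To illustrate the idea, the hypothetical helper calc_x_axis() below rebuilds an x-axis from the METADATA columns NPOINTS, LOW and HIGH. It assumes evenly spaced channels, with each x-value marking the end of its channel; the exact convention used inside Luminescence may differ, so treat this as a sketch only.

```r
# Hypothetical helper: reconstruct the x-axis of a record from METADATA fields.
# Assumption: NPOINTS channels are evenly spaced between LOW and HIGH,
# and each x-value marks the end of its channel.
calc_x_axis <- function(npoints, low, high) {
  channel_width <- (high - low) / npoints
  low + channel_width * seq_len(npoints)
}

# Values of the first record shown above: 2000 channels, LOW = 0, HIGH = 40 (s)
x <- calc_x_axis(npoints = 2000, low = 0, high = 40)
head(x, 3)  # 0.02 0.04 0.06
```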
3 File processing and curve selection
So, importing BIN/BINX files into R isn’t difficult after all, because the function read_BIN2R() handles it, regardless of whether you know how the data are stored in the BIN/BINX file.
Unfortunately, the import is usually only the first step. If you want to quickly select relevant curves, plot them or do other things, here is a list of some useful functions:
FUNCTION | PURPOSE |
---|---|
read_BIN2R() | Import BIN/BINX files into R |
write_R2BIN() | Write previously imported content back into a BIN/BINX file |
convert_BIN2CSV() | Convert BIN/BINX files to CSV files to be processed with other software |
merge_Risoe.BINfileData() | Merge BIN/BINX files or Risoe.BINfileData objects previously imported with read_BIN2R() |
subset() | Subset (extract) parts of the data from the BIN/BINX file |
plot_Risoe.BINfileData() | Plot the records in the file |
Risoe.BINfileData2RLum.Analysis() | Convert the Risoe.BINfileData object to RLum.Data.Curve and RLum.Analysis objects |
3.1 Import and export
3.1.1 Import: read_BIN2R()
Importing (reading) and exporting (writing) data are obvious tasks, but what else does the function have to offer?
args(read_BIN2R)
## function (file, show.raw.values = FALSE, position = NULL, n.records = NULL,
## zero_data.rm = TRUE, duplicated.rm = FALSE, fastForward = FALSE,
## show.record.number = FALSE, txtProgressBar = TRUE, forced.VersionNumber = NULL,
## ignore.RECTYPE = FALSE, pattern = NULL, verbose = TRUE, ...)
## NULL
First, there are a few technical parameters, such as txtProgressBar, verbose and show.record.number. Unless you are planning to write a tutorial, you usually do not need them, because they only change what is shown in the R terminal during the import. Their settings do not alter the imported data.
show.raw.values, forced.VersionNumber and ignore.RECTYPE offer some debugging functionality and error handling without the need to access the underlying code. I am not sure whether anybody has ever used these features.
zero_data.rm and duplicated.rm are very useful if something went wrong during the measurement, because they clear the import of broken (empty) records and of duplicated records (the latter appears to happen a lot during single-grain measurements).
pattern, position and n.records are more interesting. Let’s start with pattern. Like many functions in 'Luminescence', read_BIN2R() is designed to iterate automatically over large datasets. If you provide only a path (for example, to a folder with many BIN/BINX files) as the first argument, file, then pattern takes a character string or a regular expression (?regex) to select only files whose names match the pattern. For example,
read_BIN2R(file = "/myBIN_file_folder/", pattern = "Aberystwyth")
would only import BIN/BINX files whose file names contain the word “Aberystwyth”.
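Before a long import, you can check which file names a regular expression matches with base R's list.files(), which also has a pattern argument. The sketch below uses empty dummy files with hypothetical names in a temporary folder; it only previews the matching and makes no claim about how read_BIN2R() filters internally.

```r
# Temporary folder with empty dummy files (all file names are hypothetical)
folder <- file.path(tempdir(), "myBIN_file_folder")
dir.create(folder, showWarnings = FALSE)
invisible(file.create(file.path(
  folder, c("Aberystwyth_01.binx", "Aberystwyth_02.bin", "Bordeaux_01.binx")
)))

# Regular expression: names starting with "Aberystwyth" and ending in .bin or .binx
matched <- list.files(folder, pattern = "^Aberystwyth.*\\.binx?$")
print(matched)
```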
The arguments position and n.records allow you to limit the import to a particular position range or number of records.
## import only records from position 1
read_BIN2R(file, position = 1)
## import only the first 100 records (regardless of the position number)
read_BIN2R(file, n.records = 100)
Side note: unlike limiting the number of records, selecting only one position will not speed up the import, because until all records are read, the function does not know whether a position comes up again or not.
3.1.2 Export: write_R2BIN()
The function write_R2BIN() works very similarly, but with fewer arguments. The most important one is version. It allows you, for instance, to import a file of version 3 and export it again as version 8 to be compatible with other software.
## import BIN-file version 3
V3 <- read_BIN2R(file, verbose = FALSE)
## export to version 8, here a temporary file
write_R2BIN(V3, tempfile(), version = "8", txtProgressBar = FALSE)
3.1.3 Export as CSV: convert_BIN2CSV()
Sometimes R simply isn’t the tool you want to or can use. Or a colleague asks you, “Can you please mail me the records as CSV?” The reasons are manifold. Luckily, R isn’t a closed environment, and the easiest way to exchange curve data is via CSV files, because basically every software can work with them.
The only tricky part with BIN/BINX files is that the x-axis data are missing; however, the function convert_BIN2CSV() does the calculation for you.
output_path <- tempdir()
convert_BIN2CSV(file, path = output_path, verbose = FALSE)
head(list.files(output_path))
## [1] "[[1]]_1_OSL.csv" "[[1]]_10_OSL.csv" "[[1]]_11_TL.csv" "[[1]]_12_OSL.csv"
## [5] "[[1]]_13_OSL.csv" "[[1]]_14_TL.csv"
3.2 Merging files
The idea of merging BIN/BINX files is probably self-explanatory. You may have split your measurements into different files on purpose, or you want to combine measurements that stopped in the middle, so that you had to re-run the sequence and ended up with multiple files. The function merge_Risoe.BINfileData() takes either file names (or paths to files) or objects already imported via read_BIN2R(). We can try this with the BIN/BINX file we imported a few lines above (the object called bin_data).
merge_Risoe.BINfileData(c(bin_data, bin_data))
##
## [Risoe.BINfileData object]
##
## BIN/BINX version 3
## Object date: 271020, 281020, 291020
## User: Default
## System ID: 150
## Overall records: 1584
## Records type: IRSL (n = 72)
## OSL (n = 1008)
## TL (n = 504)
## Position range: 1 : 72
## Grain range: 0 : 0
## Run range: 1 : 8
## Set range: 3 : 6
The output is again a Risoe.BINfileData object, with one crucial difference: the position numbers now run from 1 to 72(!). Obviously, if a Risø device with a carousel of that many aliquot positions existed, it would not be commercially available. The reason for recalculating the positions is that data analysis is usually carried out based on position numbers. If we appended the new data without taking care of the position numbers, the same position numbers would appear twice (or multiple times).
Such behaviour might be wanted, for instance, if the reason for merging BIN/BINX files was a broken measurement. More likely, however, you have measured one sample over, let’s say, two carousels (\(2\times48\) positions), simply because you wanted to increase the number of aliquots. In that case, you want to treat each position as unique because it represents an individual aliquot, and the merge function takes care of that.
You can control this behaviour by setting the parameter keep.position.number to either FALSE (the default, positions are renumbered) or TRUE (positions are kept as they are). Additionally, you may have used only every 2nd or 3rd position of the sample carousel in your reader. This would not matter for any subsequent analysis, but you may want to preserve that information. For this purpose, you can use the argument position.number.append.gap.
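The renumbering idea can be sketched in base R. The hypothetical helper renumber_positions() shifts each set of positions by the highest position number used so far, plus an optional gap. This mirrors the concept described above, not necessarily the code of merge_Risoe.BINfileData() or the exact semantics of position.number.append.gap.

```r
# Hypothetical helper (concept only, not the package code):
# each set of positions is shifted by the highest number assigned so far,
# plus an optional gap between the merged sets
renumber_positions <- function(position_sets, gap = 0) {
  offset <- 0
  result <- vector("list", length(position_sets))
  for (i in seq_along(position_sets)) {
    result[[i]] <- position_sets[[i]] + offset
    offset <- max(result[[i]]) + gap
  }
  unlist(result)
}

# Two sets of positions 1 to 36, as in the merge example above
merged <- renumber_positions(list(1:36, 1:36))
range(merged)  # 1 72
```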
3.3 Subsetting of records
Subsetting, i.e., selecting particular records from a BIN/BINX file, is probably the most complicated part. I have seen a couple of times that users first did this in the Analyst before importing the file into R. Well, there is no need for that. We already described one way of selecting data in Fuchs et al. (2015):
ID <- bin_data@METADATA[bin_data@METADATA$RUN == 1,"ID"]
This call would give you the record identifiers of all records with the attribute run = 1. From there, we could continue to extract only what we need (see Fuchs et al. 2015 for more details).
When the paper was written (not when it was published), this was the way to select records. It was a bit cumbersome and not really clean; later, the way to go was to use RLum objects instead (see briefly below).
Because I personally stopped working with BIN/BINX files regularly, and whenever I did, I used RLum objects, it took me a while to realise that people were still trying to select records that way. I could see the appeal: the big table is what makes selecting records according to their metadata so easy in the Analyst. So why not make it just as easy in R? The function here is called subset(). It exists in base R, independently of the 'Luminescence' package, to subset data.frames. For the 'Luminescence' package, I added a new method to this function that works with Risoe.BINfileData objects. In other words, subset() works as you would expect if you are familiar with base R and handling data.frames; only here, it takes care of the peculiarities of Risoe.BINfileData objects.
subset(bin_data, RUN == 1)
##
## [Risoe.BINfileData object]
##
## BIN/BINX version 3
## Object date: 271020
## User: Default
## System ID: 150
## Overall records: 108
## Records type: OSL (n = 72)
## TL (n = 36)
## Position range: 1 : 36
## Grain range: 0 : 0
## Run range: 1 : 1
## Set range: 3 : 6
This is essentially the same selection we did a few lines above. The only difference is that the output is again a Risoe.BINfileData object. The selection can also be more sophisticated. For example, we could select only records from positions 10 to 30 with a run number > 2.
subset(bin_data, POSITION >= 10 & POSITION <= 30 & RUN > 2)
##
## [Risoe.BINfileData object]
##
## BIN/BINX version 3
## Object date: 281020, 291020
## User: Default
## System ID: 150
## Overall records: 336
## Records type: IRSL (n = 21)
## OSL (n = 210)
## TL (n = 105)
## Position range: 10 : 30
## Grain range: 0 : 0
## Run range: 3 : 8
## Set range: 3 : 6
This is a quick way of selecting the curves needed for the analysis. Supported fields are all(!) column names of the METADATA slot (colnames(bin_data@METADATA)).
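Because the condition passed to subset() is an ordinary logical expression over METADATA columns, you can preview which records it would keep. The sketch uses a toy data.frame with hypothetical values; on real data, you would evaluate the same expression on bin_data@METADATA.

```r
# Toy METADATA (hypothetical values) with the columns used in the example above
metadata <- data.frame(
  ID       = 1:6,
  POSITION = c(5, 12, 18, 25, 31, 12),
  RUN      = c(3, 1, 4, 8, 5, 6)
)

# The same logical expression as in subset(), evaluated inside the data.frame
keep <- with(metadata, POSITION >= 10 & POSITION <= 30 & RUN > 2)

# IDs of the records such a condition would select
metadata$ID[keep]  # 3 4 6
```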
3.4 Changing the metadata of a record
Sometimes it is necessary to correct the data before we can process them further. A typical example would be LTYPE, the type of luminescence. For instance, our imported dataset has three different curve types.
unique(bin_data@METADATA$LTYPE)
## [1] "OSL" "TL" "IRSL"
Let’s assume we want to treat the IRSL curves as OSL curves; therefore, we have to rename them first. We can replace all relevant entries using base R functionality.
new_bin_data <- bin_data
new_bin_data@METADATA[new_bin_data@METADATA$LTYPE == "IRSL", "LTYPE"] <- "OSL"
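To double-check such a relabelling, it helps to count the curve types before and after the change with table(). Shown here on a toy METADATA data.frame with hypothetical values:

```r
# Toy METADATA (hypothetical) with three luminescence types
metadata <- data.frame(
  ID    = 1:5,
  LTYPE = c("OSL", "TL", "IRSL", "OSL", "IRSL"),
  stringsAsFactors = FALSE
)

# Count the curve types before the change
before <- table(metadata$LTYPE)

# Relabel IRSL as OSL
metadata[metadata$LTYPE == "IRSL", "LTYPE"] <- "OSL"

# Count again to confirm that no IRSL records remain
after <- table(metadata$LTYPE)
print(after)
```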
3.5 Changing records
Changing the count values of a record works likewise. For example, to replace all values in record number 5 with random noise, we could write:
## set plot panel showing 1 row and 2 columns
par(mfrow = c(1,2))
## plot the record as it appears before the replacement
plot(new_bin_data@DATA[[5]], main = "before")
## replace the values in the record with uniform random numbers between 0 and 1
new_bin_data@DATA[[5]] <- runif(length(new_bin_data@DATA[[5]]))
## plot record after the replacement
plot(new_bin_data@DATA[[5]], main = "after")
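Note that runif() draws values uniformly between 0 and 1, which does not resemble count data. If the replacement should look more like a detector background, Poisson-distributed counts may be a more natural choice. The background level of 20 counts per channel below is an arbitrary assumption; the 2000 channels match NPOINTS of record 5 in the example file. The result could then be assigned to new_bin_data@DATA[[5]] in the same way as above.

```r
# Hypothetical alternative to runif(): simulate background as Poisson counts
set.seed(1)                 # make the random numbers reproducible
n_channels <- 2000          # NPOINTS of record 5 in the example file
noise <- rpois(n_channels, lambda = 20)  # assumed background: 20 counts/channel

summary(noise)
```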
3.6 Plotting
It is always a good idea to look at your data before processing them. Of course, we may use standard R functions such as plot(), as shown above. Still, to have all the information about the curve type and the correct axis labels at hand, it is easier to use a function that does all of this automatically.
## set plot panel (three rows, eight columns)
par(mfrow = c(3,8))
## plot dataset
plot_Risoe.BINfileData(bin_data, position = 1)
This plotting function does not do much, but it comes with some handy arguments, such as sorter, which is set to POSITION by default but can be changed to see the curves in a different order. Equally interesting might be the option curve.transformation, which converts CW-OSL and CW-IRSL curves into pseudo-LM curves as suggested by Bos and Wallinga (2012).
par(mfrow = c(1,3))
plot_Risoe.BINfileData(
bin_data,
position = 1,
run = 2,
curve.transformation = "CW2pLMi"
)
The transformation happens on the fly, and of course, the function only transforms curves where such a transformation makes sense. In our case, the TL curve remains untouched.
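For the curious, here is a sketch of the basic (non-interpolated) pseudo-LM transformation in base R: the time axis is mapped to \(u = \sqrt{2tP}\) and the counts are scaled by \(u/P\), where \(P\) is the stimulation period. Choosing \(P = 2\max(t)\) is an assumption here, and the curve is a toy exponential decay, not data from the tutorial file. CW2pLMi() in 'Luminescence' adds an interpolation step on top of this, so its output will differ in detail.

```r
# Basic pseudo-LM transformation (no interpolation).
# t: time of each CW channel (s); counts: CW signal;
# P: stimulation period, assumed to default to twice the last time value
cw_to_plm <- function(t, counts, P = 2 * max(t)) {
  u <- sqrt(2 * t * P)                    # transformed (pseudo-LM) time axis
  data.frame(u = u, counts = (u / P) * counts)
}

# Toy CW-OSL curve: single exponential decay, 2000 channels of 0.02 s
t  <- (1:2000) * 0.02
cw <- 1000 * exp(-t / 2)
plm <- cw_to_plm(t, cw)

# Unlike the CW curve, the pseudo-LM curve rises to a peak and decays again;
# for this toy decay, the peak lies at t = 1 s
peak_time <- t[which.max(plm$counts)]
```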
4 What about RLum-objects?
Last but not least, there is an important aspect I have not mentioned yet: the analysis and calculation functions do not work with Risoe.BINfileData objects, but with so-called RLum objects. Detailing the background and purpose of RLum objects would be a tutorial of its own. Here, let’s just say that it is easier to work with a unified RLum object structure, because BIN/BINX files are not the only files that can be processed with 'Luminescence', and every file format is different. Probably the last thing you want to do as a user is think about file format differences.
Suppose we have now selected (subset) all curves of interest and want to continue working with the data. Then the final import step is to convert the Risoe.BINfileData object into RLum objects.
data_rlum <- Risoe.BINfileData2RLum.Analysis(bin_data)
The function allows you to set a couple of options, for instance, the position number (argument pos). This somewhat duplicates the functionality of subset(). However, subset() came later, so this function’s arguments are mainly leftovers to maintain backward compatibility and are not as powerful.
If you don’t want to work with Risoe.BINfileData objects at all, you can import your file with the argument fastForward = TRUE, which does the rest for you.
data_rlum <- read_BIN2R(file, fastForward = TRUE, verbose = FALSE)
data_rlum[[1]]
##
## [RLum.Analysis-class]
## originator: Risoe.BINfileData2RLum.Analysis()
## protocol: unknown
## additional info elements: 0
## number of records: 22
## .. : RLum.Data.Curve : 22
## .. .. : #1 OSL | #2 TL | #3 OSL | #4 OSL | #5 TL | #6 OSL | #7 OSL
## .. .. : #8 TL | #9 OSL | #10 OSL | #11 TL | #12 OSL | #13 OSL | #14 TL
## .. .. : #15 OSL | #16 OSL | #17 TL | #18 OSL | #19 OSL | #20 TL | #21 OSL
## .. .. : #22 IRSL
Either way, the result is the same: you end up in a completely different world, the world of RLum objects, ready to be used with many functions in the R package 'Luminescence'. Why do we need RLum objects, and how can we process them efficiently? Well, that is material for another tutorial.
5 Some final remarks
I have heard a few times (indeed, only a few times) that it would be nice if read_BIN2R() and write_R2BIN() worked faster. Indeed, and I have two thoughts on it: (1) if you buy a faster computer, the import will also be faster; (2) rewriting the two functions in C/C++ would bring a tremendous speed boost. The only problem is that such code might not work equally well on all platforms without much effort. As it is, you can run R and 'Luminescence' on Windows, Linux or macOS and still import your BIN/BINX files in the same way. I believe that this advantage outweighs the slower import speed.
Last, if you feel something is missing in this tutorial, please write me an email.