## The few things to know about BIN/BINX files and their handling in R

###### by Sebastian Kreutzer (June 6, 2021)

Luminescence >= 0.9.0

Creative Commons

Over time, I have realised that many users seem to struggle to process BIN/BINX files efficiently with the package 'Luminescence'. The function to import BIN/BINX files existed even before the package was released and before 'Luminescence' became a community project. If there is any ambiguity, I feel somewhat obliged to lift the fog.

Having said that, with this tutorial I will try to shed a little light on BIN/BINX file handling with 'Luminescence', hopefully making it a more joyful experience.

# 1 What is meant by BIN/BINX files?

When I talk about BIN/BINX files, I mean files with the extension *.bin or *.binx. They are mainly produced by the commercially available Risø luminescence readers. These files contain the measurement data (typically everything the photomultiplier detects) produced by the company's TL/OSL readers, stored in a binary file format. The machines have been around for many years, and over that time the file format design changed slightly. This led to at least six different versions, versions 3 to 8, and 'Luminescence' supports all of them. A new version may already exist while I am writing this tutorial that I am not (yet) aware of. If so, please let me know.

The important thing to know about these versions is that they are not really compatible with each other. The part of the file that contains the metadata (we will talk about it later) differs in length and partly in byte order. I first realised this when I, proud of my first R functions, could no longer import new files after we had updated the system software of our reader. The good news is that the Risø guys have always been very supportive and shared the format documentation. This allowed me to provide timely and good format support in 'Luminescence'. You can find a few more details about the format by typing ?Risoe.BINfileData-class in the R terminal.

# 2 The structure of BIN/BINX files

The easiest way to import BIN/BINX files is to call the function read_BIN2R(), which automatically determines the format version. The file name extension does not matter: both *.bin and *.binx are supported (rule of thumb: everything >= V4 has the extension *.binx). So, the most straightforward code snippet reads:

library(Luminescence)
file <- "20101027_BT707_MAIN_CGQ.BIN"
bin_data <- read_BIN2R(file, txtProgressBar = FALSE)
##
##   >> 20101027_BT707_MAIN_CGQ.BIN
##   >> 792 records have been read successfully!

Here, file is a character string with the path to your BIN/BINX file. For this tutorial, I use a dataset I measured during my PhD: a coarse-grain quartz sample from the loess section Seilitz in Saxony, Germany (Meszner et al. 2013). The parameter txtProgressBar = FALSE suppresses the import progress bar in the terminal, which is not relevant here.

The function returns an R object of class Risoe.BINfileData. I cannot recall why I made the name so long; I guess it was bad habit. When you call the object, it prints a summary instead of flooding the terminal with data.

bin_data
##
## [Risoe.BINfileData object]
##
##  BIN/BINX version      3
##  Object date:          271020, 281020, 291020
##  User:                 Default
##  System ID:            150
##  Overall records:      792
##  Records type:         IRSL  (n = 36)
##                        OSL   (n = 504)
##                        TL    (n = 252)
##  Position range:       1 : 36
##  Grain range:          0 : 0
##  Run range:            1 : 8
##  Set range:            3 : 6

The summary tells us the following. The file (here version 3) was produced at the end of October 20XX; the date format dates back to a time when it was obviously hard to imagine the turn of the millennium. It was written by a user sensibly called Default, on a system with serial number 150. The summary also shows the overall number of records (luminescence curves of different types) and the range of positions used. Run range and set range both refer to the design of the measurement sequence.

The object itself follows the so-called S4 class definition. The only thing you need to know about this is how to access the elements (slots) of the object: with the magic at symbol @ (see ?"@"). Alternatively, you can try str(bin_data). Personally, I have never found that function really helpful, in particular not for large objects.
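The @ operator can be tried out without a BIN/BINX file. Below is a minimal sketch using a hypothetical toy S4 class. ToyBINfileData is made up for illustration and only mimics the two slots discussed below; the real Risoe.BINfileData class may have more. slotNames() and @ come from base R (package methods) and work the same way on bin_data.

```r
library(methods)

## hypothetical toy class mimicking the two main slots of Risoe.BINfileData
setClass("ToyBINfileData",
         slots = c(METADATA = "data.frame", DATA = "list"))

toy <- new("ToyBINfileData",
           METADATA = data.frame(ID = 1:2, LTYPE = c("OSL", "TL"),
                                 stringsAsFactors = FALSE),
           DATA = list(c(1057, 952, 876), c(10, 25, 40)))

## list the slot names (works on a real object too, e.g. slotNames(bin_data))
slotNames(toy)

## access a slot with @, then index it like any data.frame or list
toy@METADATA$LTYPE
toy@DATA[[1]]
```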

## 2.1 @METADATA

Once imported, the object (here bin_data) contains elements called slots. One of them is METADATA, a data.frame with all metadata of the measurements. It resembles the big spreadsheet you may have seen in the central window of the software Analyst (Duller 2015).

head(bin_data@METADATA)
ID SEL VERSION LENGTH PREVIOUS NPOINTS RECTYPE RUN SET POSITION GRAIN GRAINNUMBER CURVENO XCOORD YCOORD SAMPLE COMMENT SYSTEMID FNAME USER TIME DATE DTYPE BL_TIME BL_UNIT NORM1 NORM2 NORM3 BG SHIFT TAG LTYPE LIGHTSOURCE LPOWER LIGHTPOWER LOW HIGH RATE TEMPERATURE MEASTEMP AN_TEMP AN_TIME TOLDELAY TOLON TOLOFF IRR_TIME IRR_TYPE IRR_UNIT IRR_DOSERATE IRR_DOSERATEERR TIMESINCEIRR TIMETICK ONTIME OFFTIME STIMPERIOD GATE_ENABLED ENABLE_FLAGS GATE_START GATE_STOP PTENABLED DTENABLED DEADTIME MAXLPOWER XRF_ACQTIME XRF_HV XRF_CURR XRF_DEADTIMEF DETECTOR_ID LOWERFILTER_ID UPPERFILTER_ID ENOISEFACTOR MARKPOS_X1 MARKPOS_Y1 MARKPOS_X2 MARKPOS_Y2 MARKPOS_X3 MARKPOS_Y3 EXTR_START EXTR_END SEQUENCE
1 TRUE 3 8272 0 2000 0 1 3 1 0 0 NA 0 0 BT 707 CGQ Natural 150 20101027_BT707_MAIN_CGQ Default 13:39:24 271020 Natural 0 0 0 0 0 0 0 1 OSL Blue Diodes 90 90 0 40 5 0 NA 125 10 0 0 0 0 0 0 NA NA NA NA 0 0 NA 0 0 0 0 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 0 0 20101027
2 TRUE 3 8272 8272 2000 0 1 3 2 0 0 NA 0 0 BT 707 CGQ Natural 150 20101027_BT707_MAIN_CGQ Default 13:41:09 271020 Natural 0 0 0 0 0 0 0 1 OSL Blue Diodes 90 90 0 40 5 0 NA 125 10 0 0 0 0 0 0 NA NA NA NA 0 0 NA 0 0 0 0 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 0 0 20101027
3 TRUE 3 8272 8272 2000 0 1 3 3 0 0 NA 0 0 BT 707 CGQ Natural 150 20101027_BT707_MAIN_CGQ Default 13:42:56 271020 Natural 0 0 0 0 0 0 0 1 OSL Blue Diodes 90 90 0 40 5 0 NA 125 10 0 0 0 0 0 0 NA NA NA NA 0 0 NA 0 0 0 0 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 0 0 20101027
4 TRUE 3 8272 8272 2000 0 1 3 4 0 0 NA 0 0 BT 707 CGQ Natural 150 20101027_BT707_MAIN_CGQ Default 13:44:43 271020 Natural 0 0 0 0 0 0 0 1 OSL Blue Diodes 90 90 0 40 5 0 NA 125 10 0 0 0 0 0 0 NA NA NA NA 0 0 NA 0 0 0 0 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 0 0 20101027
5 TRUE 3 8272 8272 2000 0 1 3 5 0 0 NA 0 0 BT 707 CGQ Natural 150 20101027_BT707_MAIN_CGQ Default 13:46:29 271020 Natural 0 0 0 0 0 0 0 1 OSL Blue Diodes 90 90 0 40 5 0 NA 125 10 0 0 0 0 0 0 NA NA NA NA 0 0 NA 0 0 0 0 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 0 0 20101027
6 TRUE 3 8272 8272 2000 0 1 3 6 0 0 NA 0 0 BT 707 CGQ Natural 150 20101027_BT707_MAIN_CGQ Default 13:48:15 271020 Natural 0 0 0 0 0 0 0 1 OSL Blue Diodes 90 90 0 40 5 0 NA 125 10 0 0 0 0 0 0 NA NA NA NA 0 0 NA 0 0 0 0 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 0 0 20101027

## 2.2 @DATA

The second slot, DATA, is a list. Its elements contain the actual measurement data in the order they were recorded, and each element is represented in R as a numeric vector.

## show first 20 data points
## of the first record
bin_data@DATA[[1]][1:20]
##  [1] 1057  952  876  798  835  730  726  646  650  553  526  562  507  443  398
## [16]  388  367  369  336  304

Every single row in METADATA refers to one record in DATA. The link between the two is the column ID in METADATA.
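Because ID is the link, getting the curve that belongs to a particular metadata row is a two-step lookup. Here is a minimal base-R sketch on a toy METADATA/DATA pair. The values are made up; with a real object you would use bin_data@METADATA and bin_data@DATA instead.

```r
## toy stand-ins for bin_data@METADATA and bin_data@DATA (values made up)
METADATA <- data.frame(ID = 1:4,
                       POSITION = c(1, 1, 2, 2),
                       LTYPE = c("OSL", "TL", "OSL", "TL"),
                       stringsAsFactors = FALSE)
DATA <- list(c(100, 80, 60), c(5, 20, 9), c(200, 150, 90), c(7, 30, 12))

## step 1: find the ID(s) of the wanted records via the metadata
id <- METADATA[METADATA$POSITION == 2 & METADATA$LTYPE == "OSL", "ID"]

## step 2: use the ID to pull the matching count vector(s) from DATA
curves <- DATA[id]
curves[[1]]
```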

It is essential to understand that DATA only contains the count data of the measurement, i.e. the y values only. This looks odd, but it is how BIN/BINX files store the data, for reasons of memory efficiency. The x values (time or temperature values) are calculated on the fly from the METADATA information when needed.
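Here is a minimal sketch of that on-the-fly calculation for a CW-OSL record. It uses the values of the first record in the metadata shown above: NPOINTS = 2000, LOW = 0, HIGH = 40. The sketch assumes evenly spaced channels, with each x value marking the end of a channel. This assumption is mine, not taken from the package code. For TL records, LOW and HIGH would be temperatures instead.

```r
## metadata of the first record shown above (CW-OSL, 40 s, 2000 channels)
NPOINTS <- 2000
LOW <- 0
HIGH <- 40

## assumption: evenly spaced channels, each x value marks the end of a channel
channel_width <- (HIGH - LOW) / NPOINTS
x <- seq(from = LOW + channel_width, to = HIGH, length.out = NPOINTS)

## made-up decay counts standing in for y <- bin_data@DATA[[1]]
y <- round(1000 * exp(-x))

## combine into an x/y curve
curve <- cbind(x = x, y = y)
head(curve, 3)
```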

# 3 File processing and curve selection

So, importing BIN/BINX files into R isn’t difficult after all. read_BIN2R() handles it, whether or not you know how the data are stored in the file.

Unfortunately, the import is usually only the first step. If you want to quickly select relevant curves, plot them or do other things, here is a list of useful functions:

| Function | Purpose |
|---|---|
| read_BIN2R() | Import BIN/BINX files into R |
| write_R2BIN() | Write previously imported content back into a BIN/BINX file |
| convert_BIN2CSV() | Convert BIN/BINX files to CSV files to be processed with other software |
| merge_Risoe.BINfileData() | Merge BIN/BINX files, or objects previously imported with read_BIN2R() |
| subset() | Subset (extract) parts of the data from the BIN/BINX file |
| plot_Risoe.BINfileData() | Plot the records in the file |
| Risoe.BINfileData2RLum.Analysis() | Convert a Risoe.BINfileData object to RLum.Data.Curve and RLum.Analysis objects |

## 3.1 Import and export

### 3.1.1 Import: read_BIN2R()

Importing (reading) and exporting (writing) data are obvious tasks, but what else does the function have to offer?

args(read_BIN2R)
## function (file, show.raw.values = FALSE, position = NULL, n.records = NULL,
##     zero_data.rm = TRUE, duplicated.rm = FALSE, fastForward = FALSE,
##     show.record.number = FALSE, txtProgressBar = TRUE, forced.VersionNumber = NULL,
##     ignore.RECTYPE = FALSE, pattern = NULL, verbose = TRUE, ...)
## NULL

First, there are a few technical parameters, such as txtProgressBar, verbose and show.record.number. Unless you are planning to write a tutorial, you usually do not need them, because they only change what is shown in the R terminal during the import. Their settings do not alter the imported data.

show.raw.values, forced.VersionNumber and ignore.RECTYPE offer some debugging and error-handling functionality without the need to touch the underlying code. I am not sure whether anybody has ever used these features.

zero_data.rm and duplicated.rm are very useful if something went wrong during the measurement. They remove broken (zero-data) or duplicated records from the import; this appears to happen a lot during single-grain measurements.
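To get an idea of what counts as a duplicate or an empty record, here is a sketch on toy data. Base R's duplicated() also works on lists, so it can flag identical count vectors. This only illustrates the idea; I am not claiming that read_BIN2R() uses exactly these criteria internally.

```r
## toy DATA list: record 3 is an exact copy of record 1, record 4 is empty
DATA <- list(c(100, 80, 60), c(5, 20, 9), c(100, 80, 60), numeric(0))

## flag records whose count vector already appeared earlier in the list
dup <- duplicated(DATA)

## flag records without any count values
empty <- lengths(DATA) == 0

## keep only records that are neither duplicated nor empty
## (on a real object, the METADATA rows would have to be dropped in sync)
DATA_clean <- DATA[!dup & !empty]
length(DATA_clean)
```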

position, n.records and pattern are more interesting. Let’s start with pattern. Like many functions in 'Luminescence', read_BIN2R() is designed to iterate automatically over large datasets. Suppose you provide only a path in file, the first argument, for example to a folder with many BIN/BINX files. pattern then takes a character string or a regular expression (see ?regex), and only files whose names match the pattern are selected. For example,

read_BIN2R(file = "/myBIN_file_folder/", pattern = "Aberystwyth")

would only import BIN/BINX files whose names contain the word “Aberystwyth”.
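Before pointing read_BIN2R() at a folder, it can help to check which file names a pattern actually matches. Below is a minimal sketch with base R's list.files() and made-up file names in a temporary folder. I assume that read_BIN2R() matches pattern against file names in a comparable way, so treat this as a preview rather than a guarantee.

```r
## create a temporary folder with a few empty, made-up files
folder <- file.path(tempdir(), "myBIN_file_folder")
dir.create(folder, showWarnings = FALSE)
invisible(file.create(file.path(folder, c("Aberystwyth_Q1.binx",
                                          "Aberystwyth_Q2.BIN",
                                          "Seilitz_CGQ.bin"))))

## plain word: any file name containing "Aberystwyth"
list.files(folder, pattern = "Aberystwyth")

## regular expression: only names ending in .bin or .binx (case-insensitive)
list.files(folder, pattern = "\\.binx?$", ignore.case = TRUE)
```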

The arguments position and n.records allow you to limit the import to a particular position range or a number of records.

## import only records from position 1
read_BIN2R(file, position = 1)

## import only the first 100 records (regardless of the position number)
read_BIN2R(file, n.records = 100)

Side note: limiting the number of records speeds up the import, but selecting a single position does not. Until all records have been read, the function cannot know whether a position comes up again.

### 3.1.2 Export: write_R2BIN()

The function write_R2BIN() works very similarly but has fewer arguments. The most important one is version. It allows you, for instance, to import a file of version 3 and export it again as version 8 to make it compatible with other software.

## import BIN-file version 3
V3 <- read_BIN2R(file, verbose = FALSE)

## export to version 8, here a temporary file
write_R2BIN(V3, tempfile(), version = "8", txtProgressBar = FALSE)

### 3.1.3 Export as CSV: convert_BIN2CSV()

Sometimes R simply isn’t the tool you want to use, or can use. Or a colleague asks you, “Can you please mail me the records as CSV?” The reasons are manifold. Luckily, R isn’t a closed environment. The easiest way to exchange curve data is as CSV files, because basically every software can work with them.

The only tricky part with BIN/BINX files is that the x-axis data are missing. However, convert_BIN2CSV() calculates them for you.

output_path <- tempdir()
convert_BIN2CSV(file, path = output_path, verbose = FALSE)
head(list.files(output_path))
## [1] "[[1]]_1_OSL.csv"  "[[1]]_10_OSL.csv" "[[1]]_11_TL.csv"  "[[1]]_12_OSL.csv"
## [5] "[[1]]_13_OSL.csv" "[[1]]_14_TL.csv"

## 3.2 Merging files

The idea of merging BIN/BINX files is probably self-explanatory. You may have split your measurements into different files on purpose. Or a measurement stopped in the middle, you had to re-run the sequence, and you ended up with multiple files. The function merge_Risoe.BINfileData() takes either file names (or paths to files) or objects already imported via read_BIN2R(). We can try this with the BIN/BINX file we imported a few lines above (the object called bin_data).

merge_Risoe.BINfileData(c(bin_data, bin_data))
##
## [Risoe.BINfileData object]
##
##  BIN/BINX version      3
##  Object date:          271020, 281020, 291020
##  User:                 Default
##  System ID:            150
##  Overall records:      1584
##  Records type:         IRSL  (n = 72)
##                        OSL   (n = 1008)
##                        TL    (n = 504)
##  Position range:       1 : 72
##  Grain range:          0 : 0
##  Run range:            1 : 8
##  Set range:            3 : 6

The output is again a Risoe.BINfileData object, with one crucial difference: the position number now runs from 1 to 72(!). Obviously, no commercially available Risø reader has a carousel with that many aliquot positions. The positions are recalculated because data analysis is usually carried out based on position numbers. If we appended the new data without taking care of the position numbers, the same position numbers would appear twice (or multiple times).

Keeping the duplicated position numbers might be wanted, for instance, if you merged the files because a measurement broke off. More likely, however, you measured one sample on, let’s say, two carousels ($2\times48$ positions), simply to increase the number of aliquots. In that case, you want to treat each position as unique, because each one represents an individual aliquot, and the merge function takes care of that.

You can control this behaviour by setting the parameter keep.position.number to either FALSE (the default) or TRUE. You may also have used only every second or third position on the sample carousel of your reader. This does not matter for any subsequent analysis, but you may want to preserve the information. For this purpose, you can use the argument position.number.append.gap.
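The renumbering idea can be sketched in base R with a hypothetical helper, append_positions(). The positions of the second file are shifted by the highest position number used so far, plus an optional gap. This is my own illustration of the behaviour described above; the package code may differ in detail, for example in exactly how position.number.append.gap is applied. To keep the original numbers with the real function, you would simply call merge_Risoe.BINfileData(c(bin_data, bin_data), keep.position.number = TRUE).

```r
## hypothetical helper: shift the positions of a second file so that they
## follow those of the first one, optionally leaving a gap in between
append_positions <- function(pos_first, pos_second, gap = 0) {
  offset <- max(pos_first) + gap
  c(pos_first, pos_second + offset)
}

## two carousels with 36 measured positions each, as in bin_data
merged <- append_positions(1:36, 1:36)
range(merged)

## the same with a gap of 4 positions between the two sets
merged_gap <- append_positions(1:36, 1:36, gap = 4)
range(merged_gap)
```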

## 3.3 Subsetting of records

Subsetting, i.e. selecting particular records from a BIN/BINX file, is probably the most complicated part. A couple of times, I have seen users do this in the Analyst first, before importing the file into R. There is no need for that. We have already described one way of selecting data in Fuchs et al. (2015):

ID <- bin_data@METADATA[bin_data@METADATA$RUN == 1, "ID"]

This call gives you the record identifiers of all records with the attribute RUN = 1. From there, we could go on and keep only what we need (see Fuchs et al. 2015 for more details). When the paper was written (not when it was published), this was the way to select records. It was a little cumbersome and not really clean, and later the way to go was to use RLum objects instead (briefly discussed below).

I personally stopped working with BIN/BINX files regularly and, if at all, used RLum objects. So it took me a while to realise that people were still selecting records the old way. I can see the appeal: the big metadata table is what makes selecting records in the Analyst so easy. So why not make it as easy as in the Analyst? The function for this is called subset(). It exists in base R, independently of the 'Luminescence' package, to subset data.frames. For 'Luminescence', I added a new method to this function that works with Risoe.BINfileData objects. In other words, subset() works as you would expect if you are familiar with base R and data.frames. It only additionally takes care of the peculiarities of Risoe.BINfileData objects.

subset(bin_data, RUN == 1)
##
## [Risoe.BINfileData object]
##
##  BIN/BINX version      3
##  Object date:          271020
##  User:                 Default
##  System ID:            150
##  Overall records:      108
##  Records type:         OSL   (n = 72)
##                        TL    (n = 36)
##  Position range:       1 : 36
##  Grain range:          0 : 0
##  Run range:            1 : 1
##  Set range:            3 : 6

This is essentially the same selection we made a few lines above. The only difference is that the output is again a Risoe.BINfileData object. The selection can also be more sophisticated. For example, we could select only records from positions 10 to 30 with a run number > 2.

subset(bin_data, POSITION >= 10 & POSITION <= 30 & RUN > 2)
##
## [Risoe.BINfileData object]
##
##  BIN/BINX version      3
##  Object date:          281020, 291020
##  User:                 Default
##  System ID:            150
##  Overall records:      336
##  Records type:         IRSL  (n = 21)
##                        OSL   (n = 210)
##                        TL    (n = 105)
##  Position range:       10 : 30
##  Grain range:          0 : 0
##  Run range:            3 : 8
##  Set range:            3 : 6

This is a quick way of selecting exactly the curves needed for the analysis. All(!) column names of the METADATA slot are supported as fields (see colnames(bin_data@METADATA)).

## 3.4 Changing the metadata of a record

Sometimes the data need to be corrected before we can process them further. A typical example is LTYPE, the type of luminescence. For instance, our imported dataset contains three different curve types.

unique(bin_data@METADATA$LTYPE)
## [1] "OSL"  "TL"   "IRSL"

Let’s assume we want to treat the IRSL curves as OSL curves, so we have to rename them first. We can replace all relevant entries with base R functionality:

new_bin_data <- bin_data
new_bin_data@METADATA[new_bin_data@METADATA$LTYPE == "IRSL", "LTYPE"] <- "OSL"
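R copies objects when they are modified. As a result, bin_data itself stays untouched and only new_bin_data carries the renamed curves. On the real object, table(new_bin_data@METADATA$LTYPE) should then list only OSL and TL. Below is a small base-R check of both points on a toy metadata table with made-up values.

```r
## toy stand-in for bin_data@METADATA (values made up)
meta <- data.frame(LTYPE = c("OSL", "TL", "IRSL", "OSL"),
                   stringsAsFactors = FALSE)

## modify a copy, just like new_bin_data <- bin_data above
new_meta <- meta
new_meta[new_meta$LTYPE == "IRSL", "LTYPE"] <- "OSL"

## the copy has no IRSL records left ...
table(new_meta$LTYPE)

## ... while the original is unchanged
table(meta$LTYPE)
```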

## 3.5 Changing records

Changing the count values of a record works the same way. For example, to replace all values in record number 5 with random noise, we could write:

## set plot panel showing 1 row and 2 columns
par(mfrow = c(1,2))

## plot the record as it appears before the replacement
plot(new_bin_data@DATA[[5]], main = "before")

## replace values in record
new_bin_data@DATA[[5]] <- runif(length(new_bin_data@DATA[[5]]))

## plot record after the replacement
plot(new_bin_data@DATA[[5]], main = "after")
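The same works for several records at once, for example with lapply() over the selected record IDs. Below is a sketch on a toy DATA list that replaces records with Poisson-distributed noise. Unlike runif(), rpois() produces integer, count-like values. The choice of rpois() and its mean of 10 are arbitrary assumptions for illustration.

```r
set.seed(1)

## toy stand-in for new_bin_data@DATA (values made up)
DATA <- list(c(100, 80, 60), c(5, 20, 9), c(200, 150, 90))

## records to modify, e.g. found via the METADATA ID column
ids <- c(1, 3)

## replace each selected record with noise of the same length
DATA[ids] <- lapply(DATA[ids], function(counts) rpois(length(counts), lambda = 10))

DATA
```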

## 3.6 Plotting

It is always a good idea to look at your data before processing them. Of course, we can use standard R functions such as plot(), as shown above. Still, a dedicated function is easier, because it automatically takes care of the curve type and the correct axis labels.

## set plot panel (three rows, eight columns)
par(mfrow = c(3,8))

## plot dataset
plot_Risoe.BINfileData(bin_data, position = 1)

This plotting function does not do much, but it comes with some handy arguments. One is sorter, which is set to "POSITION" by default; you can change it to see the curves in a different order (see ?plot_Risoe.BINfileData for the supported values). Equally interesting might be the option curve.transformation. It converts CW-OSL and CW-IRSL curves into pseudo-LM curves, as suggested by Bos and Wallinga (2012).

par(mfrow = c(1,3))
plot_Risoe.BINfileData(
bin_data,
position = 1,
run = 2,
curve.transformation = "CW2pLMi"
)

The transformation happens on the fly, and of course the function only transforms curves where such a transformation makes sense. In our case, the TL curve remains untouched.
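The basic pseudo-LM transformation behind this can be sketched in a few lines of base R. Let P be the total stimulation time. Each time t is mapped to u = sqrt(2 t P), and the intensity is multiplied by u/P. The sketch applies this plain transformation to a made-up single-exponential CW curve. The "CW2pLMi" option additionally interpolates the data (Bos and Wallinga 2012); this sketch does not.

```r
## made-up CW-OSL curve: single exponential decay, 40 s, 2000 channels
t <- seq(0.02, 40, length.out = 2000)
I_cw <- 1000 * exp(-1 * t)          # decay constant lambda = 1 per second

## plain pseudo-LM transformation (no interpolation)
P <- max(t)                         # total stimulation time
u <- sqrt(2 * t * P)                # new (pseudo-linearly-modulated) time axis
I_plm <- I_cw * u / P               # transformed intensity

## a first-order component becomes a peak at u = sqrt(P / lambda)
u[which.max(I_plm)]
```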

Last but not least, there is an important aspect I have not mentioned yet. None of the analysis and calculation functions work with Risoe.BINfileData objects; they work with so-called RLum objects. Explaining the background and purpose of RLum objects would need a tutorial of its own. Here, let’s just say that a unified RLum object structure is easier to work with. BIN/BINX files are not the only files that 'Luminescence' can process, and every file format is different. Probably the last thing you want to do as a user is worry about file format differences.

Suppose we have now selected (subset) all curves of interest and want to continue working with the data. The final import step is then to convert the Risoe.BINfileData object into so-called RLum objects.

data_rlum <- Risoe.BINfileData2RLum.Analysis(bin_data)

The function allows you to set a couple of options, for instance the position number (argument pos). This partly duplicates the functionality of subset(). However, subset() came later. These arguments are mainly leftovers kept for backward compatibility, and they are not as powerful.
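Both routes can be compared in a minimal sketch with the example data shipped with 'Luminescence'. ExampleData.BINfileData provides the Risoe.BINfileData object CWOSL.SAR.Data. The sketch assumes a recent package version. I am not certain whether a single position comes back as one RLum.Analysis object or as a list, so the sketch only prints the class.

```r
library(Luminescence)

## example data shipped with the package (contains CWOSL.SAR.Data)
data(ExampleData.BINfileData, envir = environment())

## route 1: select the position directly in the conversion function
rlum_pos <- Risoe.BINfileData2RLum.Analysis(CWOSL.SAR.Data, pos = 1)

## route 2: subset first, then convert
rlum_sub <- Risoe.BINfileData2RLum.Analysis(subset(CWOSL.SAR.Data, POSITION == 1))

class(rlum_pos)
class(rlum_sub)
```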

If you don’t want to work with the Risoe.BINfileData objects at all, you can import your file using the argument fastForward, which does the rest for you.

data_rlum <- read_BIN2R(file, fastForward = TRUE, verbose = FALSE)
data_rlum[[1]]
##
##  [RLum.Analysis-class]
##   originator: Risoe.BINfileData2RLum.Analysis()
##   protocol: unknown
##   number of records: 22
##   .. : RLum.Data.Curve : 22
##   .. .. : #1 OSL | #2 TL | #3 OSL | #4 OSL | #5 TL | #6 OSL | #7 OSL
##   .. .. : #8 TL | #9 OSL | #10 OSL | #11 TL | #12 OSL | #13 OSL | #14 TL
##   .. .. : #15 OSL | #16 OSL | #17 TL | #18 OSL | #19 OSL | #20 TL | #21 OSL
##   .. .. : #22 IRSL

Either way, the result is the same. You end up in a completely different world, the world of RLum objects, ready to be used with many functions of the R package 'Luminescence'. Why we need RLum objects and how to process them efficiently is stuff for another tutorial.

# 4 Some final remarks

I have heard a few times (indeed, only a few times) that it would be nice if read_BIN2R() and write_R2BIN() worked faster. I agree, and I have two thoughts on this:

1. If you buy a faster computer, the import will also be faster.
2. Rewriting the two functions in C/C++ would bring a tremendous speed boost. The only problem is that such code might not work equally well on all platforms without much extra effort.

As things are, you can run R and 'Luminescence' on Windows, Linux or macOS and import your BIN/BINX files in the same way. I believe this advantage outweighs the slower import speed.

Last, if you feel something is missing in this tutorial, please write me an email.

# References

Bos, Adrie J J, and Jakob Wallinga. 2012. “How to Visualize Quartz OSL Signal Components.” Radiation Measurements 47 (9): 752–58. https://doi.org/10.1016/j.radmeas.2012.01.013.
Duller, G A T. 2015. “The Analyst Software Package for Luminescence Data: Overview and Recent Improvements.” Edited by Regina DeWitt. Ancient TL 33 (1): 35–42.
Fuchs, Margret C, Sebastian Kreutzer, Christoph Burow, Michael Dietze, Manfred Fischer, Christoph Schmidt, and Markus Fuchs. 2015. “Data Processing in Luminescence Dating Analysis: An Exemplary Workflow Using the R Package Luminescence.” Quaternary International 362: 8–13. https://doi.org/10.1016/j.quaint.2014.06.034.
Meszner, Sascha, Sebastian Kreutzer, Markus Fuchs, and Dominik Faust. 2013. “Late Pleistocene Landscape Dynamics in Saxony, Germany: Paleoenvironmental Reconstruction Using Loess-Paleosol Sequences.” Quaternary International 296 (May): 95–107. https://doi.org/10.1016/j.quaint.2012.12.040.