
2.1 Overview

The general data workflow outlines how biomolecular and archaeological data are managed from their generation, through their various uses, to structured archival. Figure 1 below visualises the core progression. It shows the data infrastructure, in this case based on SharePoint worksheets and the ARHUT data management system, and emphasises the feedback loops in data management:

Figure 1. Data workflow diagram.

2.2 Sampling and Initial Documentation

Sampling is based on the research design and must follow consistent documentation practices. Each sample is given a unique identifier that follows the local laboratory's sample-labelling principles, with metadata covering the object/artefact type, its collection number, excavation context, coordinates, sample collection date, sampler, and planned analysis method. Documentation begins in field notebooks or lab books and is later transcribed into cloud-based worksheets (e.g., SharePoint, Google Docs).
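The metadata fields listed above can be sketched as a simple record type. This is only an illustration: the field names, the identifier format `LAB-0001` and the example values are assumptions, not the laboratory's actual labelling scheme.

```python
from dataclasses import dataclass, asdict, fields
from datetime import date
from typing import Optional, Tuple


@dataclass
class SampleRecord:
    """One row of sampling metadata; field names are illustrative only."""
    sample_id: str                               # unique ID following local labelling rules
    object_type: str                             # object/artefact type
    collection_number: str
    excavation_context: str
    coordinates: Optional[Tuple[float, float]]   # (latitude, longitude), if known
    collection_date: date
    sampler: str
    planned_analysis: str

    def missing_fields(self) -> list:
        """Return the names of fields left empty (None or blank string)."""
        missing = []
        for f in fields(self):
            value = getattr(self, f.name)
            if value is None or (isinstance(value, str) and not value.strip()):
                missing.append(f.name)
        return missing


record = SampleRecord(
    sample_id="LAB-0001",              # hypothetical identifier format
    object_type="tooth",
    collection_number="AI 1234",       # hypothetical collection number
    excavation_context="Grave 7",
    coordinates=None,                  # not yet transcribed from the field notebook
    collection_date=date(2025, 6, 25),
    sampler="A. Sampler",
    planned_analysis="stable isotopes",
)

row = asdict(record)                   # plain dict, ready to transcribe into a worksheet row
print(record.missing_fields())         # lists the fields that still need to be filled in
```

Checking for empty fields at transcription time makes gaps visible while the field notebook is still at hand.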

References

Niven, K., Jakobsson, U. Databases and spreadsheets: A guide to good practice. https://zenodo.org/records/7740647

MINAS (DNA): http://www.mixs-minas.org/

Stable isotopes: https://doi.org/10.1016/j.quaint.2022.02.027

Roberts P, Fernandes R, Craig OE, Larsen T, Lucquin A, Swift J, Zech J. Calling all archaeologists: guidelines for terminology, methodology, data handling, and reporting when undertaking and reviewing stable isotope applications in archaeology. Rapid Commun Mass Spectrom. 2018 Mar 15;32(5):361-372. doi: 10.1002/rcm.8044. PMID: 29235694; PMCID: PMC5838555.

Reiter, Samantha S., Staniuk, Robert, Kolář, Jan, Bulatović, Jelena, Rose, Helene Agerskov, Ryabogina, Natalia E., Speciale, Claudia, Schjerven, Nicoline, Paulsson, Bettina Schulz, Lee, Victor Yan Kin, Canteri, Elisabetta, Revill, Alice, Dahlberg, Fredrik, Sabatini, Serena, Frei, Karin M., Racimo, Fernando, Ivanova-Bieg, Maria, Traylor, Wolfgang, Kate, Emily J., Derenne, Eve, Frank, Lea, Woodbridge, Jessie, Fyfe, Ralph, Shennan, Stephen, Kristiansen, Kristian, Thomas, Mark G. and Timpson, Adrian. "The BIAD Standards: Recommendations for Archaeological Data Publication and Insights From the Big Interdisciplinary Archaeological Database" Open Archaeology, vol. 10, no. 1, 2024, pp. 20240015. https://doi.org/10.1515/opar-2024-0015

2.3 Data Acquisition and Initial Recording

Instrument outputs, such as mass-spectrometry (IRMS, GC-MS, LC-MS/MS) files, sequencing files, or microscopy images, are collected in vendor-specific or standard raw formats (e.g., RAW, FASTQ). They are preferably stored on the instrument computer and copied into shared project folders, so that back-up copies of the initial measurement files are secured. These raw data are then referenced and linked in combined worksheets (e.g. SharePoint or Google Sheets) that record initial metadata, sampling context, and lab-specific identifiers. The worksheets are used for early-stage review and validation.
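One way to secure the back-up copy and obtain a stable reference for the worksheet is to copy each raw file into the project folder and record its checksum. The sketch below uses only the Python standard library; the function names and the returned keys are assumptions chosen for illustration, not part of any existing lab tool.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up_raw_file(source: Path, project_dir: Path) -> dict:
    """Copy a raw instrument file into the project folder and verify the copy.

    Returns a small dict that can be pasted into the worksheet row
    referencing this measurement.
    """
    project_dir.mkdir(parents=True, exist_ok=True)
    target = project_dir / source.name
    if target.exists():
        # Never silently overwrite an existing back-up of a raw file.
        raise FileExistsError(f"refusing to overwrite {target}")
    shutil.copy2(source, target)       # copy2 also preserves file timestamps
    checksum = sha256_of(source)
    if sha256_of(target) != checksum:
        raise IOError(f"checksum mismatch after copying {source.name}")
    return {
        "file_name": source.name,
        "sha256": checksum,
        "size_bytes": target.stat().st_size,
    }
```

Storing the checksum next to the file name in the worksheet allows anyone to confirm later that the copy in the project folder is still identical to the original measurement file.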

2.4 Data Structuring and Collaborative Editing

Raw entries are transformed into structured research datasets by moving them to tabular data sheets (e.g. Excel), cleaning data, standardising terminology, and checking for consistency. This step includes:

Keeping data structures consistent throughout the work process by harmonising column names and formats.

Selecting the data fields that will be filled in or edited for the given dataset and stage.

Validating entries against the requirements in the original documentation, e.g. field formats, required labels, etc.

Entering relational identifiers (e.g. site code, ledger number), internal project/lab codes, and versioning identifiers in the corresponding fields of the data sheets.
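The harmonising and validation steps above can be sketched as follows. The alias table, the required fields and the label pattern are hypothetical; in practice they would come from the lab's own documentation.

```python
import re

# Hypothetical mapping from column spellings seen in worksheets to canonical names.
COLUMN_ALIASES = {
    "sample id": "sample_id",
    "sampleid": "sample_id",
    "site": "site_code",
    "site code": "site_code",
    "d13c": "d13C",
}

REQUIRED = ("sample_id", "site_code")                # assumed required fields
SAMPLE_ID_PATTERN = re.compile(r"^[A-Z]+-\d{4}$")    # assumed label format, e.g. LAB-0012


def harmonise_row(row: dict) -> dict:
    """Map column names to canonical names and trim whitespace from text values."""
    clean = {}
    for name, value in row.items():
        key = " ".join(name.strip().lower().replace("_", " ").split())
        canonical = COLUMN_ALIASES.get(key, key.replace(" ", "_"))
        clean[canonical] = value.strip() if isinstance(value, str) else value
    return clean


def validate_row(row: dict) -> list:
    """Return a list of human-readable problems; an empty list means the row passes."""
    problems = [f"missing {name}" for name in REQUIRED if not row.get(name)]
    sample_id = row.get("sample_id")
    if sample_id and not SAMPLE_ID_PATTERN.match(sample_id):
        problems.append(f"bad sample_id format: {sample_id!r}")
    return problems


raw = {" Sample ID ": "lab-12", "Site": " TRT ", "d13C": "-20.1"}
row = harmonise_row(raw)
print(row)
print(validate_row(row))
```

Returning a list of problems rather than raising an error lets a whole sheet be checked in one pass and the results be shared as feedback.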

These structured datasets form the basis for computational analysis and are maintained within SharePoint for collaborative editing. For Excel files this means "live" co-editing; for other file types it may mean separate versions edited by different people. Access permissions are set to control changes and ensure data provenance.

2.5 Data Analysis

Once structured, datasets can be exported (typically as comma-separated values, CSV, files) and processed using computational tools tailored to the specific research questions, analyses and data types. This includes data interpretation and evaluation, statistical modelling, pattern recognition, and visualisation. Analyses are typically performed in environments such as R or Python, or in specialised software such as OxCal, IsoReader, mMass, or MaxQuant.

Analytical outputs must be reproducible and versioned, with all scripts and parameter settings documented and stored alongside the dataset, either in SharePoint or linked repositories (e.g., GitHub).
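A lightweight way to document parameter settings alongside a dataset is to write a small JSON "sidecar" file next to the exported CSV. The sketch below is one possible approach, not an established convention of this workflow; the file suffix `.run.json` and the record keys are assumptions.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone
from pathlib import Path


def write_run_record(dataset: Path, script: str, parameters: dict) -> Path:
    """Write a JSON sidecar describing one analysis run of `dataset`.

    The dataset checksum ties the record to the exact file version that
    was analysed; `script` is a free-text reference such as a file name
    or a repository commit.
    """
    record = {
        "dataset": dataset.name,
        "dataset_sha256": hashlib.sha256(dataset.read_bytes()).hexdigest(),
        "script": script,
        "parameters": parameters,
        "python_version": platform.python_version(),
        "run_at_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }
    # Hypothetical naming scheme: isotopes.csv -> isotopes.csv.run.json
    sidecar = dataset.with_name(dataset.name + ".run.json")
    sidecar.write_text(json.dumps(record, indent=2, sort_keys=True), encoding="utf-8")
    return sidecar
```

A call such as `write_run_record(Path("isotopes.csv"), "analysis.py", {"outlier_sd": 3})` (hypothetical names) would leave a record that can be uploaded to SharePoint or committed to a linked repository together with the script.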

2.6 Data Validation and Feedback

Structured datasets are subjected to both planned and unplanned quality checks. During a formal review, users verify data completeness, coherence, and consistency with the raw entries. Many problems, however, are only noticed while working with the data, and some require contextual knowledge, so a formal review cannot be expected to catch them all. Feedback is communicated via SharePoint comments or tracked changes, or via ARHUT comments and tasks if the dataset has already been entered into the ARHUT system. Datasets may cycle through multiple revisions before finalisation. This feedback mechanism is essential for maintaining data quality and for correcting inconsistencies before deposition.
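The planned part of such a review, checking completeness and consistency against the raw entries, can be partly automated. The following is a minimal sketch that assumes both tables are available as lists of dicts sharing a `sample_id` key; the key and field names are illustrative.

```python
def compare_with_raw(raw_rows, structured_rows, key="sample_id", fields=("site_code",)):
    """Compare a structured dataset with the raw worksheet entries.

    Reports samples missing from either side and field values that
    differ between the two tables.
    """
    raw_by_id = {row[key]: row for row in raw_rows}
    structured_by_id = {row[key]: row for row in structured_rows}
    report = {
        "missing_in_structured": sorted(set(raw_by_id) - set(structured_by_id)),
        "not_in_raw": sorted(set(structured_by_id) - set(raw_by_id)),
        "mismatches": [],
    }
    for sample_id in sorted(set(raw_by_id) & set(structured_by_id)):
        for field in fields:
            before = raw_by_id[sample_id].get(field)
            after = structured_by_id[sample_id].get(field)
            if before != after:
                report["mismatches"].append((sample_id, field, before, after))
    return report


# Hypothetical example data.
raw_rows = [
    {"sample_id": "LAB-0001", "site_code": "TRT"},
    {"sample_id": "LAB-0002", "site_code": "TRT"},
    {"sample_id": "LAB-0003", "site_code": "KVS"},
]
structured_rows = [
    {"sample_id": "LAB-0001", "site_code": "TRT"},
    {"sample_id": "LAB-0002", "site_code": "TTR"},   # transcription slip
]
report = compare_with_raw(raw_rows, structured_rows)
print(report)
```

A report like this only flags differences. Deciding which side is correct still requires the contextual knowledge mentioned above.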

2.7 Curation and Archival in ARHUT

Finalised datasets are transferred to the ARHUT data platform, where they are archived with:

Persistent identifiers. Each entity has its own ARHUT link, which can be cited in publications and is essential for linking datasets. Other identifiers, e.g. a data DOI, can be added as well.

Full metadata, including sampling context, lab identifiers, and data structure.

Relations to other data tables within the system, forming a gradually densifying knowledge graph.

These datasets become part of the long-term record and are linked to internal systems (e.g., SharePoint, Archemy, Department of Archaeology), external repositories (e.g., Zenodo, Dryad) and databases (e.g., BIAD).

2.8 Storage Platforms and File Formats

Add another diagram here about the structure of what is happening

Each phase of the workflow is supported by designated platforms:

SharePoint for live collaboration and version-controlled documentation

ARHUT for curated, long-term data with controlled access and open publishing options

Lab databases (e.g. BBAD) for supplementary metadata and internal tracking

2.9 Dissemination

Finalised and curated datasets archived in ARHUT are disseminated through the platform itself. ARHUT's web interface (https://arh.ut.ee/) allows structured querying of, and access to, project-specific datasets, enriched with contextual metadata and persistent identifiers. PaleoMIX O.A.D. builds on the ARHUT infrastructure, offering public-facing access to selected datasets from PaleoMIX and related projects. This system enables transparent sharing of research outputs, supports interdisciplinary collaboration, and fosters broader reuse by both academic and public audiences.
