
Paleoproteomics is an emerging field at the intersection of molecular biology, archaeology, and paleopathology that focuses on the recovery and analysis of ancient proteins from archaeological and paleontological remains. Unlike ancient DNA, which is often highly fragmented and prone to contamination, proteins can persist in a wider range of preservation conditions and offer valuable insights into taxonomy, physiology, diet, and disease.

In this context, the study of complex archaeological matrices such as dental calculus, foodcrusts, and sediments provides a unique opportunity to investigate past microbial communities and detect ancient pathogens. These substrates, though challenging due to their heterogeneous composition and environmental contamination, can preserve biomolecular traces that are key to understanding health, disease, and human-environment interactions in the past.

In the following workflow, we will present a methodological framework for the paleoproteomic analysis of such complex samples, with a particular emphasis on the detection and identification of pathogenic organisms.

Data Acquisition and Export
Data Processing with LC-MS

Raw Data Preparation

Collect raw data for each sample. File extensions vary by vendor and may include .raw (Thermo Scientific), .d (Agilent), or .wiff (AB SCIEX), among others; data may also be supplied in open formats such as mzXML and mzData.

Prepare raw extraction and injection blanks for contamination control.
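As a minimal sketch of how the raw files and blanks could be tracked, the Python snippet below builds a simple manifest from file names. The file names, the `build_manifest` helper, and the convention that blanks contain the word "blank" in their name are assumptions made for illustration, not part of the protocol.

```python
from pathlib import Path

# Extension-to-format map taken from the list above; extend it for other vendors.
RAW_FORMATS = {
    ".raw": "Thermo Scientific",
    ".d": "Agilent",
    ".wiff": "AB SCIEX",
    ".mzxml": "mzXML (open format)",
    ".mzdata": "mzData (open format)",
}


def build_manifest(paths, blank_tag="blank"):
    """Return one record per file: name, format, and whether it looks like a blank.

    Assumption: blanks are recognised by `blank_tag` appearing in the file name.
    """
    manifest = []
    for path in map(Path, paths):
        manifest.append({
            "file": path.name,
            "format": RAW_FORMATS.get(path.suffix.lower(), "unknown"),
            "is_blank": blank_tag in path.stem.lower(),
        })
    return manifest


# Hypothetical file names, for illustration only.
example_manifest = build_manifest(
    ["calculus_01.raw", "extraction_blank_01.raw", "foodcrust_02.d"]
)
for record in example_manifest:
    print(record)
```

Keeping blanks flagged in the same manifest makes it easier to pass them to the contamination filter later in the workflow.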

Database Preparation

The selection and setup of databases should align with the scientific objectives of the study. When the goal is to obtain information on both the host and potential pathogens, we use the SwissProt database, which offers manually curated protein sequences and high-quality annotations, together with the expanded Human Oral Microbiome Database (eHOMD) and the common Repository of Adventitious Proteins (cRAP) database to identify potential contaminants. Additionally, UniProt, TrEMBL, or in-house databases containing the proteome of interest can be used as needed.
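One possible way to combine several databases into a single search file is sketched below in Python. The `read_fasta` and `merge_databases` helpers, the `CON__` prefix used to label contaminant entries, and the toy sequences are all assumptions for illustration; check which contaminant convention your search engine expects.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def merge_databases(sources, contaminant_names=("cRAP",), prefix="CON__"):
    """Concatenate databases, labelling each entry with its source.

    sources: dict mapping a database label to its FASTA text.
    Entries from databases listed in `contaminant_names` get `prefix`
    (a hypothetical tag) so they can be recognised after the search.
    """
    lines = []
    for name, text in sources.items():
        tag = prefix if name in contaminant_names else ""
        for header, sequence in read_fasta(text):
            lines.append(f">{tag}{name}|{header}")
            lines.append(sequence)
    return "\n".join(lines) + "\n"


# Toy entries with made-up identifiers and sequences.
merged = merge_databases({
    "SwissProt": ">host_protein_1\nMKTAYIAKQR\nQISFVK\n",
    "cRAP": ">contaminant_1\nMSRQFSSR\n",
})
print(merged)
```

Tagging entries by source at merge time means contaminant hits can be removed later by a simple prefix check on the protein identifier.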

Choice of proteomics analysis tools

Depending on the type of analysis you intend to carry out, you can opt for free software tools such as MaxQuant, Novor Cloud, Sage, pFind, or MSFragger. When analysing ancient proteins, the choice of software should consider its ability to reliably detect degraded or modified peptides, its flexibility in handling incomplete or custom databases, and its efficiency in processing large and complex mass spectrometry datasets.

Search parameters

The protein extraction and sample preparation method determines which chemical modifications should be considered during analysis. For example, if the SP3 protocol is used, carbamidomethylation of cysteine should be set as a fixed modification, as it is introduced during alkylation and is expected on all cysteine-containing peptides.

The age of the sample influences the types of post-translational and chemical modifications (PTMs) expected. Generally, the older the sample, the greater the number and variety of potential modifications.

For example, in a sample approximately 5,000 years old, one might expect to observe:

Proline hydroxylation

Glutamine/asparagine deamidation

Methionine oxidation

Pyroglutamate formation (from glutamine and glutamic acid)

In contrast, for a sample several million years old, additional expected modifications may include:

N-terminal pyroglutamic acid formation (from glutamic and aspartic acids)

Phosphorylation of serine, threonine, and tyrosine

Conversion of arginine to ornithine
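The modifications listed above could be organised as a small lookup table before being entered into the search software. The Python sketch below is one way to do this; the profile names, the `variable_mods` helper, and the dictionary layout are assumptions for illustration. The mass shifts are standard monoisotopic values; pyroglutamate is represented only by its glutamine and glutamic acid forms.

```python
# Monoisotopic mass shifts in Da for the modifications discussed above.
MOD_MASS = {
    "Carbamidomethyl (C)": 57.021464,   # fixed, from alkylation
    "Hydroxylation (P)": 15.994915,
    "Deamidation (NQ)": 0.984016,
    "Oxidation (M)": 15.994915,
    "Pyro-glu from Q": -17.026549,
    "Pyro-glu from E": -18.010565,
    "Phospho (STY)": 79.966331,
    "Arg->Orn (R)": -42.021798,
}

# Hypothetical profile names; the two age classes follow the examples in the text.
RECENT_MODS = [
    "Hydroxylation (P)",
    "Deamidation (NQ)",
    "Oxidation (M)",
    "Pyro-glu from Q",
    "Pyro-glu from E",
]
DEEP_TIME_EXTRA = ["Phospho (STY)", "Arg->Orn (R)"]

PROFILES = {
    "ca_5000_years": RECENT_MODS,
    "several_million_years": RECENT_MODS + DEEP_TIME_EXTRA,
}


def variable_mods(profile):
    """Return {modification name: mass shift} for the chosen age profile."""
    return {name: MOD_MASS[name] for name in PROFILES[profile]}


for name, shift in variable_mods("several_million_years").items():
    print(f"{name:22s} {shift:+.6f} Da")
```

Note that each additional variable modification enlarges the search space, so the list is best kept to modifications that are plausible for the sample at hand.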

The choice of digestion enzyme is also influenced by the sample’s age and extraction protocol. For instance, trypsin is commonly used for its specificity, but semi-specific digestion may be employed to identify additional peptides. When the sample is highly degraded, or when the research objectives require it, non-specific digestion might be necessary.
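To illustrate the difference between specific, semi-specific, and non-specific digestion, the Python sketch below classifies a peptide by how many of its termini correspond to tryptic cleavage sites. It assumes the commonly used rule that trypsin cleaves after K or R unless the next residue is P; the helper names and the toy protein sequence are hypothetical.

```python
def is_tryptic_site(protein, i):
    """True if a tryptic cleavage can occur just before 0-based position i.

    Assumed rule: cleave after K or R, but not before P.
    Protein termini count as valid sites.
    """
    if i == 0 or i == len(protein):
        return True
    return protein[i - 1] in "KR" and protein[i] != "P"


def classify_peptide(protein, peptide):
    """Label a peptide by the number of tryptic termini it has in `protein`."""
    start = protein.find(peptide)
    if start < 0:
        raise ValueError("peptide not found in protein")
    end = start + len(peptide)
    tryptic_termini = is_tryptic_site(protein, start) + is_tryptic_site(protein, end)
    return {2: "specific", 1: "semi-specific", 0: "non-specific"}[tryptic_termini]


toy_protein = "MKTAYIAKQRQISFVK"  # made-up sequence
for pep in ("TAYIAK", "AYIAK", "AYIA"):
    print(pep, classify_peptide(toy_protein, pep))
```

In degraded samples, many authentic peptides fall into the semi-specific or non-specific classes because the protein backbone has broken at non-enzymatic positions.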

Settings can be configured with a precursor mass tolerance of 15 ppm and a fragment mass tolerance of 0.02 Da to ensure high mass accuracy during peptide identification, while applying a stringent false discovery rate (FDR) threshold of 1.0% at the peptide-spectrum match (PSM) level to maintain reliable identification confidence. However, this ultimately depends on the specific research objectives and the questions being addressed.
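As a worked example of what these tolerances mean in absolute terms, the Python sketch below converts a ppm tolerance into an m/z window and checks whether an observed value falls inside it. The function names and the example m/z values are illustrative assumptions.

```python
def ppm_window(mz, tol_ppm=15.0):
    """Return the (low, high) m/z bounds for a relative tolerance in ppm."""
    delta = mz * tol_ppm / 1e6
    return mz - delta, mz + delta


def within_ppm(observed, theoretical, tol_ppm=15.0):
    """True if the observed value lies within tol_ppm of the theoretical one."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm


def within_da(observed, theoretical, tol_da=0.02):
    """True if a fragment lies within an absolute tolerance in Da."""
    return abs(observed - theoretical) <= tol_da


low, high = ppm_window(1000.0)          # 15 ppm of m/z 1000 is 0.015
print(f"{low:.3f} to {high:.3f}")
print(within_ppm(1000.010, 1000.0))     # 10 ppm away
print(within_ppm(1000.020, 1000.0))     # 20 ppm away
print(within_da(500.015, 500.0))        # 0.015 Da away
```

Because a ppm tolerance is relative, the absolute window widens with m/z, whereas the fragment tolerance in Da stays constant.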

Filtering of Search Results

To minimise false positives, it is recommended to include in the analysis only peptides with a score of ≤ 0.01 (the score type and threshold are software-dependent) that are identified by at least 2 PSMs. Additionally, retain only proteins supported by at least 2 unique peptides.

Exclude from the analysed samples all proteins detected in the extraction blanks and all proteins matching the cRAP database. In some cases, proteins from dairy products may be retained as an exception.
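The filtering rules above can be sketched in Python as follows. The record layout (dictionaries with `peptide`, `protein`, and `score` keys), the `filter_results` helper, and the example identifiers are assumptions for illustration; real search engine output will need to be mapped to this shape first. A peptide is treated as unique when it maps to a single protein in the filtered data.

```python
from collections import Counter, defaultdict


def filter_results(psms, blank_proteins=(), contaminant_proteins=(),
                   max_score=0.01, min_psms=2, min_unique_peptides=2):
    """Apply score, PSM-count, unique-peptide, and contamination filters.

    Returns {protein: sorted list of unique peptides} for retained proteins.
    """
    confident = [p for p in psms if p["score"] <= max_score]
    psm_counts = Counter(p["peptide"] for p in confident)

    # Map each sufficiently supported peptide to the proteins it matches.
    peptide_proteins = defaultdict(set)
    for p in confident:
        if psm_counts[p["peptide"]] >= min_psms:
            peptide_proteins[p["peptide"]].add(p["protein"])

    # Keep only peptides that map to exactly one protein.
    unique_by_protein = defaultdict(set)
    for peptide, proteins in peptide_proteins.items():
        if len(proteins) == 1:
            unique_by_protein[next(iter(proteins))].add(peptide)

    excluded = set(blank_proteins) | set(contaminant_proteins)
    return {
        protein: sorted(peptides)
        for protein, peptides in unique_by_protein.items()
        if len(peptides) >= min_unique_peptides and protein not in excluded
    }


# Made-up PSMs: protA passes, protB has one peptide, CON_K is a contaminant.
toy_psms = (
    [{"peptide": "GEPGPAGPK", "protein": "protA", "score": 0.001}] * 2
    + [{"peptide": "TGPPGPAGR", "protein": "protA", "score": 0.005}] * 2
    + [{"peptide": "LVDEPQNLIK", "protein": "protB", "score": 0.002}] * 3
    + [{"peptide": "SLDLDSIIAEVK", "protein": "CON_K", "score": 0.001}] * 2
    + [{"peptide": "FSSSSGYGGGSSR", "protein": "CON_K", "score": 0.001}] * 2
    + [{"peptide": "AAAAAK", "protein": "protA", "score": 0.2}] * 2
)
kept = filter_results(toy_psms, contaminant_proteins={"CON_K"})
print(kept)
```

To retain dairy proteins as an exception, those identifiers would simply be left out of the `contaminant_proteins` set passed to the function.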

Peptide Modification Analysis

The proportion of identified PTMs relative to total amino acids is calculated. Different PTMs can be assessed according to the age of the sample, with a specific focus on glutamine/asparagine deamidation and peptide length distributions. For very old samples, these features may serve as indicators of peptide authenticity.
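A minimal Python sketch of these calculations is given below. It assumes each peptide is available as a sequence plus a set of 0-based positions carrying a modification; the function names and toy peptides are hypothetical. In addition to the proportion relative to total amino acids, it reports deamidation relative to the N and Q residues only, which is a variation on the calculation described above.

```python
from collections import Counter


def ptm_fraction(peptides):
    """Modified residues divided by total amino acids across all peptides."""
    total = sum(len(seq) for seq, _ in peptides)
    modified = sum(len(positions) for _, positions in peptides)
    return modified / total if total else float("nan")


def deamidation_rate(peptides):
    """Deamidated N/Q residues divided by all N/Q residues.

    `positions` is assumed to hold deamidated positions (0-based).
    """
    total = deamidated = 0
    for seq, positions in peptides:
        sites = [i for i, aa in enumerate(seq) if aa in "NQ"]
        total += len(sites)
        deamidated += sum(1 for i in sites if i in positions)
    return deamidated / total if total else float("nan")


def length_distribution(peptides):
    """Count peptides per sequence length."""
    return Counter(len(seq) for seq, _ in peptides)


# Toy peptides: (sequence, deamidated positions).
toy_peptides = [
    ("GQNGEPGR", {1}),
    ("ANQLK", {1, 2}),
    ("PEPTIDEK", set()),
]
print(ptm_fraction(toy_peptides))
print(deamidation_rate(toy_peptides))
print(length_distribution(toy_peptides))
```

Comparing these values between samples and blanks can help separate older, more damaged peptides from modern contamination.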

BLAST & Taxonomic Assignment (for no-PTM bacterial peptides)

To assess peptide specificity, it is recommended to perform a BLAST search against the NCBI non-redundant (nr) database or use Unipept, retaining only peptides that exhibit 100% identity and 100% coverage exclusively with the organism of interest.
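The retention rule can be sketched in Python as shown below, operating on hits that have already been parsed from the search output. The dictionary keys, the `taxon_specific_peptides` helper, the prefix match on organism names (so that strains of the target are accepted), and the example hits are all assumptions for illustration. Full coverage is approximated here as an alignment length equal to the peptide length at 100% identity.

```python
from collections import defaultdict


def taxon_specific_peptides(hits, target):
    """Return peptides whose perfect hits all belong to the target organism.

    A perfect hit has 100% identity and an alignment spanning the whole peptide.
    Peptides without any perfect hit are not returned.
    """
    perfect_organisms = defaultdict(set)
    for hit in hits:
        if hit["pident"] == 100.0 and hit["align_len"] == hit["query_len"]:
            perfect_organisms[hit["peptide"]].add(hit["organism"])
    return sorted(
        peptide
        for peptide, organisms in perfect_organisms.items()
        if all(org.startswith(target) for org in organisms)
    )


# Made-up hits for three hypothetical peptides.
toy_hits = [
    {"peptide": "pepA", "pident": 100.0, "align_len": 12, "query_len": 12,
     "organism": "Yersinia pestis"},
    {"peptide": "pepA", "pident": 91.7, "align_len": 12, "query_len": 12,
     "organism": "Escherichia coli"},
    {"peptide": "pepB", "pident": 100.0, "align_len": 10, "query_len": 10,
     "organism": "Yersinia pestis"},
    {"peptide": "pepB", "pident": 100.0, "align_len": 10, "query_len": 10,
     "organism": "Yersinia pseudotuberculosis"},
    {"peptide": "pepC", "pident": 100.0, "align_len": 8, "query_len": 11,
     "organism": "Yersinia pestis"},
]
specific = taxon_specific_peptides(toy_hits, "Yersinia pestis")
print(specific)
```

Here only `pepA` is retained: `pepB` matches a second species perfectly, and `pepC` is not covered over its full length.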

b- and y-ion coverage analysis

Analysing b- and y-ion coverage in MS/MS spectra is crucial for ancient peptides because it helps confirm the peptide sequence with high confidence, reducing the risk of misidentification. High coverage indicates that fragmentation occurred consistently along the peptide backbone, supporting the reliability of the identification. An ancient peptide is considered reliable when it shows strong b/y-ion coverage.
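One simple way to quantify this coverage is the fraction of backbone bonds supported by at least one matched b or y ion, sketched below in Python. The `ion_coverage` helper, the ion label format (such as "b2" or "y5"), and the example peptide are assumptions for illustration; the protocol does not prescribe a specific metric or threshold.

```python
def ion_coverage(peptide, matched_ions):
    """Fraction of backbone bonds covered by matched b or y ions.

    For a peptide of length n there are n - 1 bonds. Bond i (1-based, after
    residue i) is covered by ion b_i or by ion y_(n - i).
    """
    n = len(peptide)
    if n < 2:
        raise ValueError("peptide must have at least two residues")
    covered = set()
    for ion in matched_ions:
        series, index = ion[0], int(ion[1:])
        if not 1 <= index <= n - 1:
            continue  # ignore labels outside the valid fragment range
        if series == "b":
            covered.add(index)
        elif series == "y":
            covered.add(n - index)
    return len(covered) / (n - 1)


# Hypothetical matched ions for an 8-residue peptide (7 bonds).
coverage = ion_coverage("PEPTIDEK", {"b2", "b3", "y1", "y2", "y5"})
print(f"{coverage:.2f}")
```

In this example b3 and y5 support the same bond, so five matched ions cover four of the seven bonds.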