A Foundation Model White Paper by neocore


1. Overview

The Neocore EEG Foundation Model (NCE-FM) is a large-scale (~1B parameter) transformer-based model designed as a universal representation learner for EEG signals. It is pre-trained on roughly 1,000 hours of multi-task EEG recordings (equivalent to ~165 billion tokens), spanning both task-driven and resting-state data captured from a variety of hardware setups and channel configurations. The model is hardware-, montage-, and channel-agnostic, and remains effective even with a minimal 2-channel setup. A VQCAE block tokenizes the raw EEG into discrete tokens, which a hierarchical transformer then processes.
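
As a concrete illustration of the tokenization step, the sketch below maps encoder latents to discrete token ids by nearest-neighbor lookup in a learned codebook, the core operation of vector quantization. The encoder, codebook size, and tensor shapes here are illustrative assumptions, not the actual NCE-FM internals.

```python
import torch

def tokenize(latents: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Map encoder latents to discrete token ids via nearest codebook entry.

    latents:  (batch, steps, latent_dim) features from a convolutional encoder
    codebook: (num_codes, latent_dim) learned embedding table
    returns:  (batch, steps) integer token ids
    """
    # L2 distance from every latent vector to every codebook entry.
    dists = torch.cdist(latents, codebook.unsqueeze(0))  # (batch, steps, num_codes)
    return dists.argmin(dim=-1)                          # nearest code -> token id

# Toy usage: a 2-channel EEG window already encoded into 64-d latents at 32 steps.
latents = torch.randn(1, 32, 64)
codebook = torch.randn(8192, 64)      # hypothetical codebook size
tokens = tokenize(latents, codebook)  # -> shape (1, 32)
```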


2. Intended Use and Scope

Consumer Application Challenges:
A primary obstacle to deploying BCI wearables in consumer products is insufficient predictive accuracy, coupled with the lengthy calibration required per user, per task domain, and even per session. This multi-level calibration can demand tens of minutes and numerous trial runs, significantly hindering seamless digital experiences. By leveraging the more abstract representation learned by NCE-FM, we aim to eliminate these calibration steps, enabling rapid adaptation on standard consumer-grade hardware without specialized setup.

NCE-FM is engineered to provide robust EEG embeddings that can be rapidly adapted via few-shot fine-tuning to a wide range of downstream tasks.


3. Model Architecture

Key Components

  1. VQCAE Tokenizer:
    • Converts continuous EEG recordings into a sequence of discrete tokens.
    • Efficiently compresses high-dimensional EEG signals into an information-rich representation.
  2. Hierarchical Transformer Backbone:
    • Processes token sequences using multi-head self-attention in a hierarchical manner.
    • Integrates both local (within-window) and global (across-window) context.
    • Employs montage- and sample-rate normalization techniques, ensuring consistent performance across diverse hardware setups and channel configurations.
  3. Adaptable Task Head:
    • A lightweight classifier head (e.g., linear or MLP) can be appended for task-specific fine-tuning.
    • Enables rapid adaptation using only a few labeled examples per task.
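
The skeleton below sketches how these three components could fit together in PyTorch: token embeddings standing in for VQCAE output, a hierarchical block that applies within-window attention followed by across-window attention, and a linear task head. All layer sizes, the window length, and the pooling scheme are illustrative assumptions (montage and sample-rate normalization are omitted); this is not the published ~1B-parameter configuration.

```python
import torch
import torch.nn as nn

class LocalGlobalBlock(nn.Module):
    """Hierarchical attention: local (within-window) then global (across-window)."""
    def __init__(self, dim: int, heads: int, window: int):
        super().__init__()
        self.window = window
        self.local = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.glob = nn.TransformerEncoderLayer(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape                        # t must be divisible by window
        w = x.reshape(b * t // self.window, self.window, d)
        w = self.local(w)                        # within-window self-attention
        x = w.reshape(b, t, d)
        summaries = x.reshape(b, t // self.window, self.window, d).mean(dim=2)
        summaries = self.glob(summaries)         # across-window self-attention
        # Broadcast the global context back onto every token in each window.
        return x + summaries.repeat_interleave(self.window, dim=1)

class NCEFMSketch(nn.Module):
    def __init__(self, num_codes=8192, dim=256, heads=8, window=16, depth=4, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(num_codes, dim)   # VQCAE token ids -> vectors
        self.blocks = nn.ModuleList(
            LocalGlobalBlock(dim, heads, window) for _ in range(depth))
        self.head = nn.Linear(dim, n_classes)       # adaptable task head

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        x = self.embed(tokens)
        for blk in self.blocks:
            x = blk(x)
        return self.head(x.mean(dim=1))             # pooled embedding -> logits

logits = NCEFMSketch()(torch.randint(0, 8192, (2, 64)))  # -> shape (2, n_classes)
```

Swapping the linear head for an MLP, as noted above, changes only the last layer; the frozen backbone and token pipeline stay identical.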

4. Training Data and Process


6. Transfer Learning Approach

Few-Shot Fine-Tuning Process (a code sketch follows this list):

  1. Freeze all NCE-FM parameters and append a linear classification head sized to the desired number of classes.
  2. For each class, randomly select exactly 2 samples, each consisting of a 0.25 s EEG window aggregated across lateral temporal and fronto-temporal channels for each hemisphere.
  3. Train only the linear head, while keeping all foundation weights frozen.
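
A minimal sketch of this procedure, assuming a pretrained backbone that returns fixed-size embeddings; the backbone stand-in, optimizer choice, and step count are illustrative, not prescribed by NCE-FM.

```python
import torch
import torch.nn as nn

def few_shot_probe(backbone: nn.Module, shots: torch.Tensor, labels: torch.Tensor,
                   emb_dim: int, n_classes: int, steps: int = 200) -> nn.Linear:
    # Step 1: freeze every foundation-model parameter.
    for p in backbone.parameters():
        p.requires_grad_(False)
    backbone.eval()

    head = nn.Linear(emb_dim, n_classes)    # appended linear classification head
    opt = torch.optim.Adam(head.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()

    with torch.no_grad():                   # embeddings are fixed; compute once
        embs = backbone(shots)

    # Step 3: optimize the head only; the foundation weights never change.
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(head(embs), labels)
        loss.backward()
        opt.step()
    return head

# Toy usage: 2 classes x 2 shots (step 2), with an identity stand-in backbone.
backbone = nn.Identity()                    # placeholder for the pretrained model
shots = torch.randn(4, 32)                  # 4 few-shot embeddings, dim 32
labels = torch.tensor([0, 0, 1, 1])
head = few_shot_probe(backbone, shots, labels, emb_dim=32, n_classes=2)
```

Because the backbone is frozen, the few-shot embeddings can be computed once and cached, which is what keeps adaptation fast on consumer-grade hardware.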

Advantages:

EEG Task Performance Comparison

Performance metrics across various EEG classification tasks, comparing state-of-the-art results from the literature with NCE-FM results.


| EEG Task | Best Model & Approach (Source) | Dataset & Validation | Performance | Foundation Model Performance (2 channels, subject-dependent) |
| --- | --- | --- | --- | --- |
| 2-class Motor Imagery (binary MI) | Anchored-STFT + adversarially augmented SkipNet CNN (Ali et al., 2022) | BCI Comp. II Dataset III (left/right hand MI); subject-specific training/testing | 90.7% accuracy (mean across subjects) | 87.1% accuracy |
| Multi-class Motor Imagery (4-class MI) | CSP+PSD feature ensemble with transfer learning (KMM + TrAdaBoost) (Wang et al., 2023) | BCI Comp. IV Dataset 2a (4-class MI); subject-specific 10x CV with instance transfer | 91.5% accuracy (average 4-class classification) | 90.1% accuracy |
| Focus vs. Distraction (attention vs. inattention) | LSTM recurrent network (Kaushik et al., 2022) | Real-life debate EEG from 24 subjects; within-subject binary classification (focused vs. distracted) | 95.86% accuracy (delta-band LSTM model) | 98.2% accuracy |
| Attention Level Classification (multi-level) | EEG feature-based SVM with feature selection (Zhang et al., 2023) | 4-class attention states (high/medium/low/none) induced in lab (10 subjects); subject-dependent classification | 94.1% accuracy (4 attention levels) | 97.0% accuracy |
| Binary Emotion Classification (DEAP) | TPRO-NET (Transformer + CNN hybrid) (X. Zhang et al., 2024) | DEAP emotion EEG (32 subjects); subject-dependent 5-fold CV for high vs. low valence/arousal | 97.63% (valence) / 97.47% (arousal) accuracy | 96.7% (valence) / 95.0% (arousal) accuracy |
| Emotion Intensity Indexing (classification) | k-NN (k=1) regression + quadrant labeling (Alarcão et al., 2021) | DEAP EEG (32 subjects); subject-independent continuous valence/arousal prediction converted to classes (high/low, 4-quadrant) | 89.8% (binary high/low) and 84.4% (4-class quadrant) accuracy | 89.0% (binary high/low) and 71.0% (4-class quadrant) accuracy |
| Sentence Reconstruction (semantic decoding) | AGACNet, an adaptive graph attention CNN (Li et al., 2024) | Custom single-subject EEG dataset (26 sessions) of silent reading of 7 distinct sentences; multi-class identification | 62.26% accuracy for 7-way sentence classification (chance ~14%) | 29.4% accuracy |

7. Limitations