A Foundation Model White Paper by Neocore
1. Overview
The Neocore EEG Foundation Model (NCE-FM) is a large-scale, transformer-based model of roughly 1B parameters, designed as a universal representation learner for EEG signals. It was pre-trained on roughly 1,000 hours of multi-task EEG recordings (equivalent to ~165 billion tokens), spanning both task-driven and resting-state data captured from various hardware setups and channel configurations. The model is hardware-, montage-, and channel-agnostic, and remains effective even with a minimal 2-channel setup. A VQCAE block tokenizes the raw EEG into discrete tokens, which a hierarchical transformer then processes.
2. Intended Use and Scope
Consumer Application Challenges:
One of the primary obstacles to deploying BCI wearables in consumer products is insufficient predictive accuracy, compounded by the extensive and often exhausting calibration required per user, per task domain, and even per session. This multi-level calibration can demand tens of minutes and numerous trial runs, significantly hindering seamless digital experiences. By leveraging the more abstract representations learned by NCE-FM, we aim to eliminate these onerous calibration steps and enable rapid adaptation on standard consumer-grade hardware without specialized setup.
NCE-FM is engineered to provide robust EEG embeddings that can be rapidly adapted via few-shot fine-tuning for a wide range of downstream tasks:
- Motor Imagery Classification: Commonly used in BCIs.
- Attention-Related Classification: Detecting focused versus distracted states.
- Emotion Recognition: Discriminating affective states such as positive versus negative valence.
- Semantic EEG Tasks: Handling more complex cognitive processes (e.g., semantic decoding during language tasks).
3. Model Architecture
Key Components (a minimal end-to-end sketch follows this list):
- VQCAE Tokenizer:
  - Converts continuous EEG recordings into a sequence of discrete tokens.
  - Efficiently compresses high-dimensional EEG signals into an information-rich representation.
- Hierarchical Transformer Backbone:
  - Processes token sequences with multi-head self-attention in a hierarchical manner.
  - Integrates both local (within-window) and global (across-window) context.
  - Employs montage and sample-rate normalization so that performance remains consistent across diverse hardware setups and channel configurations.
- Adaptable Task Head:
  - A lightweight classifier head (e.g., linear or MLP) can be appended for task-specific fine-tuning.
  - Enables rapid adaptation using only a few labeled examples per task.
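To make the pipeline concrete, the sketch below strings the three components together in PyTorch. It is a minimal illustration under stated assumptions, not Neocore's implementation: the codebook size, latent width, patch length, window size, model width, and layer counts are all placeholders, and the quantizer omits training-time details such as the commitment loss.

```python
import torch
import torch.nn as nn

# All sizes below are illustrative assumptions, not NCE-FM's actual values.
CODEBOOK_SIZE = 1024   # number of discrete EEG tokens (assumed)
LATENT_DIM = 64        # latent width per patch (assumed)
PATCH = 32             # raw samples per patch, i.e. tokenizer stride (assumed)
TOKENS_PER_WINDOW = 8  # local attention window, in tokens (assumed)
D_MODEL = 128
NUM_CLASSES = 2

class VQTokenizer(nn.Module):
    """VQCAE-style tokenizer: encode patches, snap each to its nearest codebook entry."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Conv1d(1, LATENT_DIM, kernel_size=PATCH, stride=PATCH)
        self.codebook = nn.Embedding(CODEBOOK_SIZE, LATENT_DIM)

    def forward(self, eeg):                      # eeg: (batch, 1, samples)
        z = self.encoder(eeg).transpose(1, 2)    # (batch, patches, LATENT_DIM)
        # vector quantization: index of the nearest codebook vector per patch
        dist = ((z.unsqueeze(-2) - self.codebook.weight) ** 2).sum(-1)
        return dist.argmin(-1)                   # discrete token ids

class HierarchicalBackbone(nn.Module):
    """Local (within-window) attention, then global (across-window) attention."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(CODEBOOK_SIZE, D_MODEL)
        layer = lambda: nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.local = nn.TransformerEncoder(layer(), num_layers=2)
        self.glob = nn.TransformerEncoder(layer(), num_layers=2)

    def forward(self, tokens):                   # tokens: (batch, seq)
        b, s = tokens.shape                      # seq must divide evenly into windows
        x = self.embed(tokens)
        # local stage: attention restricted to each window of tokens
        x = x.view(b * s // TOKENS_PER_WINDOW, TOKENS_PER_WINDOW, -1)
        x = self.local(x)
        # global stage: one summary vector per window attends across windows
        x = x.mean(dim=1).view(b, -1, D_MODEL)
        return self.glob(x).mean(dim=1)          # (batch, D_MODEL) embedding

tokenizer, backbone = VQTokenizer(), HierarchicalBackbone()
head = nn.Linear(D_MODEL, NUM_CLASSES)           # adaptable task head

eeg = torch.randn(4, 1, 512)                     # e.g., 2 s of 1-channel EEG at 256 Hz
logits = head(backbone(tokenizer(eeg)))
print(logits.shape)                              # torch.Size([4, 2])
```

Mean-pooling each window as the bridge between the local and global stages is just one plausible choice; the white paper does not specify how NCE-FM aggregates local context.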
4. Training Data and Process
- Pre-training Data:
Approximately 1,000 hours of EEG recordings were used for pre-training. Data were collected from multiple laboratories, across various hardware setups (ranging from 2 to 64 channels), and covered diverse experimental paradigms (motor imagery, attention tasks, emotion recognition, and semantic tasks).
- Training Objectives:
Training combined a multi-task, self-supervised stage with supervised fine-tuning for select tasks; a generic illustration of one such objective appears below. Sensitive implementation details have been redacted to protect Neocore’s proprietary methods.
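Because the actual objectives are redacted, the block below illustrates only a common generic self-supervised objective for discrete token sequences, masked token prediction. Nothing in it should be read as Neocore's method; the vocabulary, model width, layer count, and mask rate are all assumptions.

```python
import torch
import torch.nn as nn

# Generic illustration only -- NOT Neocore's redacted objective.
VOCAB = 1024            # assumed token vocabulary (e.g., VQCAE codebook size)
D_MODEL = 256
MASK_ID = VOCAB         # reserve one extra id for the [MASK] token

embed = nn.Embedding(VOCAB + 1, D_MODEL)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(D_MODEL, nhead=8, batch_first=True),
    num_layers=4,
)
to_logits = nn.Linear(D_MODEL, VOCAB)

tokens = torch.randint(0, VOCAB, (8, 128))           # dummy EEG token batch
mask = torch.rand_like(tokens, dtype=torch.float) < 0.15   # mask ~15% of positions
inputs = tokens.masked_fill(mask, MASK_ID)

logits = to_logits(encoder(embed(inputs)))           # predict the original tokens
loss = nn.functional.cross_entropy(
    logits[mask], tokens[mask]                       # loss only on masked slots
)
loss.backward()
```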
5. Transfer Learning Approach
Few-Shot Fine-Tuning Process (a minimal sketch follows these steps):
- Freeze all NCE-FM parameters and append a linear classification head sized to the desired number of classes.
- For each class, randomly select exactly 2 samples, each consisting of a 0.25 s EEG window aggregated across lateral temporal and fronto-temporal channels for each hemisphere.
- Train only the linear head, while keeping all foundation weights frozen.
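The sketch below walks through these three steps, assuming a PyTorch-style interface. The stub backbone, embedding width, and sampling rate are placeholders, since NCE-FM's actual loader and input layout are not described here.

```python
import torch
import torch.nn as nn

# All names and sizes here are illustrative placeholders, not Neocore's API.
EMBED_DIM = 128          # assumed NCE-FM embedding width
NUM_CLASSES = 2
SAMPLE_RATE = 256        # assumed sampling rate; 0.25 s -> 64 samples
WINDOW = int(0.25 * SAMPLE_RATE)

class BackboneStub(nn.Module):
    """Stand-in for the pretrained NCE-FM encoder (2-channel input)."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(2 * WINDOW, EMBED_DIM)

    def forward(self, x):                    # x: (batch, 2, WINDOW)
        return self.proj(x.flatten(1))       # (batch, EMBED_DIM)

backbone = BackboneStub()                    # would be the pretrained model
for p in backbone.parameters():
    p.requires_grad = False                  # step 1: freeze all foundation weights
backbone.eval()

head = nn.Linear(EMBED_DIM, NUM_CLASSES)     # step 1: append a linear head
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# step 2: exactly 2 labeled 0.25 s windows per class (dummy tensors here)
support_x = torch.randn(2 * NUM_CLASSES, 2, WINDOW)
support_y = torch.arange(NUM_CLASSES).repeat_interleave(2)

# step 3: train only the head; the backbone never receives gradients
for _ in range(100):
    with torch.no_grad():
        z = backbone(support_x)              # frozen embeddings
    loss = loss_fn(head(z), support_y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Since only the linear head's handful of parameters is updated, a loop like this completes in seconds even on modest hardware, which is what makes per-session calibration negligible.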
Advantages:
- Reduced Calibration Time: Rapid adaptation using minimal subject-specific data, remaining agnostic to task domain.
- Robustness: Leverages broad EEG representations learned from diverse data.
- Flexibility: Remains effective across a wide range of EEG channel configurations and hardware types.
EEG Task Performance Comparison
Performance metrics across EEG classification tasks, comparing state-of-the-art results from the literature with NCE-FM results.
EEG Task | Best Model & Approach (Source) | Dataset & Validation | Reported SOTA Performance | NCE-FM Performance (2 channels, subject-dependent)
---|---|---|---|---
2-class Motor Imagery (binary MI) | Anchored‑STFT + adversarial augmented SkipNet CNN (Ali et al., 2022) | BCI Comp. II Dataset III (Left/Right hand MI); subject-specific training/testing | 90.7% accuracy (mean across subjects) | 87.1% accuracy |
Multi-class Motor Imagery (4-class MI) | CSP+PSD feature ensemble with transfer learning (KMM + TrAdaBoost) (Wang et al., 2023) | BCI Comp. IV Dataset 2a (4-class MI); subject-specific 10xCV with instance transfer | 91.5% accuracy (average 4-class classification) | 90.1% accuracy |
Focus vs. Distraction (attention vs. inattention) | LSTM recurrent network (Kaushik et al., 2022) | Real-life debate EEG from 24 subjects; within-subject binary classification (focused vs distracted) | 95.86% accuracy (delta-band LSTM model) | 98.2% accuracy |
Attention Level Classification (multi-level) | EEG feature-based SVM with feature selection (Zhang et al., 2023) | 4-class attention states (high/medium/low/none) induced in lab (10 subjects); subject-dependent classification | 94.1% accuracy (4 attention levels) | 97.0% accuracy |
Binary Emotion Classification (DEAP) | TPRO-NET (Transformer + CNN hybrid) (X. Zhang et al., 2024) | DEAP emotion EEG (32 subjects); subject-dependent 5-fold CV for high vs. low valence/arousal | 97.63% (valence) / 97.47% (arousal) accuracy | 96.7% (valence) / 95.0% (arousal) accuracy |
Emotion Intensity Indexing – Classification | k-NN (k=1) regression + quadrant labeling (Alarcão et al., 2021) | DEAP EEG (32 subjects); subject-independent continuous valence/arousal prediction converted to classes (high/low, 4-quadrant) | 89.8% (binary high/low) and 84.4% (4-class quadrant) accuracy | 89.0% (binary high/low) and 71.0% (4-class quadrant) accuracy |
Sentence Reconstruction (Semantic Decoding) | AGACNet – Adaptive Graph Attention CNN (Li et al., 2024) | Custom single-subject EEG dataset (26 sessions) of silent reading of 7 distinct sentences; multi-class identification | 62.26% accuracy for 7-way sentence classification (chance ~14%) | 29.4% accuracy
6. Limitations
- Interpretability: Deep transformer models can be opaque. Supplementary interpretability techniques are advised.
- Not Intended for Healthcare Applications: This model is not to be used for clinical decision-making, diagnostic evaluation, therapeutic intervention, patient monitoring, or any medical or assistive application.