

Synavistra is committed to full transparency about how our AI models are built, what data they are trained on, and how they process your documents.

Overview

Our AI document analysis tool runs entirely in your web browser. This page discloses, in full, the training data, model architecture, and processing methodology, in compliance with the EU AI Act (Regulation (EU) 2024/1689).

Local Processing Architecture

When you use our document analysis tool, all processing happens locally on your device:

  • The AI model is downloaded once to your browser and cached locally
  • Your PDF documents are processed entirely in browser memory
  • Extracted text, entities, and knowledge graphs never leave your device
  • No analytics, tracking, cookies, or telemetry of any kind
  • Exported .snv.json files are saved directly to your local filesystem

Model Information

Base Model: Phi-3-mini-4k-instruct (Microsoft, 3.8B parameters)
Model License: MIT License (open source)
Fine-tuning Method: LoRA (Low-Rank Adaptation) on domain-specific legal texts
Quantization: INT4 (ONNX format for browser inference)
Inference Engine: ONNX Runtime Web with WebGPU acceleration
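To illustrate what INT4 quantization means in practice, here is a minimal sketch of symmetric 4-bit round-trip quantization. This is a simplified illustration, not the actual quantization scheme used in the ONNX export; real schemes typically quantize per-block with packed storage.

```javascript
// Illustrative symmetric INT4 quantization: each weight is mapped to an
// integer in [-8, 7] plus a shared scale factor. Simplified sketch only.
function quantizeInt4(weights) {
  const maxAbs = Math.max(...weights.map(Math.abs));
  const scale = maxAbs / 7; // map the largest magnitude to 7
  const q = weights.map(w =>
    Math.max(-8, Math.min(7, Math.round(w / scale))));
  return { q, scale };
}

function dequantizeInt4({ q, scale }) {
  return q.map(v => v * scale);
}

const weights = [0.12, -0.034, 0.56, -0.7];
const packed = quantizeInt4(weights);
const restored = dequantizeInt4(packed);
// Each restored value lies within half a quantization step of the original.
```

The accuracy cost of this rounding is what the evaluation numbers below already include, since the published model is the quantized one.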

Evaluation Results

We publish our model evaluation results openly. These numbers reflect honest performance on held-out test data, not cherry-picked examples:

Task | Precision | Recall | F1 | Parse Rate
Named Entity Recognition (prefix 0) | 69.4% | 59.6% | 62.3% | 100%

Evaluation on 61 held-out examples from GDPR and CCPA texts. Additional prefix evaluations will be published as they complete. These results represent a 3.8B parameter model fine-tuned on 324 examples — not a frontier model.
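For readers unfamiliar with these metrics, the sketch below shows how precision, recall, and F1 relate to raw counts. The counts are hypothetical, chosen only so that precision and recall roughly match the published figures; note that the pooled F1 of these counts (about 64%) need not equal the published 62.3%, which may be averaged per example.

```javascript
// Precision, recall, and F1 from true-positive / false-positive /
// false-negative counts. The counts below are hypothetical.
function f1Metrics(tp, fp, fn) {
  const precision = tp / (tp + fp);
  const recall = tp / (tp + fn);
  const f1 = (2 * precision * recall) / (precision + recall);
  return { precision, recall, f1 };
}

// Example: 59 correct entities, 26 spurious, 40 missed.
const m = f1Metrics(59, 26, 40);
// m.precision ≈ 0.694, m.recall ≈ 0.596, m.f1 ≈ 0.641
```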

Known Limitations

We believe honest disclosure of limitations is more valuable than marketing claims. This model:

  • Only supports English text (German and other languages were not included in the training data)
  • Only covers GDPR and CCPA/CPRA privacy law (no contract law, regulatory law, or other domains yet)
  • Does not include case law, court decisions, or regulatory guidance (only statutory text)
  • Is NOT a substitute for legal advice — outputs are AI-generated summaries that must be verified by qualified professionals
  • Has a 1024-token context window — very long articles may be truncated
  • NER recall of 59.6% means roughly 2 in 5 entities in a document may be missed; precision of 69.4% means roughly 3 in 10 extracted entities may be incorrectly classified
  • Falls back to rule-based extraction when the AI model is not available (lower quality, but still functional)
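The 1024-token limitation above can be made concrete with a chunking sketch. This uses a naive whitespace "tokenizer" for illustration only; the real Phi-3 tokenizer splits text differently, so actual token counts will differ.

```javascript
// Sketch: splitting a long article into windows that fit a 1024-token
// context. Whitespace splitting stands in for real tokenization here.
function chunkTokens(text, maxTokens = 1024) {
  const tokens = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  for (let i = 0; i < tokens.length; i += maxTokens) {
    chunks.push(tokens.slice(i, i + maxTokens).join(" "));
  }
  return chunks;
}

const longText = Array(2500).fill("word").join(" ");
const chunks = chunkTokens(longText); // 3 chunks: 1024 + 1024 + 452 tokens
```

A tool that truncates instead of chunking would simply keep `chunks[0]` and drop the rest, which is why very long articles lose their tails.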

Training Data Sources

The model was fine-tuned exclusively on publicly available official legal texts. Every source is documented with full provenance:

Source | Documents | License | Jurisdiction
GDPR (Regulation (EU) 2016/679) | 99 articles | CC-BY-4.0 | EU/EEA
CCPA/CPRA (Cal. Civ. Code 1798) | 23 sections | Public Domain (US state law) | California

Total: 324 training examples across 5 task types (NER extraction, text cleanup, knowledge graph extraction, query decomposition, answer synthesis). All training examples were manually created from real legal text — no synthetic or AI-generated training data.

Training Methodology

  • Source texts are official legal documents downloaded from government websites (EUR-Lex, California Legislature)
  • Named entities were extracted using @nlpjs/ner with a curated legal entity dictionary
  • Knowledge graph relationships were manually identified and verified by domain experts
  • All training input/output pairs (golden records) are archived with SHA-256 checksums for reproducibility
  • Training was performed on Google Cloud TPU v6e infrastructure in the EU (europe-west4)

No Synthetic Training Data

We do not use AI-generated or synthetic training examples. Every training example was created by humans working with real legal text. This ensures the model learns from authoritative sources, not from AI hallucinations or circular training patterns.

Open Golden Records

Training data for our publicly accessible tools is fully open and downloadable. These are the human-verified input/output pairs used to train and evaluate the model. Anyone can inspect, reproduce, or challenge our training methodology. All records are archived with SHA-256 checksums and available at models.synavistra.ai/training-data/.

Pipeline Stage | GDPR Pairs | CCPA Pairs
Text Extraction | 47 | 14
NER Extraction | 53 | 8
Knowledge Graph | 54 | 7
Query Decomposition | 52 | 9
Answer Synthesis | 46 | 15
Total | 252 | 53

Environmental Impact

We design for minimal environmental impact at every stage of the AI lifecycle:

Training

Hardware: Google Cloud TPU v6e (single chip, ct6e-standard-1t)
Data Center: europe-west4 (Netherlands), 82% carbon-free energy
Total Energy per Model: ~1 kWh, including all training, evaluation, scoring, ONNX export, and failed attempts. Detailed audit (JSON): models.synavistra.ai/audits/phi3-legal-privacy-v1.json

Fine-tuning, rather than training from scratch, is the appropriate approach at our data scale (324 examples from 122 legal documents): it leverages the understanding Phi-3 already acquired from trillions of pre-training tokens, and it keeps environmental impact low.

Inference

  • Browser-local inference: AI runs on the user's existing device — no cloud GPU servers required
  • INT4 quantization reduces compute per inference by ~4x compared to FP16, lowering energy use on every device
  • Zero idle energy: no servers running 24/7 waiting for requests — compute only happens when a user actively runs the tool
  • Model downloaded once and cached in the browser — subsequent uses require no network transfer
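A back-of-envelope calculation shows why INT4 matters for a one-time browser download. This counts raw weight storage only and ignores overhead such as quantization scales, activations, and the KV cache, so the real file size will differ somewhat.

```javascript
// Approximate weight storage for a 3.8B-parameter model at
// different precisions (weights only, overhead ignored).
const params = 3.8e9;
const gib = bytes => bytes / 2 ** 30;

const fp16Bytes = params * 2;   // 16 bits = 2 bytes per weight
const int4Bytes = params * 0.5; //  4 bits = 0.5 bytes per weight

// ≈ 7.08 GiB at FP16 vs ≈ 1.77 GiB at INT4: a 4x reduction,
// which is what makes downloading and caching the model in a
// browser practical.
const ratio = fp16Bytes / int4Bytes; // 4
```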

Licensing

All Synavistra-produced artifacts for publicly accessible tools are licensed under Apache 2.0. Third-party components retain their original licenses.

Artifact | License | Note
Fine-tuned model weights | Apache-2.0 | Synavistra derivative work
Golden records (training data) | Apache-2.0 | Human-created by Synavistra
Energy audits, manifests | Apache-2.0 | Synavistra documentation
GDPR source text | CC-BY-4.0 | EU official document, attribution required
CCPA source text | Public Domain | US state law, unrestricted
Phi-3-mini base model | MIT | Microsoft, included per MIT terms
Phi-3 MIT License Notice (required attribution)
MIT License

Copyright (c) Microsoft Corporation.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

EU AI Act Compliance

This disclosure is provided in accordance with Article 53 of the EU AI Act (Regulation (EU) 2024/1689) regarding transparency obligations for general-purpose AI models. Synavistra GmbH, Feldkirch, Austria, is the provider of this AI system. For questions about our AI practices, contact us at the address listed in our Impressum.

Independent Verification

We invite independent auditors, researchers, and regulators to verify any claim made on this page. All data is downloadable: training data registry, energy audit, and golden records for every pipeline stage. If you identify any inaccuracy or concern, please contact us.

Questions

If you have questions about our AI transparency practices, training data, or processing methodology, please contact us.

Frequently Asked Questions