Synavistra is committed to full transparency about how our AI models are built, what data they are trained on, and how they process your documents.
Overview
Our AI document analysis tool runs entirely in your web browser. This page discloses the complete training data, model architecture, and processing methodology in compliance with the EU AI Act (Regulation (EU) 2024/1689).
Local Processing Architecture
When you use our document analysis tool, all processing happens locally on your device:
- The AI model is downloaded once to your browser and cached locally
- Your PDF documents are processed entirely in browser memory
- Extracted text, entities, and knowledge graphs never leave your device
- No analytics, tracking, cookies, or telemetry of any kind
- Exported .snv.json files are saved directly to your local filesystem
Model Information
| Property | Value |
|---|---|
| Base Model | Phi-3-mini-4k-instruct (Microsoft, 3.8B parameters) |
| Model License | MIT License (open source) |
| Fine-tuning Method | LoRA (Low-Rank Adaptation) on domain-specific legal texts |
| Quantization | INT4 (ONNX format for browser inference) |
| Inference Engine | ONNX Runtime Web with WebGPU acceleration |
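The inference setup above, together with the rule-based fallback noted under Known Limitations, implies a simple backend-selection flow: prefer WebGPU, fall back to WASM, and use rule-based extraction when the model cannot load at all. The sketch below is a hypothetical illustration (the `Capabilities` shape and `selectBackend` helper are assumptions for this page, not our shipped code):

```typescript
// Hypothetical backend-selection helper: prefers WebGPU, then WASM,
// then the rule-based fallback described under Known Limitations.
type Backend = "webgpu" | "wasm" | "rule-based";

interface Capabilities {
  webgpu: boolean;      // e.g. "gpu" in navigator, in a real browser
  wasm: boolean;        // WebAssembly support
  modelCached: boolean; // model weights already downloaded and cached?
  online: boolean;      // can we fetch the model if it is not cached?
}

function selectBackend(c: Capabilities): Backend {
  // Without weights (not cached, no network), only the rules can run.
  if (!c.modelCached && !c.online) return "rule-based";
  if (c.webgpu) return "webgpu";
  if (c.wasm) return "wasm";
  return "rule-based";
}
```

In the browser, the capability flags would come from feature detection; keeping the decision in a pure function like this makes the fallback path easy to test offline.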
Evaluation Results
We publish our model evaluation results openly. These numbers reflect honest performance on held-out test data, not cherry-picked examples:
| Task | Precision | Recall | F1 | Parse Rate |
|---|---|---|---|---|
| Named Entity Recognition (prefix 0) | 69.4% | 59.6% | 62.3% | 100% |
Evaluation on 61 held-out examples from GDPR and CCPA texts. Additional prefix evaluations will be published as they complete. These results represent a 3.8B parameter model fine-tuned on 324 examples — not a frontier model.
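As an illustration of how entity-level scores like those above are typically computed, the sketch below assumes an entity counts as correct only when both its text span and label exactly match a gold entity (the `Entity` shape and `scoreEntities` helper are assumptions, not our published evaluation harness):

```typescript
// Sketch of entity-level scoring: a prediction is a true positive only
// when both its text and label exactly match a gold entity.
interface Entity { text: string; label: string; }

function scoreEntities(gold: Entity[], predicted: Entity[]) {
  const key = (e: Entity) => `${e.label}::${e.text}`;
  const goldSet = new Set(gold.map(key));
  const tp = predicted.filter((e) => goldSet.has(key(e))).length;
  const precision = predicted.length ? tp / predicted.length : 0;
  const recall = gold.length ? tp / gold.length : 0;
  const f1 = precision + recall
    ? (2 * precision * recall) / (precision + recall)
    : 0;
  return { precision, recall, f1 };
}
```

Note that plugging the table's aggregate precision and recall into 2PR/(P+R) gives roughly 64.1%; a reported F1 of 62.3% is consistent with averaging F1 per example rather than pooling counts across the whole test set.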
Known Limitations
We believe honest disclosure of limitations is more valuable than marketing claims. This model:
- Only supports English text (German and other languages were not part of training)
- Only covers GDPR and CCPA/CPRA privacy law (no contract law, regulatory law, or other domains yet)
- Does not include case law, court decisions, or regulatory guidance (only statutory text)
- Is NOT a substitute for legal advice — outputs are AI-generated summaries that must be verified by qualified professionals
- Has a 1024-token context window — very long articles may be truncated
- NER recall of 59.6% means roughly 2 in 5 entities may be missed, and precision of 69.4% means roughly 3 in 10 extracted entities may be incorrectly classified
- Falls back to rule-based extraction when the AI model is not available (lower quality, but still functional)
Training Data Sources
The model was fine-tuned exclusively on publicly available official legal texts. Every source is documented with full provenance:
| Source | Documents | License | Jurisdiction |
|---|---|---|---|
| GDPR (Regulation (EU) 2016/679) | 99 articles | CC-BY-4.0 | EU/EEA |
| CCPA/CPRA (Cal. Civ. Code 1798) | 23 sections | Public Domain (US state law) | California |
Total: 324 training examples across 5 task types (NER extraction, text cleanup, knowledge graph extraction, query decomposition, answer synthesis). All training examples were manually created from real legal text — no synthetic or AI-generated training data.
Training Methodology
- Source texts are official legal documents downloaded from government websites (EUR-Lex, California Legislature)
- Named entities were extracted using @nlpjs/ner with a curated legal entity dictionary
- Knowledge graph relationships were manually identified and verified by domain experts
- All training input/output pairs (golden records) are archived with SHA-256 checksums for reproducibility
- Training was performed on Google Cloud TPU v6e infrastructure in the EU (europe-west4)
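The SHA-256 archiving mentioned above means anyone can re-derive a record's checksum and compare it against the published value. A minimal sketch, assuming the records are plain files and the checksums are hex strings (the `verifyRecord` helper is illustrative, not our published tooling):

```typescript
import { createHash } from "node:crypto";

// Compute the SHA-256 digest of a golden record's content as hex.
function sha256Hex(data: string): string {
  return createHash("sha256").update(data).digest("hex");
}

// Compare a record's content against its archived checksum,
// tolerating upper- or lower-case hex in the manifest.
function verifyRecord(content: string, expectedSha256: string): boolean {
  return sha256Hex(content) === expectedSha256.toLowerCase();
}
```

Any mismatch indicates the record was altered after archiving, which is exactly what the checksums are there to detect.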
No Synthetic Training Data
We do not use AI-generated or synthetic training examples. Every training example was created by humans working with real legal text. This ensures the model learns from authoritative sources, not from AI hallucinations or circular training patterns.
Open Golden Records
Training data for our publicly accessible tools is fully open and downloadable. These are the human-verified input/output pairs used to train and evaluate the model. Anyone can inspect, reproduce, or challenge our training methodology. All records are archived with SHA-256 checksums and available at models.synavistra.ai/training-data/.
| Pipeline Stage | GDPR Pairs | CCPA Pairs |
|---|---|---|
| Text Extraction | 47 | 14 |
| NER Extraction | 53 | 8 |
| Knowledge Graph | 54 | 7 |
| Query Decomposition | 52 | 9 |
| Answer Synthesis | 46 | 15 |
| Total | 252 | 53 |
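The totals row above can be cross-checked mechanically. The object below is simply a transcription of the table for that purpose, not a published data format:

```typescript
// Transcription of the golden-records table for a mechanical totals check.
const goldenRecords: Record<string, { gdpr: number; ccpa: number }> = {
  "Text Extraction":     { gdpr: 47, ccpa: 14 },
  "NER Extraction":      { gdpr: 53, ccpa: 8 },
  "Knowledge Graph":     { gdpr: 54, ccpa: 7 },
  "Query Decomposition": { gdpr: 52, ccpa: 9 },
  "Answer Synthesis":    { gdpr: 46, ccpa: 15 },
};

function totals(records: typeof goldenRecords) {
  let gdpr = 0;
  let ccpa = 0;
  for (const row of Object.values(records)) {
    gdpr += row.gdpr;
    ccpa += row.ccpa;
  }
  return { gdpr, ccpa };
}
```

Summing the per-stage counts reproduces the table's totals of 252 GDPR pairs and 53 CCPA pairs.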
Environmental Impact
We design for minimal environmental impact at every stage of the AI lifecycle:
Training
| Property | Value |
|---|---|
| Hardware | Google Cloud TPU v6e (single chip, ct6e-standard-1t) |
| Data Center | europe-west4 (Netherlands) — 82% carbon-free energy |
| Total Energy per Model | ~1 kWh — including all training, evaluation, scoring, ONNX export, and failed attempts. <a href="https://models.synavistra.ai/audits/phi3-legal-privacy-v1.json" rel="noopener">Detailed audit (JSON)</a>. |
Fine-tuning, rather than training from scratch, is the appropriate approach at our data scale (324 examples from 122 legal documents): it leverages the understanding Phi-3 already acquired from trillions of pre-training tokens, and it keeps the environmental footprint low.
Inference
- Browser-local inference: AI runs on the user's existing device — no cloud GPU servers required
- INT4 quantization shrinks model weights to roughly a quarter of their FP16 size, cutting memory traffic and energy use on every device
- Zero idle energy: no servers running 24/7 waiting for requests — compute only happens when a user actively runs the tool
- Model downloaded once and cached in the browser — subsequent uses require no network transfer
Licensing
All Synavistra-produced artifacts for publicly accessible tools are licensed under Apache 2.0. Third-party components retain their original licenses.
| Artifact | License | Note |
|---|---|---|
| Fine-tuned model weights | Apache-2.0 | Synavistra derivative work |
| Golden records (training data) | Apache-2.0 | Human-created by Synavistra |
| Energy audits, manifests | Apache-2.0 | Synavistra documentation |
| GDPR source text | CC-BY-4.0 | EU official document, attribution required |
| CCPA source text | Public Domain | US state law, unrestricted |
| Phi-3-mini base model | MIT | Microsoft, included per MIT terms |
Phi-3 MIT License Notice (required attribution)
MIT License

Copyright (c) Microsoft Corporation.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
EU AI Act Compliance
This disclosure is provided in accordance with Article 53 of the EU AI Act (Regulation (EU) 2024/1689) regarding transparency obligations for general-purpose AI models. Synavistra GmbH, Feldkirch, Austria, is the provider of this AI system. For questions about our AI practices, contact us at the address listed in our Impressum.
Independent Verification
We invite independent auditors, researchers, and regulators to verify any claim made on this page. All data is downloadable: training data registry, energy audit, and golden records for every pipeline stage. If you identify any inaccuracy or concern, please contact us.
Questions
If you have questions about our AI transparency practices, training data, or processing methodology, please contact us.