
Freelance (from 03/2023) | Led design, development, model selection, fine-tuning, and prompt engineering for a RAG pipeline solution that answers questions from a large corpus of legal documents, now live and used by several hundred help-desk employees | Insurance |
| | Developed tooling to manage data and model versions, enabling efficient porting of the above RAG solution to additional corpora, languages, and business use cases | Insurance |
| | Designed and implemented a solution to extract text from PDF OCR layers with the correct semantic ordering and structure for further analysis by LLMs | Insurance |
| | Explored various explainable AI methods within the RAG context | Insurance |
Explosion AI, Berlin (11/2021 — 02/2023) | Core developer maintaining the spaCy and thinc libraries (Python / Cython) | AI |
| | Investigated and implemented strategies to improve the accuracy of lemmatisation models across 17 languages while maintaining acceptable speed | AI |
msg systems ag, Munich (02/2014 — 10/2021) | Designed and developed algorithms and systems to analyse the structures of legal texts and to detect anomalous wording in contract proposals | Insurance |
| | Designed a cloud service to recognise incoming emails containing orders and to extract structured information from them | Logistics |
| | Designed and developed an application with a Microsoft Word/PowerPoint plugin to recognise confidential text passages in internal documents and to remove them prior to external publication | Automotive |
Library | Role | Description |
---|---|---|
Holmes | Sole author | Information extraction from English and German texts based on predicate logic; supports intelligent search. |
Coreferee | Sole author | Coreference resolution for English, French, German and Polish, optimised for limited training data and easily extensible for further languages. |
spaCy | Former core maintainer | Industrial-strength Natural Language Processing in Python. |
thinc | Former core maintainer | The lightweight deep learning library underlying spaCy; the spaCy stack's counterpart to PyTorch. |

NLP / Machine Learning | Deep learning; Explainable AI; GPU processing; Haystack; Hugging Face; LangChain; Large Language Models (LLM); LlamaIndex; Machine learning; Natural language processing (NLP); Neural networks; OpenSearch; Prompt engineering; PyTorch; Retrieval-Augmented Generation (RAG); spaCy; Transformers |
Programming languages (ordered by experience) | Python; Java; SQL; JEE; Bash; Cython; JavaScript; C++ |
Cloud providers | Azure; AWS; GCP |
General technologies and frameworks | Ansible; Cassandra; Docker; Hadoop; Kafka; Kubernetes; MongoDB; MySQL; PostgreSQL; RabbitMQ |
Operating systems | Unix; Linux; Ubuntu; macOS |
Concepts | Application architecture; Big Data; Data lake; DevOps; Distributed systems; EAI; ETL; Integration; Messaging; Stakeholder management |
Security | BSI Grundschutz; Business continuity; Certificates; Compliance; Cryptography; Data protection; DevSecOps; Disaster recovery; Encryption; GDPR/DSGVO; IAM; ISO standards; ISMS; Key management; Legal requirements; LLM security; Network security; NIST; PKI; Risk management; Security engineering; Security policies; Threat analysis |
Introducing Holmes 4.0, Explosion AI, 2022 |
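Quanten-Computing: Zukunftstechnologie mit stark eingeschränktem Einsatzfeld (Quantum computing: a future technology with a severely limited field of application), iX Developer, 2020 |
Wortgewandt: Natürliche Sprache zielgenau verarbeiten mit semantischer Textanalyse (Eloquent: precisely targeted natural language processing with semantic text analysis), iX Developer, 2020 |
KI-gestützte Textanalyse beim Releasemanagement (AI-supported text analysis in release management), Softwareforen Leipzig, 2019 |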
Censor Robots: Using AI to Redact Confidential Information, (ISC)² Secure Summit, London, 2019 |
Cybertwists: Hacking and Cyberattacks Explained, CreateSpace, 2018 |
Machine Learning Catalogue, msg, 2017 |