This repository was initially created to integrate spaCy with cTAKES for its UMLS dictionary lookup. Whereas cTAKES is English-only, spaCy provides trained (non-medical) models for several languages.
cTAKES is built on UIMA, which has its own serialization formats (e.g. CAS XMI). NAF (KAFDocument) was chosen over the UIMA CAS XMI format because it has native Python and Java implementations. The mapping between NAF and CAS is implemented in Java using the cTAKES UIMA namespace.
Mapped Modules (partly done):
- Text
- Token
- POS
- Chunk (todo)
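
To make the Text, Token and POS layers above more concrete, here is a minimal Python sketch of a NAF-like document holding those three layers. It is not this repository's code: `build_naf` is a hypothetical helper, the element and attribute names (`raw`, `wf`, `term`, `span`, `target`) follow a simplified reading of the NAF layout, and the input tuples stand in for what spaCy would supply through `token.text`, `token.idx` and `token.pos_`. POS tags are passed through unchanged, which is an assumption; a real mapping may need to convert tag sets.

```python
import xml.etree.ElementTree as ET

# Namespace URI behind the standard "xml:" prefix (used for xml:lang).
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_naf(raw_text, tokens, lang="en"):
    """Build a simplified NAF-like tree (hypothetical helper).

    tokens: list of (text, offset, pos) tuples, e.g. taken from
    spaCy's token.text, token.idx and token.pos_.
    """
    naf = ET.Element("NAF", {f"{{{XML_NS}}}lang": lang})

    # Text layer: the raw input string.
    ET.SubElement(naf, "raw").text = raw_text

    # Token layer: one <wf> (word form) per token with its character offset.
    text_layer = ET.SubElement(naf, "text")
    # POS layer: one <term> per token, pointing back at its word form.
    terms_layer = ET.SubElement(naf, "terms")

    for i, (word, offset, pos) in enumerate(tokens, start=1):
        wf = ET.SubElement(
            text_layer,
            "wf",
            {"id": f"w{i}", "offset": str(offset), "length": str(len(word))},
        )
        wf.text = word

        term = ET.SubElement(terms_layer, "term", {"id": f"t{i}", "pos": pos})
        span = ET.SubElement(term, "span")
        ET.SubElement(span, "target", {"id": f"w{i}"})

    return naf


text = "Patient has fever"
tokens = [("Patient", 0, "NOUN"), ("has", 8, "VERB"), ("fever", 12, "NOUN")]
naf = build_naf(text, tokens)
print(ET.tostring(naf, encoding="unicode"))
```

On the Java side, each `wf` and `term` element would then be translated into the corresponding CAS annotation; the Chunk layer is left out here because it is still marked as todo.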
See the test classes.
- Apache cTAKES
- Apache UIMA
- UMLS
- spaCy
- NAF GitHub / PDF specs & Java / Python implementations