PrefixTrie
A high-performance Cython implementation of a prefix trie for efficient fuzzy string matching. It was originally designed for RNA barcode matching in bioinformatics, but it suits any application that needs fast approximate string search.
Features
- Ultra-fast exact matching using optimized Python sets
- Fuzzy matching with configurable edit distance (insertions, deletions, substitutions)
- Substring search to find trie entries within larger strings
- Longest prefix matching for sequence analysis
- Mutable and immutable trie variants
- Multiprocessing support with pickle compatibility
- Shared memory for high-performance parallel processing
- Memory-efficient with collapsed node optimization
- Bioinformatics-optimized for DNA/RNA/protein sequences
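The exact, fuzzy, and longest-prefix features above rest on standard trie techniques. The sketch below illustrates them in pure Python under stated assumptions: `TrieNode` and `FuzzyTrie` are hypothetical names, this is not PrefixTrie's API or its Cython internals, and the trie is uncollapsed. Fuzzy search follows a common approach: carry one Levenshtein DP row per trie node, and prune a branch once every cell in the row exceeds the edit budget.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    """One node of a plain (uncollapsed) character trie. Hypothetical, for illustration."""

    __slots__ = ("children", "terminal")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.terminal = False  # True if a stored entry ends at this node


class FuzzyTrie:
    """Hypothetical sketch of trie-based exact, fuzzy, and longest-prefix lookup."""

    def __init__(self, entries: List[str]) -> None:
        self.exact = set(entries)  # average O(1) exact lookups via a hash set
        self.root = TrieNode()
        for word in self.exact:
            node = self.root
            for ch in word:
                node = node.children.setdefault(ch, TrieNode())
            node.terminal = True

    def contains(self, query: str) -> bool:
        return query in self.exact

    def fuzzy_search(self, query: str, max_edits: int) -> List[Tuple[str, int]]:
        """Return (entry, edit distance) pairs within max_edits, closest first."""
        results: List[Tuple[str, int]] = []
        # Row for the empty trie prefix: turning "" into query[:j] costs j edits.
        first_row = list(range(len(query) + 1))
        for ch, child in self.root.children.items():
            self._walk(child, ch, query, first_row, max_edits, results)
        return sorted(results, key=lambda r: (r[1], r[0]))

    def _walk(self, node, prefix, query, prev_row, max_edits, results) -> None:
        ch = prefix[-1]
        # row[j] = Levenshtein distance between `prefix` and query[:j].
        row = [prev_row[0] + 1]
        for j in range(1, len(query) + 1):
            row.append(min(
                row[j - 1] + 1,                          # extra query character
                prev_row[j] + 1,                         # extra trie character
                prev_row[j - 1] + (query[j - 1] != ch),  # match or substitution
            ))
        if node.terminal and row[-1] <= max_edits:
            results.append((prefix, row[-1]))
        # Row minima never decrease with depth, so prune once the budget is exceeded.
        if min(row) <= max_edits:
            for next_ch, child in node.children.items():
                self._walk(child, prefix + next_ch, query, row, max_edits, results)

    def longest_prefix(self, text: str) -> Optional[str]:
        """Return the longest stored entry that is a prefix of text, or None."""
        node = self.root
        best = None
        for i, ch in enumerate(text):
            node = node.children.get(ch)
            if node is None:
                break
            if node.terminal:
                best = text[: i + 1]
        return best
```

For example, with entries `["ACGT", "ACGA", "TTGC", "AC"]`, `fuzzy_search("ACG", 1)` returns `[("AC", 1), ("ACGA", 1), ("ACGT", 1)]`: one deletion reaches `AC`, and one insertion reaches each of the other two. Because entries share prefixes, each DP row is computed once per trie node rather than once per stored entry.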
Quick Start
Our Getting Started Guide provides a step-by-step introduction to installing and using PrefixTrie.
Documentation
For detailed API documentation, see the API Reference.
Performance
PrefixTrie is highly optimized and typically outperforms similar fuzzy matching libraries:
- Search Performance: Substantially faster than RapidFuzz, TheFuzz, and SymSpell
- Substring Search: At least on par with fuzzysearch and regex, often faster
- Memory Efficiency: Collapsed node optimization reduces memory footprint
- Parallel Processing: Full pickle support for multiprocessing workflows
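Pickle support matters because `multiprocessing.Pool` pickles the arguments it sends to worker processes, so each worker receives its own copy of the trie. The sketch below illustrates that round trip with a hypothetical nested-dict trie. `build_trie` and `count_matches` are illustrative names, not PrefixTrie functions, and the layout is not PrefixTrie's internal one. The empty-string key marks the end of an entry, which cannot collide with a real character.

```python
import pickle
from typing import Dict, List


def build_trie(entries: List[str]) -> Dict:
    """Build a nested-dict trie (hypothetical layout); key "" marks an entry's end."""
    root: Dict = {}
    for word in entries:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = True  # iterating a str never yields "", so this cannot collide
    return root


def count_matches(trie: Dict, queries: List[str]) -> int:
    """Count queries stored exactly in the trie, as a worker process might."""
    hits = 0
    for query in queries:
        node = trie
        for ch in query:
            node = node.get(ch)
            if node is None:
                break
        else:  # loop finished without break: every character was found
            if "" in node:
                hits += 1
    return hits


if __name__ == "__main__":
    trie = build_trie(["ACGT", "ACGA", "TTGC"])
    payload = pickle.dumps(trie)      # roughly what multiprocessing does before sending
    restored = pickle.loads(payload)  # what the worker does on receipt
    print(restored == trie, count_matches(restored, ["ACGT", "ACG", "GGGG"]))
```

Sending the trie along with every task repeats that serialization. Passing it once through the pool's `initializer` and `initargs` avoids the repetition.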
License
This project is licensed under the MIT License - see the LICENSE file for details.