PrefixTrie¶

A high-performance Cython implementation of a prefix trie data structure for efficient fuzzy string matching. Originally designed for RNA barcode matching in bioinformatics applications, but suitable for any use case requiring fast approximate string search.

Features¶

Ultra-fast exact matching using optimized Python sets
Fuzzy matching with configurable edit distance (insertions, deletions, substitutions)
Substring search to find trie entries within larger strings
Longest prefix matching for sequence analysis
Mutable and immutable trie variants
Multiprocessing support with pickle compatibility
Shared memory for high-performance parallel processing
Memory-efficient with collapsed node optimization
Bioinformatics-optimized for DNA/RNA/protein sequences

Quick Start¶

Our Getting Started Guide provides a step-by-step introduction to installing and using PrefixTrie.

Documentation¶

For detailed API documentation, see the API Reference.

Performance¶

PrefixTrie is highly optimized and typically outperforms similar fuzzy matching libraries:

Search Performance: Substantially faster than RapidFuzz, TheFuzz, and SymSpell
Substring Search: At least on par with fuzzysearch and regex, often faster
Memory Efficiency: Collapsed node optimization reduces memory footprint
Parallel Processing: Full pickle support for multiprocessing workflows

License¶

This project is licensed under the MIT License - see the LICENSE file for details.