loupe / matcher
Highlight and crop around search terms
Fund package maintenance!
Toflar
Requires
- php: ^8.1
- ext-intl: *
Requires (Dev)
- phpstan/phpstan: ^2.0
- phpunit/phpunit: ^10.5
- symfony/finder: ^6.2
- symplify/easy-coding-standard: ^12.5
This package is auto-updated.
Last update: 2025-05-12 13:24:31 UTC
README
Caution: Work in progress.
A utility to identify and highlight search terms and create snippets around matched sections.
Lorem ipsum dolor sit amet, consetetur [...] no sea takimata sanctus est lorem est ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur [...] dolore te feugait nulla facilisi lorem ipsum dolor sit amet, consectetuer [...]
Installation
composer require loupe/matcher
Usage
Tokenizer
To work with string matching for both highlighting and cropping, a string first has to be tokenized into terms, phrases, etc.
This library ships with a basic tokenizer built on top of the ext-intl rules, but you can implement your own tokenizer by
implementing the TokenizerInterface. Its goal is to take a string and convert it into a TokenCollection. It is also
responsible for deciding whether a given Token instance matches any of the tokens in a given TokenCollection:
$tokenizer = new \Loupe\Matcher\Tokenizer\Tokenizer(); // optionally takes a locale to improve tokenization
$tokenCollection = $tokenizer->tokenize('this is my string');

// Now you can use all sorts of helper functions:
$tokenCollection->all(); // all Token instances
$tokenCollection->allNegated(); // all negated terms (- prefixed)
$tokenCollection->phraseGroups(); // tokens within phrase groups (inside quotation marks, e.g. "this is a phrase")
// etc.
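The following is a minimal, hypothetical sketch in plain PHP of what "negated terms" and "phrase groups" mean for a query. It is not the library's Tokenizer (which relies on ext-intl and returns a TokenCollection): the function splitQuery() and its array shape are made up for illustration, and it only understands whitespace-separated words, a leading - for negation and double-quoted phrases.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical illustration only, NOT the loupe/matcher Tokenizer:
 * splits a query on whitespace, flags "-"-prefixed terms as negated and
 * gives all words inside one pair of double quotes the same phrase-group id.
 *
 * @return list<array{term: string, negated: bool, phraseGroup: int|null}>
 */
function splitQuery(string $query): array
{
    $tokens = [];
    $groupId = 0;

    // Alternative 1: an optionally negated "quoted phrase". Alternative 2: a bare word.
    preg_match_all('/(-?)"([^"]*)"|(-?)(\S+)/u', $query, $matches, PREG_SET_ORDER);

    foreach ($matches as $m) {
        // Group 4 is only non-empty when the bare-word alternative matched.
        if (($m[4] ?? '') !== '') {
            $tokens[] = ['term' => $m[4], 'negated' => $m[3] === '-', 'phraseGroup' => null];
            continue;
        }

        // Quoted phrase: every word in it shares one phrase-group id.
        $groupId++;
        foreach ((preg_split('/\s+/u', $m[2], -1, PREG_SPLIT_NO_EMPTY) ?: []) as $word) {
            $tokens[] = ['term' => $word, 'negated' => $m[1] === '-', 'phraseGroup' => $groupId];
        }
    }

    return $tokens;
}
```

For example, splitQuery('apple -banana "green tea"') returns four entries: apple, banana (negated), and green and tea, which both carry phrase-group id 1. A real tokenizer also has to deal with punctuation, locale-specific word boundaries and case normalization; none of that is attempted here.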
Matcher
The Matcher helper finds matches between two TokenCollections (or, for simplicity, two strings):
$matcher = new \Loupe\Matcher\Matcher();
$matchingTokenCollection = $matcher->calculateMatches('This is my original text which I want to query.', 'query');
// $matchingTokenCollection now contains all Token instances that match the query.

// If you are interested in the spans of the matches (the start and end positions of the matched tokens):
$spans = $matcher->calculateMatchSpans($matchingTokenCollection);

foreach ($spans as $span) {
    echo 'This span started at: ' . $span->getStartPosition() . PHP_EOL;
    echo 'This span ended at: ' . $span->getEndPosition() . PHP_EOL;
    echo 'This span has a length of: ' . $span->getLength() . PHP_EOL;
}
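To make the idea of match spans concrete, here is a hypothetical plain-PHP sketch; it is not Matcher::calculateMatchSpans(). The function findMatchSpans() and its return shape are assumptions: it matches whole words case-insensitively (ASCII only), merges matched words that are separated only by whitespace into one span, and reports byte offsets with an exclusive end, so that length = end - start. The library may count positions differently.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical illustration only, NOT Matcher::calculateMatchSpans():
 * finds words in $text equal (case-insensitively, ASCII only) to one of $terms
 * and merges matches separated only by whitespace into a single span.
 * Offsets are byte offsets; "end" is exclusive, so length = end - start.
 *
 * @param list<string> $terms
 * @return list<array{start: int, end: int, length: int}>
 */
function findMatchSpans(string $text, array $terms): array
{
    $wanted = array_flip(array_map('strtolower', $terms));
    $spans = [];

    // \w+ is a crude stand-in for real (ext-intl) word boundaries.
    preg_match_all('/\w+/u', $text, $words, PREG_OFFSET_CAPTURE);

    foreach ($words[0] as [$word, $offset]) {
        if (!isset($wanted[strtolower($word)])) {
            continue;
        }

        $end = $offset + strlen($word);
        $last = count($spans) - 1;

        // Extend the previous span if only whitespace lies between it and this word.
        if ($last >= 0 && trim(substr($text, $spans[$last]['end'], $offset - $spans[$last]['end'])) === '') {
            $spans[$last]['end'] = $end;
            $spans[$last]['length'] = $end - $spans[$last]['start'];
            continue;
        }

        $spans[] = ['start' => $offset, 'end' => $end, 'length' => $end - $offset];
    }

    return $spans;
}
```

With the text from the example above and the term query, it returns a single span covering the word query. With ['original', 'text'] against 'my original text', the two adjacent matches collapse into one span from offset 3 to 16.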
Formatter
The Formatter takes a FormatterOptions instance and formats two strings (text and query) directly according to your
configuration. If you have already tokenized the $query, you can also pass it as a TokenCollection instead. The $text,
however, has to be a string.
$tokenizer = new \Loupe\Matcher\Tokenizer\Tokenizer();
$matcher = new \Loupe\Matcher\Matcher($tokenizer);
$formatter = new \Loupe\Matcher\Formatter($matcher);

$options = (new \Loupe\Matcher\FormatterOptions())
    ->withEnableHighlight() // enable highlighting
    ->withHighlightStartTag('<b>') // default: <em>
    ->withHighlightEndTag('</b>') // default: </em>
    ->withEnableCrop() // enable cropping
    ->withCropLength(40) // default: 50
    ->withCropMarker('.......') // default: …
;

$result = $formatter->format('This is my original text which I want to query.', 'query', $options);

echo 'This is the formatted result: ' . $result->getFormattedText();
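Conceptually, highlighting wraps every matched term in the configured start and end tags. The following hypothetical sketch shows that step in plain PHP; highlightTerms() is made up for illustration and is not part of the library's API. It matches whole words case-insensitively with a regular expression rather than using a tokenizer, and its default tags mirror the <em> and </em> defaults listed above.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical illustration only, NOT the loupe/matcher Formatter:
 * wraps every whole-word, case-insensitive occurrence of one of $terms
 * in $startTag ... $endTag.
 *
 * @param list<string> $terms
 */
function highlightTerms(string $text, array $terms, string $startTag = '<em>', string $endTag = '</em>'): string
{
    // Empty terms would match at every word boundary, so drop them.
    $terms = array_values(array_filter($terms, static fn (string $t): bool => $t !== ''));
    if ($terms === []) {
        return $text;
    }

    // One alternation of the escaped terms, anchored at word boundaries.
    $alternation = implode('|', array_map(static fn (string $t): string => preg_quote($t, '/'), $terms));

    // preg_replace_callback() returns null on error; fall back to the unmodified text.
    return preg_replace_callback(
        '/\b(?:' . $alternation . ')\b/iu',
        static fn (array $m): string => $startTag . $m[0] . $endTag,
        $text,
    ) ?? $text;
}
```

For instance, highlightTerms('This is my original text which I want to query.', ['query'], '<b>', '</b>') returns the sentence with <b>query</b> in place of query, while a word such as queryable is left alone because of the word boundaries. This sketch does not crop.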
Cropping pre-highlighted results
Sometimes you have pre-highlighted text that still needs cropping (e.g. because your search engine supports highlighting
but not context cropping). In this case, you can use the Cropper formatter directly:
$cropper = new \Loupe\Matcher\Formatting\Cropper(
    $cropLength = 10,
    $cropMarker = '…',
    $highlightStartTag = '<em>',
    $highlightEndTag = '</em>',
);

echo $cropper->cropHighlightedText('This is a <em>test</em> string.');
// Outputs: …a <em>test</em> st…
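The idea behind cropping pre-highlighted text can be sketched as follows. This is a hypothetical plain-PHP function, cropAroundFirstHighlight(), not the Cropper's implementation: it keeps the first highlighted section intact, counts only the visible characters inside it against the crop length, splits the remaining budget evenly between the text before and after it (handing unused budget to the other side), and adds the crop marker wherever text was cut off. Any further highlights are ignored.

```php
<?php

declare(strict_types=1);

/** Split a UTF-8 string into a list of characters. */
function utf8Chars(string $s): array
{
    return preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY) ?: [];
}

/**
 * Hypothetical illustration only, NOT Cropper::cropHighlightedText():
 * keeps the first highlighted section and fills the rest of $cropLength
 * visible characters with context split evenly around it.
 * Returns $text unchanged if it contains no complete highlight.
 */
function cropAroundFirstHighlight(
    string $text,
    int $cropLength,
    string $marker = '…',
    string $startTag = '<em>',
    string $endTag = '</em>',
): string {
    $start = strpos($text, $startTag);
    $end = $start === false ? false : strpos($text, $endTag, $start + strlen($startTag));
    if ($start === false || $end === false) {
        return $text;
    }

    $highlightEnd = $end + strlen($endTag);
    $inner = substr($text, $start + strlen($startTag), $end - $start - strlen($startTag));
    $before = utf8Chars(substr($text, 0, $start));
    $after = utf8Chars(substr($text, $highlightEnd));

    // Context budget: whatever is left after the highlighted text, split in half.
    $remaining = max(0, $cropLength - count(utf8Chars($inner)));
    $left = intdiv($remaining, 2);
    $right = $remaining - $left;

    // Hand unused budget from one side to the other.
    $spareLeft = max(0, $left - count($before));
    $spareRight = max(0, $right - count($after));
    $left = min(count($before), $left + $spareRight);
    $right = min(count($after), $right + $spareLeft);

    $leftText = implode('', array_slice($before, count($before) - $left));
    $rightText = implode('', array_slice($after, 0, $right));

    // Add the marker (and drop the whitespace next to it) only where text was cut off.
    if ($left < count($before)) {
        $leftText = $marker . ltrim($leftText);
    }
    if ($right < count($after)) {
        $rightText = rtrim($rightText) . $marker;
    }

    return $leftText . substr($text, $start, $highlightEnd - $start) . $rightText;
}
```

For the input used above, cropAroundFirstHighlight('This is a <em>test</em> string.', 10) returns …a <em>test</em> st…, the same string shown in the Cropper example; that agreement for one input does not mean the library uses this exact algorithm.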