loupe/matcher

Highlight and crop around search terms


0.1.0 2025-05-08 12:59 UTC


Last update: 2025-05-12 13:24:31 UTC



Caution: this package is a work in progress.

A utility to identify and highlight search terms and create snippets around matched sections.

Lorem ipsum dolor sit amet, consetetur [...] no sea takimata sanctus est lorem est ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur [...] dolore te feugait nulla facilisi lorem ipsum dolor sit amet, consectetuer [...]

Installation

composer require loupe/matcher

Usage

Tokenizer

Both highlighting and cropping rely on string matching, which first requires splitting a string into terms, phrases, etc. (tokenizing). This library ships with a basic tokenizer built on top of the ext-intl rules, but you can provide your own by implementing the TokenizerInterface. A tokenizer converts a string into a TokenCollection. It is also responsible for deciding whether a given Token instance matches any of the tokens in a given TokenCollection:

$tokenizer = new \Loupe\Matcher\Tokenizer\Tokenizer(); // optionally takes a locale to improve tokenization
$tokenCollection = $tokenizer->tokenize('this is my string');

// Now you can use all sorts of helper functions
$tokenCollection->all(); // all Token instances
$tokenCollection->allNegated(); // all negated terms (- prefixed)
$tokenCollection->phraseGroups(); // tokens within phrase groups (inside quotation marks, e.g. "this is a phrase")
// etc.
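To make the difference between plain terms, negated terms and phrase groups concrete, here is a deliberately naive, standalone sketch of the kind of split a tokenizer performs. The function `naiveTokenize()` is hypothetical and not part of loupe/matcher: unlike the shipped Tokenizer it ignores ext-intl word-break rules and locales, lowercases ASCII letters only, and treats any `-` directly before a word as negation.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical, simplified tokenizer (NOT the loupe/matcher Tokenizer).
 * Splits a query into plain terms, negated terms (prefixed with -) and
 * phrase groups (inside double quotes).
 *
 * @return array{terms: list<string>, negated: list<string>, phrases: list<list<string>>}
 */
function naiveTokenize(string $query): array
{
    $result = ['terms' => [], 'negated' => [], 'phrases' => []];

    // Either a quoted phrase (group 1) or an optional "-" (group 2) followed by a word (group 3).
    preg_match_all('/"([^"]*)"|(-?)([\p{L}\p{N}]+)/u', $query, $matches, PREG_SET_ORDER);

    foreach ($matches as $match) {
        if (isset($match[3])) {
            // Single word; strtolower() only lowercases ASCII letters.
            $word = strtolower($match[3]);
            if ($match[2] === '-') {
                $result['negated'][] = $word;
            } else {
                $result['terms'][] = $word;
            }
            continue;
        }

        // Phrase group: split the quoted text into its words.
        $result['phrases'][] = preg_split('/[^\p{L}\p{N}]+/u', strtolower($match[1]), -1, PREG_SPLIT_NO_EMPTY);
    }

    return $result;
}

$tokens = naiveTokenize('apple "Green tea" -banana');
// terms: ['apple'], negated: ['banana'], phrases: [['green', 'tea']]
```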

Matcher

The Matcher finds matches between two TokenCollections (or, for simplicity, two strings):

$matcher = new \Loupe\Matcher\Matcher();
$matchingTokenCollection = $matcher->calculateMatches('This is my original text which I want to query.', 'query');

// $matchingTokenCollection will now contain all Token instances that match the query.

// Sometimes you might be interested in the spans of the matches (the start and end positions of the tokens matched):
$spans = $matcher->calculateMatchSpans($matchingTokenCollection);
foreach ($spans as $span) {
    echo 'This span started at: ' . $span->getStartPosition() . PHP_EOL;
    echo 'This span ended at: ' . $span->getEndPosition() . PHP_EOL;
    echo 'This span has a length of: ' . $span->getLength() . PHP_EOL;
}
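The idea behind spans can be illustrated without the library: consecutive matched tokens that are separated only by whitespace collapse into a single span. The sketch below uses a hypothetical `mergeMatchSpans()` helper that receives byte offsets and lengths of matched tokens. Both the merge rule and the exclusive `end` offset are assumptions for illustration, not a description of how `calculateMatchSpans()` decides span boundaries.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (NOT part of loupe/matcher): merges matched tokens,
 * given as [byteOffset, byteLength] pairs, into spans. Tokens that overlap
 * or are separated only by whitespace end up in the same span.
 * The "end" offset is exclusive, so length = end - start.
 *
 * @param list<array{int, int}> $tokens
 * @return list<array{start: int, end: int, length: int}>
 */
function mergeMatchSpans(string $text, array $tokens): array
{
    usort($tokens, fn (array $a, array $b): int => $a[0] <=> $b[0]);

    $spans = [];
    foreach ($tokens as [$start, $length]) {
        $end = $start + $length;
        $last = array_key_last($spans); // null while no span exists yet

        if ($last !== null) {
            $previousEnd = $spans[$last]['end'];
            // Text between the previous span and this token ('' if they overlap).
            $gap = substr($text, $previousEnd, max(0, $start - $previousEnd));

            if (trim($gap) === '') {
                // Only whitespace in between: extend the previous span.
                $spans[$last]['end'] = max($previousEnd, $end);
                $spans[$last]['length'] = $spans[$last]['end'] - $spans[$last]['start'];
                continue;
            }
        }

        $spans[] = ['start' => $start, 'end' => $end, 'length' => $length];
    }

    return $spans;
}

$text = 'This is my original text which I want to query.';
$spans = mergeMatchSpans($text, [[41, 5], [11, 8], [20, 4]]);
// "original text" => start 11, end 24, length 13; "query" => start 41, end 46, length 5
```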

Formatter

The Formatter formats a text against a query according to the configuration in a FormatterOptions instance. Both are usually passed as strings; if you have already tokenized the query, you can pass its TokenCollection as $query instead. The $text, however, must always be a string.

$tokenizer = new \Loupe\Matcher\Tokenizer\Tokenizer();
$matcher = new \Loupe\Matcher\Matcher($tokenizer);

$formatter = new \Loupe\Matcher\Formatter($matcher);

$options = (new \Loupe\Matcher\FormatterOptions())
    ->withEnableHighlight() // enable highlighting
    ->withHighlightStartTag('<b>') // default: <em>
    ->withHighlightEndTag('</b>') // default: </em>
    ->withEnableCrop() // enable cropping
    ->withCropLength(40) // default: 50
    ->withCropMarker('.......') // default: … 
;

$result = $formatter->format('This is my original text which I want to query.', 'query', $options);

echo 'This is the formatted result: ' . $result->getFormattedText();
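Conceptually, highlighting wraps a match in the start and end tags, and cropping keeps a window of context around it, marking removed text with the crop marker. The standalone sketch below shows this for a single term. `highlightAndCrop()` is hypothetical: it measures the crop length in bytes (ASCII text assumed) and does not snap to word boundaries, so it should not be read as the Formatter's actual algorithm.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch (NOT the loupe/matcher Formatter): highlights the first
 * case-insensitive occurrence of $term and crops the text to roughly
 * $cropLength bytes around it. No word-boundary snapping is done.
 */
function highlightAndCrop(
    string $text,
    string $term,
    int $cropLength = 50,
    string $cropMarker = '…',
    string $startTag = '<em>',
    string $endTag = '</em>'
): string {
    $pos = $term === '' ? false : stripos($text, $term);
    if ($pos === false) {
        // No match: fall back to cropping from the beginning of the text.
        return strlen($text) > $cropLength ? substr($text, 0, $cropLength) . $cropMarker : $text;
    }

    $termLength = strlen($term);
    // Split the remaining length budget evenly before and after the match.
    $context = intdiv(max(0, $cropLength - $termLength), 2);
    $start = max(0, $pos - $context);
    $end = min(strlen($text), $pos + $termLength + $context);

    $snippet = substr($text, $start, $pos - $start)
        . $startTag . substr($text, $pos, $termLength) . $endTag // keeps the original casing
        . substr($text, $pos + $termLength, $end - $pos - $termLength);

    // Add a marker on each side where text was cut off.
    return ($start > 0 ? $cropMarker : '') . $snippet . ($end < strlen($text) ? $cropMarker : '');
}

echo highlightAndCrop('This is my original text which I want to query.', 'query', 20, '…', '<b>', '</b>'), PHP_EOL;
// prints: …ant to <b>query</b>.
```

Cutting mid-word ("ant") shows why a real implementation would typically align the window to token boundaries.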

Cropping pre-highlighted results

Sometimes you already have pre-highlighted text that only needs cropping (e.g. because your search engine supports highlighting but not context cropping). In that case, you can use the Cropper formatter directly:

$cropper = new \Loupe\Matcher\Formatting\Cropper(
    10,      // crop length
    '…',     // crop marker
    '<em>',  // highlight start tag
    '</em>', // highlight end tag
);

echo $cropper->cropHighlightedText('This is a <em>test</em> string.'); // Outputs: …a <em>test</em> st…
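Cropping already highlighted text adds one constraint: a cut must never split a highlight tag. One minimal way to guarantee this is to locate the first highlighted section and take context only from outside its tags. The hypothetical `cropAroundFirstHighlight()` below does this under two stated assumptions: the context around the first highlight contains no further tags, and context is counted in bytes. It is an illustration, not the Cropper's implementation.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch (NOT the loupe/matcher Cropper): keeps $context bytes
 * before and after the first highlighted section and adds crop markers where
 * text was removed. Assumes the surrounding context contains no tags.
 */
function cropAroundFirstHighlight(
    string $text,
    int $context,
    string $cropMarker = '…',
    string $startTag = '<em>',
    string $endTag = '</em>'
): string {
    $open = strpos($text, $startTag);
    if ($open === false) {
        return $text; // nothing highlighted, nothing to crop around
    }

    $close = strpos($text, $endTag, $open + strlen($startTag));
    if ($close === false) {
        return $text; // unbalanced tags: leave the text untouched
    }
    $close += strlen($endTag); // first byte after the closing tag

    // Cut only outside the highlighted section so no tag is ever split.
    $start = max(0, $open - $context);
    $end = min(strlen($text), $close + $context);

    // Trim whitespace left at the cut edges before adding markers.
    $snippet = trim(substr($text, $start, $end - $start));

    return ($start > 0 ? $cropMarker : '') . $snippet . ($end < strlen($text) ? $cropMarker : '');
}

echo cropAroundFirstHighlight('This is a <em>test</em> string.', 3), PHP_EOL;
// prints: …a <em>test</em> st…
```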