Powering Intelligent Search at Bizom: From Fuzzy Matches to Semantic Understanding

By Samarth Patel, Senior Software Engineer

25th July, 2025

Search is a fundamental part of the Bizom platform — whether it’s a salesman searching for products during a field visit, or internal teams managing catalogs. But building an effective search system is far more complex than just matching keywords.

At Bizom, we operate in the real world of FMCG and retail distribution, where users interact with the system in diverse, sometimes unpredictable ways. For instance, a retailer might type “wafar” instead of “Wafers,” or search for “namkeen” expecting a list of snacks. Some users mix Hindi and English (e.g., “paani” for water), while others refer to products by nicknames or local aliases. A rigid, exact-match keyword search simply doesn’t cut it in such environments.

What We Needed from Search

We needed a search system that is:

  • Fault-tolerant to spelling mistakes and typos
  • Semantically aware, so that “soft drink” and “Pepsi” are treated as related
  • Multilingual-friendly, especially for Hinglish, the Hindi-English blend common in our user base

Improving search isn’t just a technical challenge — it’s a business imperative. Poor search can mean lost orders, longer call times for support teams, and frustration for field users trying to complete a simple task. Our goal was to make search intuitive, forgiving, and intelligent — regardless of how a user chooses to express their need.

To solve the search challenges we faced, we adopted a layered approach — starting with making search fault-tolerant using fuzzy logic, and gradually moving toward understanding the intent behind search queries using semantic techniques.

Let’s Understand the Approach

Fuzzy Search: Handling Typos and Spelling Variants

Fuzzy search makes search more forgiving by correcting small errors in the user’s query. It uses techniques like Levenshtein Distance (edit distance) to match terms that are close enough to the actual data. 

For example, the distance between “mesala wafar” (the user’s input) and “masala wafer” (the actual item name) is 2: editing two characters turns the query into the target name.
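As a rough illustration of the idea (a sketch, not Bizom’s production code), the classic dynamic-programming computation of Levenshtein distance looks like this:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits (insertions, deletions,
    substitutions) needed to turn string a into string b."""
    # prev[j] holds the distance between a[:i-1] and b[:j] (previous row)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # delete a character from a
                            curr[j - 1] + 1,     # insert a character into a
                            prev[j - 1] + cost)) # substitute (or match)
        prev = curr
    return prev[-1]

print(levenshtein("mesala wafar", "masala wafer"))  # 2
```

A search engine like OpenSearch applies the same idea internally when a query uses fuzziness, so we never compute distances by hand in production.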

Semantic Search: Understanding Intent and Meaning

While fuzzy search fixes spelling issues, it can’t help when a user searches with a related concept instead of the exact word. That’s where semantic search becomes essential.

For example, when a user searches for “namkeen”, we want to return items like Masala Wafers and Bhujia even if the word “namkeen” doesn’t appear in the product name.

To do this, we use embedding models to convert both search queries and product names into high-dimensional vectors that represent meaning. Then, using vector similarity search, we find the most relevant matches — even across synonyms, aliases, or category names.

[Figure: vector representation of both the words]

This allows the system to “understand” that “namkeen” is related to “bhujia” and “Wafers” even when the keywords don’t match directly.
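To make the intuition concrete, here is a minimal sketch of vector similarity using tiny hand-made 2-dimensional vectors; a real embedding model produces vectors with hundreds of dimensions, and these particular numbers are purely illustrative:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: closer to 1.0 means
    the vectors point in more similar directions."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical 2-d "embeddings" (illustrative values only).
vectors = {
    "namkeen": [0.90, 0.15],
    "bhujia":  [0.85, 0.20],
    "water":   [0.10, 0.95],
}

# "namkeen" scores much closer to "bhujia" than to "water".
print(cosine_similarity(vectors["namkeen"], vectors["bhujia"]))
print(cosine_similarity(vectors["namkeen"], vectors["water"]))
```

Vector similarity search is exactly this comparison, performed efficiently at scale over the indexed product vectors.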

Handling Hinglish Search

We noticed users often typed in Hinglish — like “paani” for water or “namkeen” for snacks. While techniques like transliteration or multilingual embeddings were considered, we avoided overcomplicating things.

These methods are costly to implement and maintain, and given our finite, well-defined catalog, they were unnecessary. Instead, we adopted a simple and effective synonym-based approach — manually mapping common Hinglish terms to product keywords.

Examples:

  • “paani” → “water”
  • “namkeen” → “snacks”, “bhujia”

This lightweight solution worked well for our use case without adding model complexity.
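In spirit, the synonym approach amounts to simple query expansion. Here is a minimal sketch (the mapping entries are examples from this post, not the full production mapping):

```python
# Illustrative Hinglish-to-catalog synonym map.
SYNONYMS = {
    "paani": ["water"],
    "namkeen": ["snacks", "bhujia"],
}

def expand_query(query: str) -> list[str]:
    """Return the original query terms plus any mapped synonyms."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        expanded.extend(SYNONYMS.get(term, []))
    return expanded

print(expand_query("thanda paani"))  # ['thanda', 'paani', 'water']
```

In practice the same mapping can live inside the search engine itself, e.g. as a synonym token filter in an OpenSearch analyzer, so no application code needs to change.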

Implementation Details: How We Built the Search System

Once we had finalized our approach — combining fuzzy search, semantic understanding, and Hinglish synonym support — we focused on building an architecture that was both scalable and maintainable.

OpenSearch as Our Search Engine and Vector Database

OpenSearch served as the core of our search infrastructure, powering both traditional keyword-based search and semantic vector-based retrieval.

To streamline indexing, we used OpenSearch ingestion pipelines to apply a series of text transformations before data was stored. These transformations included steps like:

  • Lowercasing and normalization
  • Synonym expansion (including Hinglish mappings)
  • Removing special characters or stop words

For semantic search, we went a step further by integrating embedding models directly into the pipeline, allowing us to vectorize product names and descriptions at index time. This eliminated the need for separate preprocessing jobs and ensured our vector data stayed in sync with the source.

This built-in pipeline support made our search system more maintainable and scalable without introducing external ETL complexity.
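A pipeline along these lines can be defined as a plain request body; the sketch below (field names and the model ID are placeholders, not our actual values) combines a text-cleanup processor with the `text_embedding` processor from the OpenSearch neural-search plugin, which vectorizes mapped fields at index time:

```python
# Sketch of an OpenSearch ingest pipeline body (placeholder names).
pipeline_body = {
    "description": "Normalize product text and attach embeddings",
    "processors": [
        # Lowercase the product name before indexing.
        {"lowercase": {"field": "product_name"}},
        # text_embedding (neural-search plugin) vectorizes the mapped
        # fields using a model registered in the cluster.
        {
            "text_embedding": {
                "model_id": "<registered-model-id>",
                "field_map": {"product_name": "product_name_vector"},
            }
        },
    ],
}

# With the opensearch-py client this would be registered roughly as:
# client.ingest.put_pipeline(id="product-pipeline", body=pipeline_body)
```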

Logstash for Data Ingestion and Syncing

To sync our data from MySQL into OpenSearch, we used Logstash as the pipeline tool. Our setup ensures:

  • Periodic pulls from the source MySQL database
  • Indexing into the appropriate OpenSearch index

This near real-time syncing ensures that any updates to product data are quickly reflected in the search results without manual intervention or delays.
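A typical Logstash pipeline for this pattern pairs the JDBC input plugin with the OpenSearch output plugin. The sketch below is illustrative only; connection details, the table, the polling schedule, and column names are placeholders, not our actual configuration:

```conf
input {
  jdbc {
    # Placeholder connection details.
    jdbc_connection_string => "jdbc:mysql://<host>:3306/<db>"
    jdbc_user => "<user>"
    jdbc_password => "<password>"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    schedule => "*/5 * * * *"   # poll every 5 minutes
    statement => "SELECT id, product_name, updated_at FROM products WHERE updated_at > :sql_last_value"
    use_column_value => true
    tracking_column => "updated_at"
    tracking_column_type => "timestamp"
  }
}
output {
  opensearch {
    hosts => ["https://<opensearch-host>:9200"]
    index => "products"
    document_id => "%{id}"   # upsert by primary key, so updates overwrite
  }
}
```

Tracking `updated_at` via `:sql_last_value` is what makes the pull incremental: each run only fetches rows changed since the previous run.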

Index Design

One of the key decisions we made was to maintain separate indexes for fuzzy and semantic search.

Not all search use cases require semantic understanding. For example, when a user types a product code or a highly specific name, fuzzy search is more than sufficient. On the other hand, broader or more generic queries like “namkeen” or “cold drink” benefit from semantic context.

By creating separate indexes, we were able to:

  • Avoid unnecessary vectorization of all records, which can be computationally expensive
  • Keep our semantic index small, fast, and focused
  • Tune relevance and scoring separately for fuzzy and semantic queries

This also allowed us to apply different refresh strategies, analyzers, and embeddings depending on the nature of the index — giving us much more control without overengineering the system.
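The two indexes are queried with different OpenSearch query shapes. The sketch below shows the two styles side by side; the field names and the query vector are placeholders:

```python
# Fuzzy-tolerant keyword query against the text index: a match query
# with fuzziness tolerates small typos in each term.
fuzzy_query = {
    "query": {
        "match": {
            "product_name": {"query": "mesala wafar", "fuzziness": "AUTO"}
        }
    }
}

# k-NN query against the semantic index: rank by vector similarity.
# query_vector would come from the same embedding model used at index time;
# real vectors are far longer than this placeholder.
query_vector = [0.12, 0.87, 0.05]
knn_query = {
    "query": {
        "knn": {
            "product_name_vector": {"vector": query_vector, "k": 10}
        }
    }
}
```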

Stitched-Up System: Architecture Overview

This architecture gave us flexibility to optimize for different query types, while keeping the system simple, efficient, and easy to extend.

Here’s how the flow works:

  1. Data Pipeline: Logstash extracts product data from the master database and feeds it into OpenSearch.
  2. Indexing & Embedding: OpenSearch applies ingestion pipelines to transform and vectorize text fields using pre-integrated embedding models.
  3. Smart Search Layer: This component acts as a decision-maker. Based on the incoming request, it determines the type of search to perform — fuzzy, semantic, or synonym-based — and builds the appropriate query for OpenSearch. 

  4. Client Interaction: End-users trigger search from the client app, which gets routed through the smart search layer to fetch the most relevant results.
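The post doesn’t spell out the smart search layer’s exact decision rules, but a minimal routing heuristic in the same spirit might look like this (the rules and term list here are hypothetical):

```python
# Illustrative routing heuristic; the real decision rules are Bizom's own.
GENERIC_TERMS = {"namkeen", "snacks", "cold drink", "soft drink", "paani"}

def choose_search_type(query: str) -> str:
    """Pick which index to query for a given search string."""
    q = query.strip().lower()
    # Product codes (e.g. "SKU-1042") are precise: fuzzy keyword
    # search on the text index is more than sufficient.
    if any(ch.isdigit() for ch in q):
        return "fuzzy"
    # Broad, generic terms benefit from semantic (vector) retrieval.
    if q in GENERIC_TERMS:
        return "semantic"
    # Everything else starts with the cheaper fuzzy index.
    return "fuzzy"

print(choose_search_type("SKU-1042"))  # fuzzy
print(choose_search_type("namkeen"))   # semantic
```

Keeping this decision in one layer means the client never needs to know which index, analyzer, or embedding model sits behind a query.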

Conclusion

We built a smart, efficient search system by combining fuzzy search, synonym-based Hinglish support, and selective semantic search. Given our well-defined catalog, we avoided overengineering and focused on practical solutions.

With OpenSearch for search and vector storage, Logstash for syncing with MySQL, and custom pipelines for text transformation and embeddings, our setup is both cost-effective and scalable — ready to support future needs.