Search engine and wildcards
(36 posts)


Hi devs!

I am using the built-in search engine and I have two issues around wildcards:

  • If I search for a term with an exclamation mark, e.g. “Test!”, with AutoWildcards = true, I get an exception saying that “* or ?” cannot be at the beginning of a wildcard, thrown straight from the Lucene code.
  • If I search for “*test*” with AutoWildcards = false, I get the same exception. Is that a bug, or am I missing something about the search API? Can you explain why I can't do that search? For example, if I search for ".net", an article containing "asp.net" would of course be relevant to me. Right now I cannot find a way to match the ".net" in "asp.net".

 

Greetings,

Andrzej

(144 posts)

Re:Search engine and wildcards

Hi!

  • Lucene is pretty picky about the search queries it supports. There are normally two ways of dealing with this. One is to filter out all special characters, but that disables some of Lucene's advanced functionality. The other is to not filter anything and let the application or website decide how to handle it. For example, you could wrap the search call in a try/catch and, if it fails, run the search again with the same input but with the special characters escaped first.
  • Lucene doesn't support a wildcard as the first character of a search term unless the query parser is specifically told to allow it. This is because a leading wildcard is VERY expensive: Lucene has to generate a lot of temporary data structures for each query. We do plan to offer support for it in the future, but we will advise against using it.
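
To make the try/catch fallback from the first point concrete, here is a rough sketch. It is written in Python purely for illustration and is not the product's search API: `search`, `QuerySyntaxError`, `escape_query` and `search_with_fallback` are hypothetical names, and the character set is an assumption based on the characters that the classic Lucene query syntax treats as special.

```python
# Characters the classic Lucene query syntax treats as special (assumed set).
# Note that "." is not among them.
LUCENE_SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&/')


class QuerySyntaxError(Exception):
    """Hypothetical stand-in for the parse exception the search engine raises."""


def escape_query(text: str) -> str:
    """Put a backslash in front of every special character."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL_CHARS else ch for ch in text)


def search_with_fallback(search, text: str):
    """Try the raw input first; on a syntax error, retry with escaped input.

    `search` is any callable that takes a query string and returns results.
    """
    try:
        return search(text)
    except QuerySyntaxError:
        return search(escape_query(text))
```

The raw query is tried first so that users who deliberately type operators or trailing wildcards keep that functionality; escaping only happens when the parser rejects the input.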
    (36 posts)

    Re:Search engine and wildcards

    Hi!

    Thanks for the info. One more thing came up around the same topic. I looked through some Lucene docs, and it seems that a dot "." is not a special character, so I thought I could safely search for terms containing dots. Unfortunately, I don't get any hits when searching for a term that starts with a dot (e.g. ".net"), either with AutoWildcards = true or when I put a wildcard in the search phrase myself. I guess it might be connected with how the indexing mechanism treats dots (maybe it removes them when building the index?). If I am right, is there any way to control the indexer (stop words etc.), or is something else going on?

    Greetings,

    Andrzej

    (144 posts)

    Re:Search engine and wildcards

    Hi!

    It's true that you don't get any results for terms with leading dots if you are using AutoWildcards. Using wildcards means that the search terms are processed differently. This is a limitation in Lucene.

    As a solution, we'll change the AutoWildcards feature slightly. If it is enabled, leading dots will be stripped from the search terms before the wildcard is added.
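
Roughly, the planned behaviour could look like the sketch below. This is only an illustration in Python, not the actual AutoWildcards code: the function name `auto_wildcard` is hypothetical, and it assumes that the wildcard is appended as a trailing `*` on every whitespace-separated term.

```python
def auto_wildcard(query: str) -> str:
    """Strip leading dots from each term, then append a trailing '*'."""
    terms = []
    for word in query.split():
        word = word.lstrip(".")  # ".net" becomes "net"
        if not word:
            continue  # the term consisted only of dots, so drop it
        if not word.endswith("*"):
            word += "*"  # avoid doubling a wildcard the user already typed
        terms.append(word)
    return " ".join(terms)
```

With this sketch, `auto_wildcard(".net")` gives `net*`, while dots inside a term are left alone, so `asp.net` becomes `asp.net*`.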

    If you want more control over the wildcards, you can turn off AutoWildcards and add the wildcard character yourself to just the words that need it.