Finding answers

January 5, 2008

In last week’s post I asked:

Is it even possible, in theory, to have a general-purpose search engine that you can just drop into a knowledge-rich environment and, hey presto, get the answers you’re looking for?

After thinking about this for a week, and reading Tim Berners-Lee’s 2001 article in Scientific American, I’m inclined to think that, no, it’s not possible. And it’s pretty obvious that it’s not possible. As Tim writes, “To date, the Web has developed most rapidly as a medium of documents for people rather than for data and information that can be processed automatically.” This goes double for the pile of information I need to digest at work. The trouble is that I have a bunch of unstructured data, and working out what it all means and how it all relates takes a lot of work on my part. Of course no search engine can do this today!

They don’t even come close. Tim gives a few examples of questions that, given a sufficiently structured data source, are very easy to compute. Here’s my example: which bank in Australia has the lowest fee structure, with a preference for low overseas withdrawal fees? Plug that into Google and you won’t get an answer. Oh, you might get linked to half a dozen comparison sites, but they never seem to be quite interested in what you’re interested in. You might get linked to a Choice Magazine article that seems to have the answer, if you’re willing to pay $15 to access it for three days. But you won’t get tabulated results saying: these are the bank offerings that most closely match your needs.

The semantic web could change that. Wouldn’t it be lovely? And yet it hasn’t happened. Are the technologies out there? Well, that’s my next question. Will the answer be as obvious as the last one?


2 Responses to “Finding answers”

  1. Jack Says:

    I think it’s one of those problems that have been shelved into the “too hard” basket. Regardless of how semantic everything is, basically it comes down to machines understanding human language (and not necessarily just English).

    On any given web page, everything is important. But not everything is equally important and not everything is important to everyone. If you can somehow capture everything and give it to the right people then you’ve probably earned yourself a Nobel Prize.

    The semantic web is a great tool, but currently it seems to serve more as an accessibility aid. Those who may not have all of their vision or all of their hearing, or who are weird and prefer to surf via a terminal, can still understand the ideas communicated without the full experience. Sure, search engines pick up on this data as well, but it’s all still just glorified keyword and link tracking.

    Personally, I still believe in the hand-crafted approach. That’s not to say everything should be built from scratch, but the most important part should be. If you want a search engine for bank comparison or medical diagnosis or whatever, then the data entered needs to be in a recognized format, and the search engine should know which data qualifies as search criteria. As far as I can see, we’ll always need to define our search fields and indexes, as libraries have done since they were invented.
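    A minimal Python sketch of that hand-crafted approach, under stated assumptions: records must arrive in a declared format, and only declared fields may be used as search criteria, each with its own index, much like a library catalogue. The schema, field names and functions (`SCHEMA`, `SEARCH_FIELDS`, `validate`, `build_index`, `search`) are hypothetical illustrations, not any real product’s API.

```python
from collections import defaultdict

# Hypothetical schema: every record must supply these fields with these types,
# and only the fields in SEARCH_FIELDS may be used as search criteria.
SCHEMA = {"name": str, "category": str, "monthly_fee": float}
SEARCH_FIELDS = ("category",)


def validate(record):
    """Reject any record that is not in the recognised format."""
    for field, kind in SCHEMA.items():
        if not isinstance(record.get(field), kind):
            raise ValueError(f"{record!r}: field {field!r} must be {kind.__name__}")
    return record


def build_index(records):
    """Build one index per search field, mapping each value to record
    positions, as a library catalogue indexes books by subject."""
    index = {field: defaultdict(list) for field in SEARCH_FIELDS}
    for pos, record in enumerate(records):
        validate(record)
        for field in SEARCH_FIELDS:
            index[field][record[field]].append(pos)
    return index


def search(records, index, **criteria):
    """Return records matching every criterion; undeclared fields are an error."""
    matches = None
    for field, value in criteria.items():
        if field not in index:
            raise KeyError(f"{field!r} is not a declared search field")
        hits = set(index[field].get(value, []))
        matches = hits if matches is None else matches & hits
    return [records[pos] for pos in sorted(matches or set())]
```

    Refusing undeclared fields is the point of the design: the engine only answers questions its builders anticipated, which is exactly the trade-off described above.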

  2. karan Says:

    The issue with the semantic web in your example is that whoever or whatever attaches the semantics to the content has to consider what you consider significant; as you’ve already demonstrated, not many others consider your particular combination of features significant.

    This really is more of an AI problem: you’re essentially asking for a flexible data-mining algorithm that can connect query parameters to data points intelligently – slicing the data one way or the other, depending on what slices are given prominence.

    Think about how you’d approach the above question. Assuming I had all the fee structures in a comparable form, my approach would be to sort them by fee structure, assigning a point score to each (perhaps arbitrary, or linked to the fees). A second pass would then give the lower international transaction fees “bonus points” or preference. The final ranking would give you a fairly good list.

    You’d then probably connect that to other aspects, such as interest on savings. Again, each query would require a further ‘pass’ through the data to assign bonus points to those ‘facts’ that have preferred characteristics.
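    The multi-pass scoring described above could be sketched roughly as follows in Python, assuming the fee structures have already been normalised into comparable numbers. The `Offering` fields, the per-place points and the pass weights are hypothetical and, as noted, essentially arbitrary; each entry in the passes list is one further pass through the data.

```python
from dataclasses import dataclass


@dataclass
class Offering:
    # Hypothetical fields; real fee structures would first need
    # normalising into comparable numbers.
    bank: str
    monthly_fee: float              # account-keeping fee per month
    overseas_withdrawal_fee: float  # fee per overseas ATM withdrawal
    savings_rate: float             # interest on savings, % p.a.


def score_pass(offerings, scores, key, weight, lower_is_better=True):
    """One pass through the data: order offerings by `key`, then add
    weight * (n - place) points, so first place earns the most."""
    order = sorted(range(len(offerings)), key=lambda i: key(offerings[i]),
                   reverse=not lower_is_better)
    n = len(order)
    for place, i in enumerate(order):
        scores[i] += weight * (n - place)


def rank(offerings, passes):
    """Run every pass, then return (offering, score) pairs, best first.
    Equal keys or scores keep their input order (Python's sort is stable)."""
    scores = [0.0] * len(offerings)
    for key, weight, lower_is_better in passes:
        score_pass(offerings, scores, key, weight, lower_is_better)
    order = sorted(range(len(offerings)), key=lambda i: scores[i], reverse=True)
    return [(offerings[i], scores[i]) for i in order]


# Hypothetical passes: (key, weight, lower_is_better). Weights are arbitrary.
EXAMPLE_PASSES = [
    (lambda o: o.monthly_fee, 1.0, True),              # first pass: overall fees
    (lambda o: o.overseas_withdrawal_fee, 0.5, True),  # bonus: low overseas fees
    (lambda o: o.savings_rate, 0.25, False),           # bonus: better savings interest
]
```

    The weights are where “with a preference for” lives: raising the overseas-fee weight lets an account with a higher monthly fee but no overseas withdrawal fee overtake a cheaper all-rounder.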

    In terms of making it a generic, repeatable algorithm, it can be done, but it is very computationally expensive. My view would be to have a meta-search engine which takes Google’s search results and filters or reorders based on preferred criteria; it certainly is an interesting challenge to lay out.
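    A meta-search layer of that kind might, very roughly, take results already returned by an upstream engine and filter and reorder them. This Python sketch assumes results are plain (title, url, snippet) records and matches single lowercase words only; `Result` and `rerank` are hypothetical, and fetching results from Google or anywhere else is deliberately left out.

```python
import re
from typing import NamedTuple


class Result(NamedTuple):
    # Hypothetical shape of one result from an upstream search engine.
    title: str
    url: str
    snippet: str


def words(text):
    """Lowercase word set used for matching; multi-word phrases are not handled."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rerank(results, required, preferred):
    """Drop results missing any required word, then reorder the rest by the
    total weight of preferred words they mention. Ties keep upstream order."""
    required = set(required)
    kept = []
    for position, r in enumerate(results):
        text = words(r.title + " " + r.snippet)
        if not required <= text:
            continue  # filter: a required word is missing
        score = sum(w for term, w in preferred.items() if term in text)
        # Sort key: higher score first, then original position.
        kept.append((-score, position, r))
    kept.sort()
    return [r for _, _, r in kept]
```

    Matching words in titles and snippets is, of course, still “glorified keyword tracking”; the sketch only shows where user preferences could be applied on top of someone else’s results.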

    Humans can do these kinds of calculations and valuations relatively rapidly because our cognitive abilities are geared towards imprecise inputs: the thoughts of others expressed through the complexity of language.

    What computers have done for us so far is speed up the geometric calculations, but what we are now essentially asking them to do is speed up cognition: something far more imprecise, and something that, as far as I know, we barely understand ourselves. You are asking the right questions, and poking in the right field, though :)

