Category Archives: Social Search Engines

Hakia takes on major search engines backed up by a small army of international investors

Hakia is the third company featured in our planned series of publications about the Semantic Web and its apps.

Hakia.com, like Freebase and Powerset, relies heavily on semantic technologies to deliver what it hopes are better, more meaningful results to its users.

Hakia is building the Web’s new “meaning-based” (semantic) search engine with the sole purpose of improving search relevancy and interactivity, pushing the current boundaries of Web search. The benefits to the end user are search efficiency, richness of information, and time savings. The basic promise is to bring search results by meaning match – similar to the human brain’s cognitive skills – rather than by the mere occurrence (or popularity) of search terms. Hakia’s new technology is a radical departure from the conventional indexing approach, because indexing has severe limitations in handling full-scale semantic search.

Hakia’s capabilities will appeal to all Web searchers – especially those engaged in research on knowledge intensive subjects, such as medicine, law, finance, science, and literature. The mission of hakia is the commitment to search for better search.

Here is how hakia’s technology differs from that of conventional search engines.

QDEX Infrastructure

  • hakia’s designers broke from the decades-old indexing method and built a more advanced system called QDEX (short for Query Detection and Extraction) to enable semantic analysis of Web pages and “meaning-based” search.
  • QDEX analyzes each Web page much more intensely, dissecting it to its knowledge bits, then storing them as gateways to all possible queries one can ask.
  • The information density in the QDEX system is significantly higher than that of a typical index table, which is a basic requirement for undertaking full semantic analysis.
  • The QDEX data resides on a distributed network of fast servers using a mosaic-like data storage structure.
  • QDEX has superior scalability properties because data segments are independent of each other.

SemanticRank Algorithm

  • hakia’s SemanticRank algorithm comprises innovative solutions from the disciplines of Ontological Semantics, Fuzzy Logic, Computational Linguistics, and Mathematics.
  • Designed for the express purpose of higher relevancy.
  • Sets the stage for search based on meaning of content rather than the mere presence or popularity of keywords.
  • Deploys a layer of on-the-fly analysis with superb scalability properties.
  • Takes into account the credibility of sources among equally meaningful results.
  • Evolves its capacity of understanding text from BETA operation onward.

In our tests we asked Hakia three questions in plain English:

Why did the stock market crash? [ http://www.hakia.com/search.aspx?q=why+did+the+stock+market+crash%3F ]
Where do I get good bagels in Brooklyn? [ http://www.hakia.com/search.aspx?q=where+can+i+find+good+bagels+in+brooklyn ]
Who invented the Internet? [ http://www.hakia.com/search.aspx?q=who+invented+the+internet ]

It returned sensible results for all three. For example, Hakia understood that, when we asked “why,” we would be interested in results containing the words “reason for”, and it produced some relevant ones.
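
The “why” to “reason for” behaviour described above can be pictured as a simple query-expansion step. The sketch below is only an illustration of that idea and is not hakia’s implementation; the `EXPANSIONS` table and the `expand_query` helper are our own hypothetical names.

```python
# Hypothetical table: question word -> phrases likely to appear in an answer.
# These entries are illustrative assumptions, not hakia's data.
EXPANSIONS = {
    "why": ["reason for", "cause of"],
    "who": ["person who"],
    "where": ["location of"],
}


def expand_query(query):
    """Return extra phrases to match, based on the word that opens the query."""
    words = query.lower().rstrip("?").split()
    if not words:
        return []
    # Queries that do not start with a known question word get no expansion.
    return EXPANSIONS.get(words[0], [])


print(expand_query("Why did the stock market crash?"))
```

A real semantic engine would analyze the whole sentence rather than just its first word; the point here is only that a question word can steer the engine toward answer-like phrases.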

Hakia is one of the few promising alternative search engines closely watched by Charles Knight at his blog AltSearchEngines.com. It focuses on natural language processing methods to try to deliver ‘meaningful’ search results. Hakia attempts to analyze the concept behind a search query, in particular through sentence analysis, whereas most major search engines, including Google, analyze keywords. The company believes that the future of search engines will go beyond keyword analysis: search engines will talk back to you and in effect become your search assistant. One point worth noting is that Hakia currently still has some human post-editing going on, so it isn’t 100% computer powered at this point and is closer to a human-powered search engine, or a combination of the two.

They hope to provide better results for complex queries than Google currently offers, but they have a long way to go to catch up, considering Google’s vast lead in the search market, sophisticated technology, and rich coffers. Hakia’s semantic search technology aims to understand the meaning of search queries to improve the relevancy of the search results.

Instead of relying on indexing the web or on the popularity of particular web pages, as many search engines do, hakia tries to match the meaning of the search terms to mimic the cognitive processes of the human brain.

“We’re mainly focusing on the relevancy problem in the whole search experience,” said Dr. Berkan in an interview Friday. “You enter a question and get better relevancy and better results.”

Dr. Berkan contends that search engines that use indexing and popularity algorithms are not as reliable with combinations of four or more words since there are not enough statistics available on which to base the most relevant results.

“What we are doing is an ultimate approach, doing meaning-based searches so we understand the query and the text, and make an association between them by semantic analysis,” he said.

Analyzing whole sentences instead of keywords would certainly increase the company’s cost of indexing and processing the world’s information. The case is much the same with Powerset, which also does deep contextual analysis of every sentence on every web page and is publicly known to have higher indexing and analysis costs than Google. Considering that Google has more than 450,000 servers in several major data centers, and that hakia’s indexing and storage costs might be even higher, the approach hakia is taking might cost its investors a fortune to keep the company alive.

It would be interesting to find out whether hakia is also building its architecture on the HBase/Hadoop environment, as Powerset does.

In the context of indexing and storing the world’s information, it is worth mentioning that there is yet another start-up search engine, called Cuill, that claims to have invented a technology for cheaper and faster indexing than Google’s. Cuill claims that its indexing costs will be 1/10th of Google’s, based on new search architectures and relevance methods.

As for semantic textual analysis and the presentation of meaningful results, NosyJoe.com is a great example of both. It does not, however, appear to be planning to index and store the world’s information and then apply contextual analysis to it; rather, it focuses on what the people participating in its social search engine consider important and of good quality.

A few months ago Hakia launched a new social feature called “Meet Others”. It gives you the option, from a search results page, to jump to a page on the service where everyone who searches for the topic can communicate.

For some idealized types of searching, it could be great. For example, suppose you were searching for information on a medical condition. Meet Others could connect you with other people looking for info about the condition, making an ad-hoc support group. On the Meet Others page, you’re able to add comments, or connect directly with the people on the page via anonymous e-mail or by Skype or instant messaging.

On the other hand, a site that implements social recommendations and relies on social elements like Hakia’s Meet Others feature needs huge traffic to turn that interesting feature into an effective information discovery tool. Google, for example, with its more than 500 million unique searchers per month, could easily beat such social attempts by smaller players if it decided to employ its users, in one way or another, to find results, judge their relevancy, and share and recommend results that others also search for. Such attempts by Google are already under way, as one can read here: Is Google trying to become a social search engine.

Reach

According to Quantcast, Hakia is not a very popular site, reaching fewer than 150,000 unique visitors per month. Compete reports much better numbers: slightly below 1 million uniques per month. Considering that the search engine is still in beta, these numbers are more than great. A closer look at the traffic curves on both measurement sites suggests that hakia’s traffic is campaign based, in other words generated by advertising, promotion or PR activity rather than by permanent organic traffic from heavy usage of the site.

The People

Founded in 2004, hakia is a privately held company with headquarters in downtown Manhattan. hakia operates globally with teams in the United States, Turkey, England, Germany, and Poland.

The founder of hakia is Dr. Berkan, a nuclear scientist with a specialization in artificial intelligence and fuzzy logic. He is the author of several articles in this area, as well as the book Fuzzy Systems Design Principles, published by IEEE in 1997. Before launching hakia, Dr. Berkan worked for the U.S. Government for a decade with emphasis on information handling, criticality safety and safeguards. He holds a Ph.D. in Nuclear Engineering from the University of Tennessee and a B.S. in Physics from Hacettepe University, Turkey. He has been developing the company’s semantic search technology with help from Professor Victor Raskin of Purdue University, who specializes in computational linguistics and ontological semantics and is the company’s chief scientific advisor.

Dr. Berkan resisted VC firms because he worried they would demand too much control and push development too fast to get the technology to the product phase so they could earn back their investment.

When he met Dr. Raskin, he discovered they had similar ideas about search and semantic analysis, and by 2004 they had laid out their plans.

They currently have 20 programmers working on building the system in New York, and another 20 to 30 contractors working remotely from different locations around the world, including Turkey, Armenia, Russia, Germany, and Poland.
The programmers are developing the search engine so it can better handle complex queries and maybe surpass some of its larger competitors.

Management

  • Dr. Riza C. Berkan, Chief Executive Officer
  • Melek Pulatkonak, Chief Operating Officer
  • Tim McGuinness, Vice President, Search
  • Stacy Schinder, Director of Business Intelligence
  • Dr. Christian F. Hempelmann, Chief Scientific Officer
  • John Grzymala, Chief Financial Officer

Board of Directors

  • Dr. Pentti Kouri, Chairman
  • Dr. Riza C. Berkan, CEO
  • John Grzymala
  • Anuj Mathur, Alexandra Global Fund
  • Bill Bradley, former U.S. Senator
  • Murat Vargi, KVK
  • Ryszard Krauze, Prokom Investments

Advisory Board

  • Prof. Victor Raskin (Purdue University)
  • Prof. Yorick Wilks (Sheffield University, UK)
  • Mark Hughes

Investors

Hakia is known to have raised $11 million in its first round of funding from a panoply of investors scattered across the globe who were attracted by the company’s semantic search technology.

The New York-based company said it decided to snub the usual players in the venture capital community lining Silicon Valley’s Sand Hill Road and opted for its international connections instead, including financial firms, angel investors, and a telecommunications company.

Poland

Among them were Poland’s Prokom Investments, an investment group active in the oil, real estate, IT, financial, and biotech sectors.

Turkey

Another investor, Turkey’s KVK, distributes mobile telecom services and products in Turkey. Also from Turkey, angel investor Murat Vargi pitched in some funding. He is one of the founding shareholders in Turkcell, a mobile operator and the only Turkish company listed on the New York Stock Exchange.

Malaysia

In Malaysia, hakia secured funding from angel investor Lu Pat Ng, who represented his family, which has substantial investments in companies worldwide.

Finland

From Finland, hakia turned to Dr. Pentti Kouri, an economist and VC who was a member of the Nokia board in the 1980s. He has taught at Stanford, Yale, New York University, and Helsinki University, and worked as an economist at the International Monetary Fund. He is currently based in New York.

United States

In the United States, hakia received funding from Alexandra Investment Management, an investment advisory firm that manages a global hedge fund. Also from the U.S., former Senator and New York Knicks basketball player Bill Bradley has joined the company’s board, along with Dr. Kouri, Mr. Vargi, Anuj Mathur of Alexandra Investment Management, and hakia CEO Riza Berkan.

Hakia was one of the first alternative search engines to make the home page of Web2Innovations in the past year… http://web2innovations.com/hakia.com.php

Hakia.com is the 3rd Semantic App featured by Web2Innovations in its planned series of publications, in which we will try to discover, highlight and feature the next generation of web-based semantic applications, engines, platforms, mash-ups, machines, products, services, mixtures, parsers, approaches and far beyond.

The purpose of these publications is to discover and showcase today’s Semantic Web Apps and projects. We’re not going to rank them, because there is no way to rank these apps at this time – many are still in alpha and private beta.

Via

[ http://www.hakia.com/ ]
[ http://blog.hakia.com/ ]
[ http://www.hakia.com/about.html ]
[ http://www.readwriteweb.com/archives/hakia_takes_on_google_semantic_search.php ]
[ http://www.readwriteweb.com/archives/hakia_meaning-based_search.php ]
[ http://siteanalytics.compete.com/hakia.com/?metric=uv ]
[ http://www.internetoutsider.com/2007/07/the-big-problem.html ]
[ http://www.quantcast.com/search/hakia.com ]
[ http://www.redherring.com/Home/19789 ]
[ http://web2innovations.com/hakia.com.php ]
[ http://www.pandia.com/sew/507-hakia.html ]
[ http://www.searchenginejournal.com/hakias-semantic-search-the-answer-to-poor-keyword-based-relevancy/5246/ ]
[ http://arstechnica.com/articles/culture/hakia-semantic-search-set-to-music.ars ]
[ http://www.news.com/8301-10784_3-9800141-7.html ]
[ http://searchforbettersearch.com/ ]
[ http://web2innovations.com/money/2007/12/01/is-google-trying-to-become-a-social-search-engine/ ]
[ http://www.web2summit.com/cs/web2006/view/e_spkr/3008 ]
 

Is Google trying to become a Social Search Engine

Based on what we are seeing, the answer is close to yes. Google is now experimenting with new social features aimed at improving the users’ search experience.

This experiment lets you influence your search experience by adding, moving, and removing search results. When you search for the same keywords again, you’ll continue to see those changes. If you later want to revert your changes, you can undo any modifications you’ve made. Note that Google claims this is an experimental feature and may be available for only a few weeks.
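
The behaviour described above (per-user changes that are remembered per set of keywords and can be undone) can be sketched as a small overlay on top of the engine’s own ranking. This is a minimal illustration under our own assumptions and says nothing about how Google implements the experiment; the `PersonalResults` class and its method names are hypothetical.

```python
class PersonalResults:
    """Per-user edits to a result list, remembered per search keywords."""

    def __init__(self):
        # keywords -> ordered list of (action, url, position) edits
        self.edits = {}

    def add(self, keywords, url, position=0):
        self.edits.setdefault(keywords, []).append(("add", url, position))

    def move(self, keywords, url, position):
        self.edits.setdefault(keywords, []).append(("move", url, position))

    def remove(self, keywords, url):
        self.edits.setdefault(keywords, []).append(("remove", url, None))

    def undo(self, keywords):
        """Revert the most recent edit for these keywords, if there is one."""
        if self.edits.get(keywords):
            self.edits[keywords].pop()

    def apply(self, keywords, results):
        """Replay the stored edits, in order, on a copy of the engine's results."""
        ranked = list(results)
        for action, url, position in self.edits.get(keywords, []):
            if url in ranked:
                ranked.remove(url)  # take the URL out of its current slot
            if action in ("add", "move"):
                ranked.insert(position, url)  # put it at the requested slot
        return ranked
```

Because the edits are stored separately from the engine’s results, searching for the same keywords again simply replays them, and undoing a change only means dropping the latest stored edit.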

There seem to be features like “Like it”, “Don’t like it?” and “Know of a better web page”. Of course, to take full advantage of these extras, and to have your recommendations associated with your searches when you return, you have to be signed in.

There is nothing new here: many of the smaller social search engines already deploy some of the features Google is only now testing. But having more than 500 million unique visitors per month, the vast majority of whom use Google’s search engine heavily, is a huge advantage for anyone who wants to use social elements to make information on the web easier to find. Even Marissa Mayer, Google’s leading executive in search, said in August that Google would be well positioned to compete in social search. With this experiment in particular, however, it appears your vote applies only to the Google search results you yourself will see, so it is hard to call it “social” for now. It may still prove valuable as a stand-alone service. Also, Daniel Russell of Google made it pretty clear some time ago that they use user behavior to affect search results. Effectively, that’s implicit voting rather than explicit voting.

We think, however, the only reason Google is trying to deal with these social features, relying on humans to determine the relevancy, is their inability to effectively fight the spam their SERPs are flooded with. 

Manipulating algorithm-based results, in one way or another, is in our understanding not much harder than manipulating Google results that rely on social recommendations would be. Look at Digg for example.

We think employing humans to determine which results are best is basically an effective pathway to corruption, which is arguably worse than having an algorithm to blame for the spam and low quality of the results. Again, take a look at Digg, dmoz.org and, above all, Wikipedia. Wikipedia, once a good idea, became a battlefield for corporate, brand, political and social wars. That said, we think Google’s problem with spam results comes down to how it gets to the information or, more concretely, the methods it uses to crawl and index the vast Web. By contrast, having people instead of robots gather the quality and important information (from everyone’s point of view) from around the web is, in our understanding, a much better and more effective approach than loading all the spam results onto the servers and then letting people sort them out.

This is not the first time Google has tried new features with its search results. We remember searchmash.com, another of Google’s toys in the search arena, which was started quietly a year ago because Google did not want the public to know about the project or its beta testers (read: the common users) to be influenced by the Google brand name. The project, however, quickly became popular once many people discovered who its actual owner was.

Google is no doubt getting all the press attention it needs, no matter what it does, and sometimes even more than it actually needs. On the other hand, things seem to be slowly changing, and influential media like the New York Times, Newsweek, CNN and many others are on a quest for the next search engine, the next Google. This simply could not have happened from 2001 through 2004, a period characterized by solid media comfort for Google’s search engine business.

So, is Google the first one to experiment with social search approaches, features, methods and extras? No, definitely not, as you are going to see for yourself from the companies and projects listed below.

As for crediting a Digg-like system with the idea of sorting content out based on community voting, they definitely weren’t the first. The earliest implementation of this we are aware of is Kuro5hin.org (http://en.wikipedia.org/wiki/Kuro5hin), which, we think, was founded back in 1999.

Eurekster

One of the first and oldest companies to be called a social search engine on the Web is Eurekster.
Eurekster launched its community-powered social search platform “swicki”, as far as we know, in 2004, and explicit voting functionality in 2006. To date, over 100,000 swickis have been built, each serving a community of users passionate about a specific topic. Eurekster processes over 25,000,000 searches a month. The key to Eurekster’s success in improving relevancy has been leveraging explicit (and implicit) user behavior at the group or community level rather than at the individual or general level. On the other hand, Eurekster never made it to mainstream users, and the company slowly lost momentum and faded away.

Wikia Social Search

Wikia was founded by Jimmy Wales (Wikipedia’s founder) and Angela Beesley in 2004. The company is incorporated in Delaware. Gil Penchina became Wikia’s CEO in June 2006, at the same time the company moved its headquarters from St. Petersburg, Florida, to Menlo Park and later to San Mateo in California. Wikia has offices in San Mateo and New York in the US, and in Poznań in Poland. Remote staff is also located in Chile, England, Germany, Japan, Taiwan, and also in other locations in Poland and the US. Wikia has received two rounds of investment; in March 2006 from Bessemer Venture Partners and in December 2006 from Amazon.com.

According to Wikia Search, the future of Internet search must be based on:

  1. Transparency – Openness in how the systems and algorithms operate, both in the form of open source licenses and open content + APIs.
  2. Community – Everyone is able to contribute in some way (as individuals or entire organizations), strong social and community focus.
  3. Quality – Significantly improve the relevancy and accuracy of search results and the searching experience.
  4. Privacy – Must be protected, do not store or transmit any identifying data.

Other active areas of focus include:

  1. Social Lab – sources for URL social reputation, experiments in wiki-style social ranking.
  2. Distributed Lab – projects focused on distributed computing, crawling, and indexing. Grub!
  3. Semantic Lab – Natural Language Processing, Text Categorization.
  4. Standards Lab – formats and protocols to build interoperable search technologies.

Given who Jimmy Wales is, the success he achieved with Wikipedia and the resources he might therefore have access to, Wikia Search stands a good chance of surviving serious competition from Google.

NosyJoe.com

NosyJoe is yet another great example of a social search engine that employs intelligent tagging technologies and runs on a semantic platform.

NosyJoe is a social search engine that relies on you to sniff for and submit the web’s interesting content, and offers meaningful search results in the form of readable complete sentences and smart tags. NosyJoe is built upon the fundamental belief that people are better than robots at finding interesting, important and quality content around the Web. Crawling the entire Web to build a massive index of information is an enormous technological task that requires a huge amount of resources and time, and it would also load lots of unnecessary information people don’t want. Instead, NosyJoe focuses just on those parts of the Web that people think are important and find interesting enough to submit and share with others.

NosyJoe is a hybrid: a social search engine that relies on you to sniff for and submit the web’s interesting content, an intelligent content tagging engine on the back end, and a basic semantic platform on its web-visible part. NosyJoe applies semantic textual analysis and intelligently extracts meaningful structures like sentences, phrases, words and names from the content in order to make it a bit more meaningfully searchable. This helps it present the search results in meaningful formats like readable complete sentences and smart phrasal, word and name tags.

The information is then clustered and published across NosyJoe’s platform into contextual channels and time and source categories. Semantic phrasal, name and word tags are also applied to connect the pieces together meaningfully, which makes even the smallest content component web visible, indexable and findable. Finally, a set of algorithms and user patterns is applied to further rank, organize and share the information.

In our quick tests on the site, the search results were presented in the form of meaningful sentences and semantic phrasal tags (as an option). This turns the results into something we have never seen on the web so far: neat content components and readable, easily understandable sentences, unlike the excerpts around the matched keyword that we are all used to. Compared to other search engines’ results, NosyJoe.com’s SERPs appear truly meaningful.

As of today, just 6 or 7 months after going online, NosyJoe already has more than 500,000 semantic tags that connect tens of thousands of meaningful sentences across its platform.

We have no information as to who stands behind NosyJoe, but the project seems very serious and promising in many respects, from how it gathers information to how it presents results and offsets low-quality results. Of all the newcomer social search engines, NosyJoe stands the best chance of making it. As far as we know, NosyJoe is also based in Silicon Valley.

Sproose

Sproose says it is developing search technology that lets users obtain personalized results, which can be shared among a social network, using the Nutch open-source search engine and building applications on top. Its search appears to use third-party search feeds and to rank the results based on users’ votes.
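
Ranking a third-party feed by user votes can be sketched in a few lines. This is a rough illustration only, not Sproose’s actual algorithm; the `rank_by_votes` helper, the sample URLs and the idea of a single net vote count per result are our assumptions.

```python
def rank_by_votes(feed_results, votes):
    """Order feed results by net user votes, highest first.

    Results without votes count as zero. Python's sort is stable, so
    results with equal votes keep the order the feed returned them in.
    """
    return sorted(feed_results, key=lambda url: -votes.get(url, 0))


# Hypothetical feed order and vote counts, for illustration only.
feed = ["x.example", "y.example", "z.example"]
votes = {"z.example": 3, "x.example": -1}
print(rank_by_votes(feed, votes))
```

Falling back to the feed’s own order for unvoted results means the ranking still works before any votes have been cast.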

Sproose said it has raised nearly $1 million in seed funding. It is based in Danville, a town on the east side of the SF Bay Area. Sproose said Roger Smith, founder and former president and chief executive of Silicon Valley Bank, was one of the angel investors and is joining Sproose’s board.

Other start-up search engines of great variety are listed below:

  • Hakia – Relies on natural language processing. These guys are also experimenting with social elements through a feature called “meet others who asked the same query”.
  • Quintura – A visual engine now based in Virginia, US. The company was founded by Russians and was earlier headquartered in Moscow.
  • Mahalo – A search engine that looks more like a directory, with quality content handpicked by editors. Jason Calacanis is the founder of the company.
  • ChaCha – Real humans try to help you in your quest for information, via chat. The company is based in Indiana and has been criticized a lot by Silicon Valley’s IT community. Despite this criticism, it recently raised $10m in a Series A round of funding.
  • Powerset – Still in closed beta and also relying on understanding the natural language. See our Powerset review.  
  • Clusty – founded in 2000 by three Carnegie Mellon University scientists.
  • Lexxe – Sydney based engine featuring natural language processing technologies.
  • Accoona – The company has recently filed for an IPO in US planning to raise $80M from the public.
  • Squidoo – Started in October 2005 by Seth Godin, it looks more like a wiki site, à la Wikia or Wikipedia, where anyone creates articles on different topics.
  • Spock – Focuses on people information, people search engine.

One thing is for sure today: Google is lending solid credentials to the social search approach and in a way legitimizing it, which, by the way, helps the many smaller so-called social search engines.

Perhaps it is about time for consolidation in the social search sector. Some of the smaller but more promising social search engines could now merge in order to compete with Google and prevent it from dominating social search too, as it did algorithmic search. Is Google also interested? Has anyone heard of recent interest in, or already closed acquisition deals for, start-up social search engines?

On the other hand, more and more IT experts, evangelists and web professionals agree that taking Google down is a challenge most likely to be met by a concept that is anything but a search engine in the traditional sense. Such concepts include, but are not limited to, Wikipedia, Del.icio.us and LinkedWords. In other words, finding information on the web doesn’t necessarily mean searching for it.

Via:
[ http://www.google.com/experimental/a840e102.html ]
[ http://www.blueverse.com/2007/12/01/google-the-social-…]
[ http://www.adesblog.com/2007/11/30/google-experimenting-social… ]
[ http://www.techcrunch.com/2007/11/28/straight-out-of-left-field-google-experimenting-with-digg-style-voting-on-search-results ]
[ http://www.blogforward.com/money/2007/11/29/google… ]
[ http://nextnetnews.blogspot.com/2007/09/is-nosyjoecom-… ]
[ http://www.newsweek.com/id/62254/page/1 ]
[ http://altsearchengines.com/2007/10/05/the-top-10-stealth-… ]
[ http://www.nytimes.com/2007/06/24/business/yourmoney/…  ]
[ http://dondodge.typepad.com/the_next_big_thing/2007/05… ]
[ http://search.wikia.com/wiki/Search_Wikia ]
[ http://nosyjoe.com/about.com ]
[ http://www.siliconbeat.com/entries/2005/11/08/sproose_up_your… ]
[ http://nextnetnews.blogspot.com/2007/10/quest-for-3rd-generation… ]
[ http://www.sproose.com ]