Category Archives: Search Engines

Can Google lead amid its ever growing infrastructure and computation expenditures?

While going through our daily dose of news from the web sector we came across an interesting fact worth a closer look. Google appears to be processing a huge amount of data in its daily routines: about 20 petabytes per day (20,000 terabytes, or 20 million gigabytes).

The average MapReduce job is said to run across a $1 million hardware cluster, not including bandwidth fees, datacenter costs, or staffing. The January 2008 MapReduce paper provides new insights into Google’s hardware and software, which process tens of petabytes of data per day.

In September 2007, for example, the paper shows that Googlers ran 2217 MapReduce jobs that consumed approximately 11,000 machine years in a single month. Breaking these numbers down further, 11,081 machine years / (2217 jobs x 395 sec = 0.0278 years) implies about 399,000 machines. Since this number is believed to double roughly every 6 months, one may guess that Google is up to about 600,000 machines by now.
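
The back-of-the-envelope estimate above can be reproduced in a few lines of Python. This is only a sketch of the arithmetic as quoted: it takes the figures at face value, assumes the jobs ran one after another at the average 395 seconds each, and uses a 365-day year. The variable names are ours.

```python
# Reproduce the rough machine-count estimate quoted above.
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: a 365-day year

machine_years = 11_081    # machine time used in September 2007, as quoted
jobs = 2_217              # number of MapReduce jobs, as quoted
avg_job_seconds = 395     # average job completion time, as quoted

# Wall-clock time if the jobs ran back to back (the estimate's assumption).
elapsed_years = jobs * avg_job_seconds / SECONDS_PER_YEAR

# Machines needed to use that many machine years in that much wall-clock time.
implied_machines = machine_years / elapsed_years

print(f"elapsed: {elapsed_years:.4f} years")
print(f"implied machines: {implied_machines:,.0f}")
```

The result is very sensitive to the back-to-back assumption, so it should be read as a rough guess rather than a measurement.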

Google converted its search indexing systems to the MapReduce system in 2003, and currently processes over 20 terabytes of raw web data.

Google is known to run on hundreds of thousands of servers – by one estimate, in excess of 450,000 (data as of 2006, today more likely 600,000) – racked up in thousands of clusters in dozens of data centers around the world. It has data centers in Dublin, Ireland; in Virginia; and in California, where it just acquired the million-square-foot headquarters it had been leasing. It recently opened a new center in Atlanta, and is currently building two football-field-sized centers in The Dalles, Ore.

Microsoft, by contrast, made about a $1.5 billion capital investment in server and data structure infrastructure in 2006. Google is known to have spent at least as much to maintain its lead, following a $838 million investment in 2005. We estimate 2008’s Google IT expenditures to be in the $2B range. 

Google buys, rather than leases, computer equipment for maximum control over its infrastructure. Google chief executive officer Eric Schmidt once defended that strategy in a conference call with financial analysts. “We believe we get tremendous competitive advantage by essentially building our own infrastructures,” he said.

In general, Google has a split personality when it comes to questions about its back-end systems. To the media, its answer is, “Sorry, we don’t talk about our infrastructure.” Yet, Google engineers crack the door open wider when addressing computer science audiences, such as rooms full of graduate students whom it is interested in recruiting.

Among other things, Google has developed the capability to rapidly deploy prefabricated data centers anywhere in the world by packing them into standard 20- or 40-foot shipping containers.

An interesting fact from Google’s history dates back to 2003 when, in a paper, Google noted that the power requirements of a densely packed server rack could range from 400 to 700 watts per square foot, yet most commercial data centers could support no more than 150 watts per square foot. In response, Google was investigating more power-efficient hardware, and reportedly switched from Intel to AMD processors for this reason. Google has not confirmed the choice of AMD, which was reported two years later by Morgan Stanley analyst Mark Edelstone.

Google relies mainly on its own internally developed software for data and network management and has a reputation for being skeptical of “not invented here” technologies, so relatively few vendors can claim it as a customer.

Google is rumored to be planning to eventually build its own servers, storage systems, Internet switches and perhaps, sometime in the future, even optical transport systems.

Other rumors claim Google to be a big buyer of dark fiber to connect its data centers, which helps explain why the company spent nearly $3.8 billion over the past seven quarters on capital expenditures.

That is a tremendous amount of information and IT operations. Based on our basic calculations, and assuming our human computation is correct, Google is facing IT expenditures in the $2B range per year, including its data centers and the people who run them.

Google’s competitive advantage does not rest on its infrastructure alone but also on its employees (Google has what is arguably the brightest group of people ever assembled for a publicly held company), proprietary software, global brand awareness, huge market capitalization and revenues of more than $10B per year. Even so, we think a $2B burn rate per year on computing needs alone is a “walking on thin ice” strategy at breakneck pace. Companies like Cuill, which claims to have invented a technology 10 times cheaper than Google’s in terms of indexing and storing information, Powerset, which works in a Hadoop/HBase environment, IBM, Microsoft and Yahoo! could potentially gain an advantage over Google as the Web grows further, and Google’s computing expenses with it.

By the way, we have also read on the Web that Google processes its data on standard cluster nodes, each consisting of two 2 GHz Intel Xeon processors with Hyper-Threading enabled, 4 GB of memory, two 160 GB IDE hard drives and a gigabit Ethernet link.

Yahoo! and Powerset are known to use Hadoop, while Microsoft’s equivalent is called Dryad. Dryad and Hadoop are the competing equivalents of Google’s GFS, MapReduce and BigTable.

More about MapReduce

MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key.

Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program’s execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.

Google’s implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google’s clusters every day.
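
The programming model described above can be illustrated with the classic word-count example. What follows is a minimal single-process sketch, not Google’s implementation: the `run_mapreduce` helper is a hypothetical stand-in for the distributed runtime, and the sample documents are made up.

```python
from collections import defaultdict

def map_fn(key, value):
    """Map: emit an intermediate (word, 1) pair for every word in a document."""
    for word in value.split():
        yield word.lower(), 1

def reduce_fn(key, values):
    """Reduce: merge all counts emitted for the same word."""
    return key, sum(values)

def run_mapreduce(inputs, map_fn, reduce_fn):
    """Single-process stand-in for the runtime: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            groups[ikey].append(ivalue)
    return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))

# Made-up input: (document name, document contents) pairs.
docs = [("doc1", "the quick brown fox"), ("doc2", "the lazy dog and the fox")]
counts = run_mapreduce(docs, map_fn, reduce_fn)
print(counts)
```

In a real cluster the grouping step is the part the run-time system distributes: map tasks run on many machines and the intermediate pairs are partitioned by key before the reduce tasks see them.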

More about Hadoop

Hadoop is an interesting software platform that lets one easily write and run applications that process vast amounts of data. Here’s what makes Hadoop especially useful:

Scalable: Hadoop can reliably store and process petabytes.

Economical: It distributes the data and processing across clusters of commonly available computers. These clusters can number into the thousands of nodes.

Efficient: By distributing the data, Hadoop can process it in parallel on the nodes where the data is located. This makes it extremely rapid.

Reliable: Hadoop automatically maintains multiple copies of data and automatically redeploys computing tasks based on failures.

Hadoop implements MapReduce, using the Hadoop Distributed File System (HDFS). MapReduce divides applications into many small blocks of work. HDFS creates multiple replicas of data blocks for reliability, placing them on compute nodes around the cluster. MapReduce can then process the data where it is located. Hadoop has been demonstrated on clusters with 2000 nodes. The current design target is 10,000 node clusters. Hadoop is a Lucene sub-project that contains the distributed computing platform that was formerly a part of Nutch.
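
The idea of replicating blocks and then running the work where the data already sits can be sketched as follows. This is a deliberately simplified illustration, not HDFS code: the round-robin placement, the replication factor of 3, the node names and both function names are assumptions made for the example.

```python
from collections import Counter

def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Assumes replication <= len(nodes); the placement policy is illustrative.
    """
    placement = {}
    for block in range(num_blocks):
        placement[block] = [nodes[(block + i) % len(nodes)]
                            for i in range(replication)]
    return placement

def schedule_local(placement):
    """Run each block's task on the least-loaded node that already holds it."""
    load = Counter()
    assignment = {}
    for block, holders in placement.items():
        node = min(holders, key=lambda n: load[n])
        assignment[block] = node
        load[node] += 1
    return assignment

nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster
placement = place_replicas(num_blocks=8, nodes=nodes)
assignment = schedule_local(placement)

for block, node in assignment.items():
    print(f"block {block}: replicas on {placement[block]}, task runs on {node}")
```

Because every task is assigned to a node that holds a replica of its block, no block has to be copied over the network before processing starts, and losing one node still leaves two copies of each block it held.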

More about Dryad

Dryad is an infrastructure which allows a programmer to use the resources of a computer cluster or a data center for running data parallel programs. A Dryad programmer can use thousands of machines, each of them with multiple processors or cores, without knowing anything about concurrent programming.

The Structure of Dryad Jobs
 
A Dryad programmer writes several sequential programs and connects them using one-way channels. The computation is structured as a directed graph: programs are graph vertices, while the channels are graph edges. A Dryad job is a graph generator which can synthesize any directed acyclic graph. These graphs can even change during execution, in response to important events in the computation.

Dryad is quite expressive. It completely subsumes other computation frameworks, such as Google’s map-reduce, or the relational algebra. Moreover, Dryad handles job creation and management, resource management, job monitoring and visualization, fault tolerance, re-execution, scheduling, and accounting.
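
The job structure described above, sequential programs as vertices connected by one-way channels, can be sketched in Python. This is not Dryad’s API: the vertex functions, the `channels` table and `run_job` are hypothetical names, and the standard-library `graphlib` module (Python 3.9+) is used only to order the vertices.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Vertices: ordinary sequential functions (hypothetical examples).
def read_input(_inputs):
    return ["b", "a", "c", "a"]

def dedupe(inputs):
    return sorted(set(inputs[0]))

def count(inputs):
    return len(inputs[0])

def report(inputs):
    unique, total = inputs
    return f"{len(unique)} unique of {total}"

vertices = {"read": read_input, "dedupe": dedupe,
            "count": count, "report": report}

# Edges: one-way channels, written as vertex -> ordered list of upstream vertices.
channels = {"read": [], "dedupe": ["read"], "count": ["read"],
            "report": ["dedupe", "count"]}

def run_job(vertices, channels):
    """Run each vertex once all of its upstream channels have produced data."""
    results = {}
    for name in TopologicalSorter(channels).static_order():
        inputs = [results[src] for src in channels[name]]
        results[name] = vertices[name](inputs)
    return results

results = run_job(vertices, channels)
print(results["report"])
```

Each vertex is plain sequential code; the parallelism comes from the graph, since `dedupe` and `count` do not depend on each other and could run on different machines.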

More

http://doi.acm.org/10.1145/1327452.1327492
http://www.niallkennedy.com/blog/2008/01/google-mapreduce-stats.html
http://labs.google.com/papers/mapreduce.html
http://research.google.com/people/jeff/
http://research.google.com/people/sanjay/
http://research.microsoft.com/research/sv/dryad/
http://lucene.apache.org/hadoop/
http://labs.google.com/papers/gfs.html
http://labs.google.com/papers/bigtable.html
http://www.techcrunch.com/2008/01/09/google-processing-20000-terabytes-a-day-and-growing/
http://feedblog.org/2008/01/06/mapreduce-simplified-data-processing-on-large-clusters/
http://en.wikipedia.org/wiki/MapReduce#Uses
http://open.blogs.nytimes.com/tag/hadoop/
http://www.baselinemag.com/print_article2/0,1217,a=182560,00.asp
http://www.stanford.edu/services/websearch/Google/
http://gigaom.com/2007/12/04/google-infrastructure/
http://gigaom.com/2005/09/19/google-asks-for-googlenet-bids/

Microsoft bets on enterprise search, offers to buy Fast.no for $1.2B

In what is Microsoft’s second largest deal of the past 12 months, the company has offered to buy Fast Search & Transfer ASA, a leading provider of enterprise search solutions based in Norway. Details are as follows: Microsoft Corp. today announced that it will make an offer to acquire Fast Search & Transfer ASA (OSE: “FAST”), a leading provider of enterprise search solutions, through a cash tender offer for 19.00 Norwegian kroner (NOK) per share. This offer represents a 42 percent premium to the closing share price on Jan. 4, 2008 (the last trading day prior to this announcement), and values the fully diluted equity of FAST at 6.6 billion NOK (or approximately $1.2 billion U.S.).

FAST’s board of directors has unanimously recommended that its shareholders accept the offer. In addition, shareholders representing in aggregate 35 percent of the outstanding shares, including FAST’s two largest institutional shareholders, Orkla ASA and Hermes Focus Asset Management Europe, have irrevocably undertaken to accept the offer. The transaction is expected to be completed in the second quarter of calendar year 2008.

FAST has over 3500 enterprise clients, including heavyweights like Disney, The Washington Post, AutoTrader.com, and LexisNexis. According to Mary-Jo Foley from ZDNet, we should pay attention to how Microsoft will integrate FAST into their SharePoint Server. “Remember what Microsoft CEO Steve Ballmer said about SharePoint last year: He characterized SharePoint as the next big operating system from Microsoft,” she writes. “More and more, it’s looking like enterprise search functionality is one of the biggest reasons why.”

“Enterprise search is becoming an indispensable tool to businesses of all sizes, helping people find, use and share critical business information quickly,” said Jeff Raikes, president of the Microsoft Business Division. “Until now organizations have been forced to choose between powerful, high-end search technologies or more mainstream, infrastructure solutions. The combination of Microsoft and FAST gives customers a new choice: a single vendor with solutions that span the full range of customer needs.”

The companies possess a number of complementary strengths that advance a shared vision for helping businesses deliver information worker productivity and improved business results. FAST has a deep talent pool and is respected throughout the technology industry for its expertise in best-in-class, high-end search solutions. Microsoft offers worldwide customer reach and an extensive partner network, and is the recognized leader in business productivity with the popular Microsoft Office SharePoint Server, which combines search with best-in-class collaboration, business intelligence, portal and content management capabilities.

“This acquisition gives FAST an exciting way to spread our cutting-edge search technologies and innovations to more and more organizations across the world,” said John Lervik, CEO of FAST. “By joining Microsoft, we can benefit from the momentum behind the SharePoint business productivity platform to really empower a broader set of users through Microsoft’s strong sales and marketing network. It validates FAST’s momentum and leadership in enterprise search.”

In addition to bolstering Microsoft’s enterprise search efforts, this acquisition increases Microsoft’s research and development presence in Europe, complementing existing research teams in Cambridge, England, and Copenhagen, Denmark, with new and significant capabilities in Norway.

The offer will be subject to customary terms and conditions, including receipt of acceptances representing more than 90 percent of FAST shares and voting power on a fully diluted basis, and receipt of all necessary regulatory approvals on terms acceptable to Microsoft. The complete details of the offer, including all terms and conditions, will be contained in the offer document, which is expected to be sent to FAST shareholders during the week of Jan. 14, 2008. The offer will not be made in any jurisdiction in which the making of the offer would not be in compliance with the laws of such jurisdiction.

Larry Dignan, also from ZDNet, thinks this will lead the rest of the industry to consolidate the same way the advertising industry has been consolidating.

More about FAST

FAST, which was founded in 1997, creates the real-time search and business intelligence solutions that are behind the scenes at the world’s best-known companies with the most demanding information challenges. FAST’s flexible and scalable integrated technology platform and personalized portal connects users, regardless of medium, to the relevant information they need.

FAST is headquartered in Norway and is publicly traded under the ticker symbol ‘FAST’ on the Oslo Stock Exchange. The FAST Group operates globally with presence in Europe, the United States, Asia, Australia, the Americas, and the Middle East. For further information about FAST, please visit http://www.fast.no/.

FAST’s business is enterprise search. Since the company was set up in Norway back in 1997, it has grown rapidly to become a global organization with offices across six continents. FAST claims to be at the forefront of search technology and to know how to do the heavy lifting. In the company’s own words:
 
Execution excellence
With over 3500 installations, many at Fortune 500 and Global 2000 companies, we have an illustrious pedigree. These blue-chip companies rely on us to help them achieve their business goals and they are loyal. If you ask our customers why they remain loyal, they will probably tell you how we exceed their expectations, provide an unparalleled level of service and show a demonstrable return on their investment. In many cases we have fundamentally contributed to their success.

In 2005 independent evaluations of our support organization gave us a 98% satisfaction rating. We get tested quarterly. In 2005 we delivered more than 300 successful customer projects on schedule and within budget. We also ran over 100 Search Best Practices workshops across the world with extremely positive feedback. It helps that more than 60% of our work force are engineers and that close to 50 of our engineers have PhDs in relevant fields. They enable us to meet complex needs by delivering simplicity.

Financial strength
We are the market leader in Enterprise Search and number one in revenue growth. We have no debt. We have been profitable, exceeding our projections, for every quarter during the last 4 years. And we have made these profits while investing a quarter of our income back into R&D. Performance like this gives us the freedom to invest in innovation and win on value and financial return.

Partner power
Partners give us the ability to deliver total solutions and our FAST X 10 partner program plays a major role in our success. We have over 90 Systems Integrators and VARs on board, and over 30 OEMs embedding our search technology. We have also certified close to 1000 developers in FAST University, drawing on our best-of-breed approach to partnering. Quantity is less important than quality, of course. We only pursue a partnership if there is a mutually beneficial, lasting opportunity.

Global presence
We have been a globally minded company, with a global outlook, since our inception. Maybe it is because of our Norwegian roots. In fact, soon after we opened our doors we established an office in the United States. We now have offices in 6 continents and development centers in 4 of them. Our products support close to 80 different languages.

John M. Lervik, Ph.D., serves as the Chief Executive Officer (CEO) and is a co-founder of FAST. Dr. Lervik served as the company’s Chief Technology Officer from 1997 to September 2001, overseeing all of the company’s research and product development activities. Dr. Lervik holds a Ph.D. from the Norwegian University of Science and Technology (NTNU), and was awarded the best overall PhD at NTNU in 1996/97.

The other co-founders of FAST are Mr. Thomas Joseph Fussell, who has served as Executive Chairman of the Board of Directors since June 1997 and was Managing Director in 2000, and Mr. Robert Napier Keith, who has served as Executive Director since June 1997.

Some people think this is a brilliant acquisition for Microsoft. Gartner says that Microsoft is struggling in this (already crowded) market. FAST is recognized as an industry-leader, along with Autonomy, Endeca, ZyLab, among others. 

The other thing to keep in mind is Microsoft’s biggest bet, which is its DYNAMICS (ERP/CRM) division. Because Business Objects was acquired by SAP, Microsoft possibly became more compelled to make an acquisition. Enterprise Search is going to be an absolutely massive component of ERP in the coming years, and this is a market that is strategic for Microsoft.

Fast.no seems to have some issues with its Board of Directors. More information follows below.

The conduct of Fast’s directors has been the subject of much comment in Norway. In January 2006 an article in a Norwegian IT paper claimed that one of FAST’s directors, Thomas Fussell, had made a 2000% markup for himself by buying a loss-making company, Hercules Communications, and selling it to the publicly traded Fast three weeks later.

More recently there has been controversy at the board level, with one director resigning and another making public statements about other directors and major shareholders. Fast’s board member Robert Keith said in an interview with the newspaper Finansavisen: “I ought to have seen the problems in Fast earlier. And I ought to have understood that Hans Gude Gudesen is a crazy liar. Also, I ought to have shot Oystein Stray Spetalen the first time I met him. That would have helped a lot of people.” Spetalen and Hans Gude Gudesen are both major shareholders in Fast. Furthermore, directors Keith and Fussell are allegedly being pursued by the Norwegian tax authorities for $50M in unpaid taxes the government says they owe. In the event of non-payment, liability may fall on the company.

The ongoing turmoil has seen 3 directors resign from the board in the last month, the latest being Johan Fredrik Odfjell, who is quoted in the company’s release as saying: “FAST faces many challenges and opportunities going forward.”

On December 22nd Orkla, FAST’s largest shareholder, demanded an EGM to force Fussell and Keith off the board.

Need to Restate Accounts for 2006 and 2007

On the 12th of December 2007 Oslo Bors suspended trading of FAST shares. The next morning the company announced it was reviewing the accounting used for the 2006 and 2007 reports, with the likely outcome that it would be changed. In an article titled “Fast restates its accounts”, http://www.dagensit.no stated that Fast’s results for 2006 and 2007 may be restated in what it called “another clean up round.” It also stated: “The search technology vendor Fast Search & Transfer has had several rounds of restating its accounts. Even after CFO Joseph Lacson declared some months ago that ‘everything is cleaned up’, skeletons have been found in the closet. Wednesday afternoon trading was suspended, after what the stock exchange called ‘certain conditions’.”

Earlier last year FAST acquired AgentArts, a San Francisco-based technology company with a personalization and recommendation engine for music, video, games and mobile entertainment. AgentArts clients include Infospace Mobile, Telstra Big Pond, Telstra Mobile, and Unipier. FAST said it will add the technology to its enterprise search products, which will allow users to see the relationships between content and get recommendations for similar content based on their search patterns. It also includes a social recommendation feature, which helps users discover similar content based on patterns of other users with similar interests.

Although Fast Search & Transfer’s core business is widely known to be enterprise search, in 2007 the company seems to have turned sharply towards online advertising and search monetization. That seems to be the Web’s 2007 trend anyway: everybody is trying to become an ad company, platform or network.

Also late last year (2007) FAST, perhaps best known for specializing in site search, launched a product platform that aims to socialize the ecommerce storefront search function. It is called FAST Recommendations and it offers product recommendations similar to those of Amazon.com, but with a social twist.

If some of the information above proves to be true, then this is a major and timely exit for FAST’s shareholders.

More

http://www.fastsearch.com/
http://www.fast.no 
http://www.microsoft.com/presspass/press/2008/jan08/01-08FastSearchPR.mspx
http://www.forbes.com/prnewswire/feeds/prnewswire/2008/01/08/prnewswire200801080443PR_NEWS_USPR_____AQTU104.html
http://www.techcrunch.com/2008/01/08/microsoft-has-announced-a-takeover-bid-for-fast-search-transfer-priced-at-12-billion/
http://mashable.com/2008/01/08/microsoft-to-acquire-fast-search-transfer/
http://www.readwriteweb.com/archives/microsoft_fast_takeover.php
http://blogs.zdnet.com/microsoft/?p=1085
http://blogs.zdnet.com/BTL/?p=7518
http://www.microsoft.com/enterprisesearch/serverproducts/searchserverexpress/default.aspx
 

Google files patent for recognizing text in images

Google filed a patent application in June 2007, which has only recently become public, claiming methods by which robots can read and understand text in images and video. The basic idea is for Google to be able to index videos and images and make them searchable by the text or keywords that appear inside the image or the video. Besides Google Inc., the applicants are listed as Luc Vincent of Palo Alto, Calif., and Adrian Ulges of Delaware, US, who are also the named inventors.

“Digital images can include a wide variety of content. For example, digital images can illustrate landscapes, people, urban scenes, and other objects. Digital images often include text. Digital images can be captured, for example, using cameras or digital video recorders. Image text (i.e., text in an image) typically includes text of varying size, orientation, and typeface. Text in a digital image derived, for example, from an urban scene (e.g., a city street scene) often provides information about the displayed scene or location. A typical street scene includes, for example, text as part of street signs, building names, address numbers, and window signs.”

If Google manages to implement that technology, consumer search will be taken to the next level and Google will have access to a much wider array of information, far beyond the text-only search in which it already plays a leading role.

This, of course, raises some additional privacy issues, as properly noted by InformationWeek. Google has already had privacy issues with Google Maps Street View, and if the technology starts to index and recognize textual information from millions of videos and billions of pictures around the Web, things might get more complicated.
 
Nonetheless, if the technology bears the fruit it promises, it will represent a gigantic leap forward for search technology in general.

There are open source solutions to the problem. They are perhaps not as scalable and effective as what Google might develop, yet they do exist.

Andrey Kucherenko from Ukraine has built a very interesting project in this area. His classes can recognize text in monochrome graphical images after a training phase. The training phase is necessary to let the class build recognition data structures from images that have known characters. The training data structures are used during the recognition process to attempt to identify text in real images using the corner algorithm. His project is called PHPOCR; more information can be found via the links below.

PHPOCR won the PHPClasses innovation award for March 2006, and it shows what can be implemented with PHP5. Certain types of applications require reading text from documents that are stored as graphical images, as is the case with scanned documents.

An OCR (Optical Character Recognition) tool can be used to recover the original text that is written in scanned documents. These are sophisticated tools that are trained to recognize text in graphical images.

This class provides a base implementation for an OCR tool. It can be trained to learn how to recognize each letter drawn in an image. Then it can be used to recognize longer texts in real documents.
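
The train-then-recognize workflow described above can be illustrated with a tiny sketch. Note that this is not PHPOCR’s corner algorithm: for brevity it swaps in plain nearest-template matching (picking the known glyph that differs in the fewest pixels), and the 3x3 glyph shapes and function names are made up for the example.

```python
# Tiny 3x3 monochrome glyphs, '#' = ink. Shapes and labels are illustrative.
TRAINING = {
    "I": ["###", ".#.", "###"],
    "L": ["#..", "#..", "###"],
    "T": ["###", ".#.", ".#."],
}

def to_bits(glyph):
    """Flatten a glyph (list of row strings) into a tuple of 0/1 pixels."""
    return tuple(1 if ch == "#" else 0 for row in glyph for ch in row)

def train(samples):
    """Training phase: build recognition data from glyphs with known characters."""
    return {label: to_bits(glyph) for label, glyph in samples.items()}

def recognize(model, glyph):
    """Return the trained character whose template differs in the fewest pixels."""
    bits = to_bits(glyph)

    def distance(label):
        return sum(a != b for a, b in zip(model[label], bits))

    return min(model, key=distance)

model = train(TRAINING)
noisy_t = ["###", "##.", ".#."]  # a 'T' with one stray pixel
print(recognize(model, noisy_t))
```

A real OCR tool also has to segment the image into individual characters and cope with varying sizes and typefaces, which this sketch leaves out entirely.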

Another very interesting start-up believed to be heavily deploying text recognition inside videos is CastTV. The company is based in San Francisco and, with just $3M in funding, is trying to build one of the Web’s best video search engines. CastTV lets users find all their favorite online videos, from TV shows to movies to the latest celebrity, sports, news, and viral Internet videos. The company’s proprietary technology addresses two main video search challenges: finding and cataloging videos from the web and delivering relevant video results to users.

CastTV was one of the presenters at Techcrunch40 and was there noticed by Marissa Mayer from Google. She asked CastTV the following question: “Would like to know more about your matching algo for the video search engines?”. CastTV then replied: “We have been scaling as the video market grows – relevancy is a very tough problem – we are matching 3rd party sites and supplementing the meta data.”

Seen today in the light of the patent application above, Marissa’s question takes on a quite different context, and the answer from CastTV did not really address Google’s concerns. Is CastTV working on something similar to what the patent is trying to cover for Google? We do not know, but time will tell. CastTV’s investors are Draper Fisher Jurvetson and Ron Conway. We hope they make a nice exit from CastTV.
 
Adobe has also made some advances in this area. You can use Acrobat to recognize text in previously scanned documents that have already been converted to PDF. OCR (Optical Character Recognition) runs with header/footer/Bates number on image PDF files.

It is also interesting that Microsoft had, in fact, applied for a very similar patent (called “Visual and Multi-Dimensional Search“). Even more interesting here is the fact that MS had beaten Google to the punch by filing three days earlier – Microsoft filed on June 26, 2007, while Google filed on June 29.

The full abstract, description and claims can be read via the WIPO link below.

More

http://google.com
http://www.wipo.int/pctdb/en/ia.jsp?IA=US2007072578&DISPLAY=STATUS
http://www.techmeme.com/080104/p23
http://www.techcrunch.com/2008/01/04/google-lodges-patent-for-reading-text-in-images-and-video/
http://www.webmasterworld.com/goog/3540344.htm
http://enterprise.phpmagazine.net/2006/04/recognize_text_objects_in_grap.html
http://www.phpclasses.org/browse/package/2874.html
http://www.crunchbase.com/company/casttv
http://www.casttv.com/
http://www.google.com/corporate/execs.html
http://www.centernetworks.com/techcrunch40-search-and-discovery
http://www.setthings.com/2008/01/04/recognizing-text-in-images-patent-by-google/
http://help.adobe.com/en_US/Acrobat/8.0/Professional/help.html?content=WS2A3DD1FA-CFA5-4cf6-B993-159299574AB8.html
http://www.techcrunch40.com/
http://www.therottenword.com/2008/01/microsoft-beats-google-to-image-text.html

Some of the web’s biggest acquisition deals during 2007

As the end of the year approaches, we would like to briefly sum up some of the web’s biggest acquisition deals of 2007, as we know them.

All deals are ranked by size, with less weight put on when during the year the deal happened. Deals from all IT industry sectors are considered, from Web and Internet to the mobile industry. To make the list a deal should be worth no less than $100M, unless the deal is symbolic in one way or another or either of the companies involved was popular enough at the time the deal took place. That said, we think all deals are important, at least to their founders and investors.

Without a doubt, we will remember the year for its number of high-profile, large-scale acquisitions of advertising companies such as DoubleClick, aQuantive, RightMedia and 24/7 Real Media, among others. Putting all acquisition deals aside, one particular funding deal deserves a mention too: Facebook raised $240 million from Microsoft in return for just 1.6% of its equity. The Hong Kong billionaire Li Ka-shing later joined the club of high-caliber investors in Facebook by putting down $60M for an undisclosed equity position.

Other remarkable funding deals include: Alibaba.com raised $1.3 Billion from its IPO; Kayak raised $196 Million; Demand Media took $100 Million in Series C; Zillow totaled $87 Million in venture capital funding; Joost announced $45 million funding from Sequoia, Index, CBS & Viacom, among others. 

Yet another noteworthy story is Automattic (wordpress.org) turning down a $200 million acquisition offer.

And the 2007 Web 2.0 money winner is… Navteq, for its $8B deal with Nokia. Apparently Microsoft has lost this year’s crown as the deepest-pocketed buyer.

Nokia Buys Navteq For $8 Billion, Bets Big On Location-Based Services

Nokia (NOK), the Finnish mobile phone giant with nearly a third of the global handset market, has decided to bet big on location based services (LBS), and is buying Chicago-based digital map company NAVTEQ (NVT) for $8.1 billion. That works out to about $78 a share. This is one of Nokia’s largest purchases to date — the Finnish mobile giant has a mixed track record when it comes to acquisitions. This is also the second megabillion dollar buyout in the maps (LBS) space.

SAP Germany makes its biggest deal ever – acquires Business Objects for 4.8B EURO (around $6.8 billion)

SAP, the world’s largest maker of business software, has agreed to acquire Business Objects SA for €4.8 billion, which was around $6.8 billion at the time the acquisition was announced. The deal is among the largest of 2007, alongside Oracle’s Hyperion deal for over $3.3B and Nokia’s Navteq deal for over $8B. [more]

Microsoft to buy Web ad firm aQuantive for $6 Billion

Microsoft Corp. acquired aQuantive Inc. for about $6 billion, or $66.50 a share, an 85 percent premium to the online advertising company’s closing price at the time the deal was publicly announced. Shares of aQuantive shot to $63.95 in pre-opening trade, following news of the deal. The all-cash deal tops a dramatic consolidation spree across the online advertising market sparked when Google Inc. agreed to buy DoubleClick for $3.1 billion.

Oracle to buy Hyperion in $3.3 Billion cash deal

Oracle Corp. has acquired business intelligence software vendor Hyperion Solutions Corp. for $3.3 billion in cash. Oracle has agreed to pay $52 per share for Hyperion, or about $3.3 billion, a premium of 21% over Hyperion’s closing share price at the time of the deal. Oracle said it will combine Hyperion’s software with its own business intelligence (BI) and analytics tools to offer customers a broad range of performance management capabilities, including planning, budgeting and operational analytics.

Cisco Buys WebEx for $3.2 Billion

Cisco has agreed to acquire WebEx for $3.2 billion in cash. In 2006, WebEx generated nearly $50 million in profit on $380 million in revenue. They have $300 million or so in cash on hand, so the net deal value is $2.9 billion.

DoubleClick Acquired by Google For $3.1 Billion In Cash

Google reached an agreement to acquire DoubleClick, the online advertising company, from two private equity firms for $3.1 billion in cash, the companies announced, an amount almost double the $1.65 billion in stock that Google paid for YouTube late last year. In the last month of this year, the US Federal Trade Commission granted its approval for Google to purchase DoubleClick.

TomTom Bought Tele Atlas for $2.5 Billion

It took $2.5 Billion for TomTom to buy mapping software company Tele Atlas. The deal sets the stage for TomTom to become a big rival of Garmin across the Atlantic. Tele Atlas went public in 2000 on the Frankfurt Stock Exchange, and last year it bought another mapping firm, New Hampshire-based GDT.

Naspers acquires yet another European company – Tradus for roughly $1.8 Billion

Simply put, a fallen dot-com star with eBay-like ambitions, once worth more than £2B (around $4B) before collapsing to £62M at the end of 2000, is now being rescued by the South African media company Naspers, which is spending money at a breakneck pace. The offered price is £946M (more than $1.8B), based on just £60M in annual revenues. [more]

HP acquired Opsware For $1.6 Billion

HP has acquired IT Automation company Opsware for $1.6 billion. Whilst any acquisition of this size is interesting in itself, the back story to Opsware is even more so; Opsware was originally LoudCloud, a Web 1.0 company that took $350 million in funding during the Web 1.0 boom.

AOL acquired TradeDoubler for $900 Million

AOL has acquired Sweden-based TradeDoubler, a performance marketing company, for €695 million in cash, which was about US$900 million at the time the deal took place.

Microsoft acquired Tellme Networks for reportedly $800 Million

Microsoft Corp. has announced it will acquire Tellme Networks, Inc., a leading provider of voice services for everyday life, including nationwide directory assistance, enterprise customer service and voice-enabled mobile search. Although the price remains undisclosed, it is estimated to be upwards of $800 million.

Disney acquires Club Penguin for up to $700 Million

Club Penguin, a social network/virtual world that has been on the market for some time, was acquired by The Walt Disney Company. An earlier deal with Sony fell apart over Club Penguin’s policy of donating a substantial portion of profits to charity. The company, which launched in October 2005, has 700,000 current paid subscribers and 12 million activated users, primarily in the U.S. and Canada. The WSJ says the purchase price is $350 million in cash. Disney could pay up to another $350 million if certain performance targets are reached over the next couple of years, until 2009.

Yahoo acquired RightMedia for $680 Million in cash and stock

Yahoo has acquired the 80% of advertising network RightMedia that it doesn’t already own for $680 million in cash and Yahoo stock. Yahoo previously bought 20% of the company in a $45 million Series B round of funding announced in October 2006. The company has raised over $50 million to date.

WPP Acquires 24/7 Real Media for $649 Million

Online advertising services firm 24/7 Real Media was acquired by the WPP group for $649 million. The old-time internet advertising firm had its origins serving ads for Yahoo! and Netscape in 1994 and was formally founded the following year as Real Media. After numerous acquisitions it took its current name and grew to have 20 offices in 12 countries, serving over 200 billion advertising impressions every month.

Google bought the web security company Postini for $625M

Google has acquired e-mail security company Postini for $625 million, a move intended to attract more large businesses to Google Apps. More than 1,000 small businesses and universities currently use Google Apps, but ‘there has been a significant amount of interest from large businesses,’ Dave Girouard, vice president and general manager of Google Enterprise, said in a Monday teleconference.

EchoStar Acquires Sling Media for $380 Million

EchoStar Communications Corporation, the parent company of DISH Network, has announced its agreement to acquire Sling Media, creator of the Sling suite, which lets users do things like control their television shows at any time from their computers or mobile phones, or record and watch TV on a PC or Windows-based mobile phone. The acquisition is for $380 million.

ValueClick acquired comparison shopping operator MeziMedia for up to $352 Million

ValueClick has acquired MeziMedia for up to $352 million, in a deal consisting of $100 million upfront in cash, with an additional sum of up to $252 million to be paid depending on MeziMedia’s revenue and earnings performance through 2009.

Yahoo Acquires Zimbra For $350 Million in Cash

Yahoo has acquired the open source online/offline office suite Zimbra. The price: $350 million, in cash, confirmed. Zimbra gained wide exposure at the 2005 Web 2.0 Conference. Recently it has also launched offline functionality.

Business.com Sells for $350 Million

Business.com has closed another chapter in its long journey from a $7.5 million domain name bought on a hope and a prayer, selling to RH Donnelley for $350 million (WSJ reporting up to $360 million). RH Donnelley beat out Dow Jones and the New York Times during the bidding.

AOL acquired online advertising company Quigo for $350 Million

AOL announced plans to buy Quigo and its services for matching ads to the content of Web pages. The acquisition follows AOL’s September purchase of Tacoda, a leader in behavioral-targeting technology, and comes as AOL tries to boost its online advertising revenue to offset declines in Internet access subscriptions.

eBay bought StubHub For $310 Million

eBay has acquired the San Francisco-based StubHub for $285 million plus the cash on StubHub’s books, which is about $25 million.

Yahoo! Agreed to acquire BlueLithium for approximately $300 Million in cash

Yahoo! Inc. has entered into a definitive agreement to acquire BlueLithium, one of the largest and fastest growing online global ad networks that offers an array of direct response products and capabilities for advertisers and publishers. Under the terms of the agreement, Yahoo! will acquire BlueLithium for approximately $300 million in cash.

CBS to buy social network Last.fm for $280 Million

CBS Corp has bought Last.fm, the popular social networking website organized around musical tastes, for $280 million, combining a traditional broadcast giant with an early leader in online radio. Last.fm claims more than 15 million monthly users, including more than 4 million in the U.S.

AOL Acquired Tacoda, a behavior targeting advertising company for reportedly $275 Million

AOL announced the acquisition of New York-based Tacoda earlier this year. Tacoda is a behavior targeting advertising company that was founded in 2001. The deal size, which we haven’t had confirmed, is likely far smaller than Microsoft’s $6 billion for aQuantive, Yahoo’s $680 million for RightMedia, or Google’s $3.1 billion for DoubleClick. The price might be low enough that it isn’t being disclosed at all. Jack Myers Media Business Report has confirmed the $275 million price tag.

MySpace to acquire Photobucket For $250 Million

MySpace has acquired Photobucket for $250 million in cash. There is also an earn-out for up to an additional $50 million. Oddly enough, MySpace had earlier dropped Photobucket from its social networking platform. The dispute that led to Photobucket videos being blocked on MySpace later also led to acquisition discussions, and the block was removed. Photobucket had hired Lehman Brothers to help sell the company and was looking for $300 million or more, but may have had few bidders other than MySpace.

Hitwise Acquired by Experian for $240M

Hitwise, the company that performs analysis of log files from 25 million worldwide ISP accounts to provide relative market share graphs for web properties, has been acquired by Experian for $240 million.

$200+ Million for Fandango

Comcast paid $200 million or perhaps a bit more. Fandango revenue is said to be in the $50m/year range, split roughly evenly between ticket sales and advertising. Wachovia Securities analyst Jeff Wlodarczak estimated the multiple-system operator paid $200 million for Fandango, whose backers include seven of the 10 largest U.S. movie exhibitors.

Intuit Acquires Homestead for $170 Million

Small business website creation service Homestead, which started out in the Web 1.0 era, announced tonight that it has been acquired by Intuit for $170m. In addition to Intuit’s personal and small business accounting software, and the company’s partnership with Google to integrate services like Maps listing and AdSense buys, Intuit customers will now presumably be able to put up websites quickly and easily with Homestead. [more]

Naspers Acquired Polish based IM Company Gadu Gadu (chit-chat) for reportedly $155 Million

South Africa’s biggest media group Naspers Ltd offered to buy all outstanding shares in Polish Internet firm Gadu Gadu S.A. ( GADU.WA ), a Polish IM service, for 23.50 zlotys ($8.77) per share. The current majority shareholder of Gadu Gadu has agreed to tender its 55% shareholding in the public tender offer. The price is $155M. [more] 

Studivz, a German Facebook clone, went for $132 Million

German Facebook clone Studivz has been sold to one of its investors, Georg von Holtzbrinck GmbH, a German publishing group, for €100 million (about $132 million). Other investors of Studivz include the Samwer brothers, founders of ringtone company Jamba (sold for €270M) and Alando (sold to eBay for €43M in 1999).

Feedburner goes to Google for $100 Million

Feedburner was acquired by Google for around $100 million. The deal is all cash and mostly upfront, according to sources, although the founders will be locked in for a couple of years.

Answers.com has purchased Dictionary.com for reportedly $100 Million

Question and answer reference site Answers.com has acquired Dictionary.com’s parent company, Lexico Publishing, for $100 million in cash. Lexico can really serve all your lexical needs because it also owns Thesaurus.com and Reference.com.

Yahoo Acquires Rivals for $100 Million

Yahoo has acquired college sports site Rivals.com, reported the Associated Press in a story earlier this year. The price is not being disclosed, although the rumor is that the deal was closed for around $100 million. Rumors of talks first surfaced in April 2007.

UGO Acquired By Hearst for reportedly $100 Million

Hearst has acquired New-York based UGO. Forbes reported the price should be around $100 million. UGO is a popular new media site that was founded in 1997 and, according to Forbes, is generating around $30 million/year in revenue. UGO media is yet another web 1.0 veteran and survivor.

Fotolog Acquired by Hi Media, French Ad Network for $90 Million
 
New York-based Fotolog has been acquired by Hi Media, a Paris-based interactive media company, for roughly $90 million – a combination of cash and stock, according to well-placed sources.

Online Backup Startup Mozy Acquired By EMC For $76 Million

Online storage startup Mozy, headquartered in Utah, has been acquired by EMC Corporation, a public storage company with a nearly $40 billion market cap. EMC paid $76 million for the company, according to two sources close to the deal.

eBay Acquiring StumbleUpon for $75 Million

The startup StumbleUpon has been rumored to be in acquisition discussions since at least last November (2006). The small company reportedly held talks with Google, AOL and eBay as potential suitors. At the end of the day the start-up was acquired by eBay. The price was $75 million, which is notable given that the site had only 1.5m unique visitors per month at the time the deal took place. The company was rumored to be cash-positive.

General Atlantic Has Acquired Domain Name Pioneer Network Solutions

General Atlantic has acquired Network Solutions from Najafi Companies. Network Solutions was founded decades ago, in 1973, and had a monopoly on domain name registration for years, which led VeriSign to pay billions to buy it. Najafi Companies purchased NS from VeriSign in November 2003 for just $100M. No financial terms were disclosed for the deal and no price tag is publicly available, although we believe it is well over $100M. NS made our list due to its mythical role in the Internet’s development; the deal is symbolic for the Internet.

MSNBC made its first acquisition in its 11-year history, acquired Newsvine

In a recent deal, the citizen journalism startup Newsvine has been acquired by MSNBC, the Microsoft/NBC joint venture, for an undisclosed sum. Newsvine will continue operating independently, just as it has since launching in March of 2006. The acquired company also indicated there would be little change in the features of the site. We think the price tag for Newsvine is somewhere in the $50M to $75M range, but this is not confirmed. [more]

Google to buy Adscape for $23 Million

After some rumors of a deal earlier this year, Google has expanded its advertising reach by moving into video game advertising with their $23 million acquisition of Adscape.

Disney buys Chinese mobile content provider Enorbus for around $20 Million

Disney has bought Chinese mobile gaming company Enorbus for around $20 million, MocoNews.net has learned. Financial backers in the company included Carlyle and Qualcomm Ventures.

BBC Worldwide Acquires Lonely Planet

BBC Worldwide, the international arm of the BBC, has acquired Lonely Planet, the Australia-based travel information group. The amount of the deal was not disclosed, but Lonely Planet founders Tony and Maureen Wheeler get to keep a 25% share in the company. We truly believe this deal is in the $100M range, but since no confirmation was found on the Web, we cannot put a price tag on it for the sake of the list. Even though it is a global brand, its site is getting just 4M unique visitors per month.

AOL Acquires ADTECH AG

AOL has acquired a controlling interest in ADTECH AG, a leading international online ad-serving company based in Frankfurt, Germany. The acquisition provides AOL with an advanced ad-serving platform that includes an array of ad management and delivery applications enabling website publishers to manage traffic and report on their online advertising campaigns. No details about the acquisition price were found on the Web, but we suspect a large-scale deal and would rank it very high.

Amazon Acquires dpreview.com

Amazon has announced the acquisition of the digital camera information and review site dpreview.com. UK-based dpreview.com was founded in 1998 by Phil Askey as a site that publishes “unbiased reviews and original content regarding the latest in digital cameras.” Dpreview.com has in excess of 7 million unique viewers monthly. The value of the deal was not disclosed but we believe the purchase price should be in the $100M range (not confirmed).

HP Acquired Tabblo

HP announced the acquisition of Cambridge, Massachusetts based Photo printing site Tabblo this morning. The price was not disclosed.

eBay Gets Stake in Turkish Auction Market

eBay announced yesterday that it has acquired a minority stake in Turkey-based GittiGidiyor.com, an online marketplace structured in a similar manner to eBay. GittiGidiyor reportedly has more than 400,000 listings and 17 million users, which is a considerable percentage of the Turkish population. With the stake in GittiGidiyor, eBay now has the opportunity to enter the Turkish market via a system that’s already similar to its own in functionality and purpose. Istanbul-based GittiGidiyor.com was founded in 2000. GittiGidiyor is Turkish for Going, Going, Gone. The terms of the deal are not publicly available. Looking at the size of the Turkish site and the buying habits and history of eBay, the price should be considerably high, at least for the region.

Microsoft Acquiring ScreenTonic for Mobile Ad Platform

Microsoft is acquiring ScreenTonic, a local-based ads delivery platform for mobile devices, for an undisclosed amount. Paris-based ScreenTonic was founded in 2001, and has created the Stamp platform to deliver text or banner links on portals, text message ads and mobile web page ads, that vary depending on the recipients’ geographical location in a so called geo-targeting approach. 

~~~

Hakia takes $5M more, totals $16M

In a new round of funding, Hakia, the natural language processing search engine, has raised an additional $5M. The money came from previous investors, some of which are Noble Grossart Investments, Alexandra Investment Management, Prokom Investments, KVK, and several angel investors. With plans to launch fully some time next year, Hakia has been working on improving its relevancy and adding some social features like “Meet the Others” to its site. Hakia is known to have raised $11 million in its first round of funding in late 2006 from a panoply of investors scattered across the globe who were attracted by the company’s semantic search technology. As far as we know, the company’s total funding is now $16M.

We think that of all alternative search engines, excluding Ask.com and Clusty.com, Hakia seems to be one of the most trafficked, with almost 1M unique visitors when we last checked the site’s publicly available stats. If it were up to us to rank the most popular search engines, we would put them in the following order: Google, Yahoo, Ask.com, MSN, Naver, some other regional leaders, Clusty, and perhaps Hakia somewhere after that.

On the other hand, according to Quantcast, Hakia is not such a popular site and is reaching fewer than 150,000 unique visitors per month. Compete is reporting much better numbers, slightly below 1 million uniques per month. Considering the fact that the search engine is still in its beta stage, these numbers are more than great. However, a closer look at the traffic curves on both measuring sites suggests that the traffic Hakia gets is campaign-based, in other words generated by advertising, promotion or PR activity rather than permanent organic traffic from heavy usage of the site.

In related news a few days ago Google’s head of research Peter Norvig said that we should not expect to see natural-language search at Google anytime soon.

In a Q&A with Technology Review, he says:

We don’t think it’s a big advance to be able to type something as a question as opposed to keywords. Typing “What is the capital of France?” won’t get you better results than typing “capital of France.”

Yet he does acknowledge that there is some value in the technology:

We think (Google) what’s important about natural language is the mapping of words onto the concepts that users are looking for. To give some examples, “New York” is different from “York,” but “Vegas” is the same as “Las Vegas,” and “Jersey” may or may not be the same as “New Jersey.” That’s a natural-language aspect that we’re focusing on. Most of what we do is at the word and phrase level; we’re not concentrating on the sentence. We think it’s important to get the right results rather than change the interface.

In other words, a natural-language approach is useful on the back-end to create better results, but it does not present a better user experience. Most people are too lazy to type in more than one or two words into a search box anyway. The folks at both Google and Yahoo know that is true for the majority of searchers. The natural-language search startups are going to find out about that the hard way.
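The word-and-phrase-level mapping Norvig describes can be pictured with a small sketch. The code below is only our illustration, not anything Google has published: the `CONCEPTS` table and the `resolve_concepts` function are hypothetical, and the entries simply encode the examples from the quote. Each known word or phrase maps to the list of concepts it may stand for, and longer phrases are matched first so that “New York” is not mistaken for “York.”

```python
# Hypothetical alias table built from the examples in the quote above.
# Each phrase maps to the list of concepts it may refer to.
CONCEPTS = {
    "new york": ["new york"],
    "york": ["york"],                    # a different concept from "new york"
    "vegas": ["las vegas"],              # same concept as "las vegas"
    "las vegas": ["las vegas"],
    "jersey": ["jersey", "new jersey"],  # ambiguous: may or may not be New Jersey
    "new jersey": ["new jersey"],
}


def resolve_concepts(query: str) -> list:
    """Map a keyword query to candidate concepts, preferring two-word phrases."""
    words = query.lower().split()
    resolved = []
    i = 0
    while i < len(words):
        matched = False
        for size in (2, 1):  # try the longer phrase first
            if i + size > len(words):
                continue
            phrase = " ".join(words[i:i + size])
            if phrase in CONCEPTS:
                resolved.append(CONCEPTS[phrase])
                i += size
                matched = True
                break
        if not matched:
            resolved.append([words[i]])  # unknown words stand for themselves
            i += 1
    return resolved


print(resolve_concepts("hotels in Vegas"))
print(resolve_concepts("New York pizza"))
print(resolve_concepts("Jersey shore"))
```

Note that the user still types plain keywords; all of the language handling happens behind the search box, which is the point Norvig is making.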

Founded in 2004, hakia is a privately held company with headquarters in downtown Manhattan. hakia operates globally with teams in the United States, Turkey, England, Germany, and Poland.

The Founder of hakia is Dr. Berkan who is a nuclear scientist with a specialization in artificial intelligence and fuzzy logic. He is the author of several articles in this area, including the book Fuzzy Systems Design Principles published by IEEE in 1997. Before launching hakia, Dr. Berkan worked for the U.S. Government for a decade with emphasis on information handling, criticality safety and safeguards. He holds a Ph.D. in Nuclear Engineering from the University of Tennessee, and B.S. in Physics from Hacettepe University, Turkey.

More

[ http://venturebeat.com/2007/12/12/hakia-raising-5m-for-semantic-search/ ]
[ http://mashable.com/2007/12/12/hakia-funded/ ]
[ http://www.hakia.com/ ]
[ http://blog.hakia.com/ ]
[ http://www.hakia.com/about.html ]
[ http://www.readwriteweb.com/archives/hakia_takes_on_google_semantic_search.php ]
[ http://www.readwriteweb.com/archives/hakia_meaning-based_search.php ]
[ http://siteanalytics.compete.com/hakia.com/?metric=uv ]
[ http://www.internetoutsider.com/2007/07/the-big-problem.html ]
[ http://www.quantcast.com/search/hakia.com ]
[ http://www.redherring.com/Home/19789 ]
[ http://web2innovations.com/hakia.com.php ]
[ http://www.pandia.com/sew/507-hakia.html ]
[ http://www.searchenginejournal.com/hakias-semantic-search-the-answer-to-poor-keyword-based-relevancy/5246/ ]
[ http://arstechnica.com/articles/culture/hakia-semantic-search-set-to-music.ars ]
[ http://www.news.com/8301-10784_3-9800141-7.html ]
[ http://searchforbettersearch.com/ ]
[ https://web2innovations.com/money/2007/12/01/is-google-trying-to-become-a-social-search-engine/ ]
[ http://www.web2summit.com/cs/web2006/view/e_spkr/3008 ]
[ http://www.techcrunch.com/2007/12/18/googles-norvig-is-down-on-natural-language-search/ ]

Google is taking on Wikipedia

What was once known as one of the strongest and most beneficial friendships on the Web, between two hugely popular and recognized giants, is today about to turn into an Internet battle second to none.

It is no secret on the Web that Google has been in love with Wikipedia over the past years, turning this small and free encyclopedia project into one of the most visited sites on the Web today, with over 220 million unique visitors per month. It is believed that at least 85% of the total monthly traffic to Wikipedia is sent by Google. One solid argument in support of that thesis is the fact that every second article on Wikipedia is ranked among the first, if not the first, results in Google’s SERPs, resulting in unprecedented organic traffic and participation.

It is also a well-known fact that Google wished it had the chance to acquire Wikipedia, and it is believed that, had it been possible, Google would have done so years ago. The non-profit policy and structure Wikipedia is built upon provided no legal pathway for Google to snatch the site in its early days.

Basically, one can conclude that Google has always liked the idea and concept upon which Wikipedia is built. Since, for obvious reasons, it was not able to buy the site, Google now seems to be up to an idea dangerously similar to Wikipedia and is obviously taking on the free encyclopedia.

News broke late yesterday that Google is in preparation to launch a new site called Knol to create a new user generated authoritative online knowledgebase of virtually everything.

Normally we would not pay attention to this type of news, where a large-scale corporation is trying to copycat an already popular and established business model (concept) that did not turn into a large-scale company itself. This happens all the time and is part of modern capitalism, except that we found a couple of strategic facts that provoked us to express our opinion.

First of all, the mythical authority and popularity of Wikipedia seem to be under attack. Unlike the other attempts seen before, this time it is Google, a company with a far better chance of making it happen, that is undermining Wikipedia despite the site’s huge popularity and idealistic approach today.

A couple of weeks ago we wrote an in-depth analysis of how yet another mythical site, Dmoz.org, has fallen and is halfway to totally disintegrating. The only reason we have found behind this trend is the voluntary approach and principle the site has relied on throughout its almost 10 years of existence.

We think the same problem is endangering Wikipedia too, and perhaps it is just a matter of time before we witness the hugely popular free encyclopedia start disintegrating the same way Dmoz.org did, and for the same reason: it relies heavily on the voluntary principle and the contribution of thousands of skilled and knowledgeable individuals. However, we all know there is no free lunch, at least not in America. Once Wikipedia has lost the mythical image that everyone wants to be associated with today, and is no longer passing authority and respect on to its unpaid knowledgeable contributors, the free encyclopedia will most likely start disintegrating. What is known today as an authoritative and high-quality knowledge base will then become one of the biggest repositories of low-quality, link-rich articles of controversial and objectionable information on the Web. Pretty much the same has already happened to Dmoz.org. The less interested the Wikipedia volunteers become in contributing their time and knowledge to the free site while fighting an ever-growing army of spammers and corporate PRs, the more the low-quality and less authoritative information on Wikipedia will grow, and that process appears unavoidable.

This is what Google seems to be looking to change. Google wants to compensate those knowledgeable contributors over the long run and in that way avoid a potential crash in the future, which is unavoidable for every free service on the planet that has had the luck to outgrow its size.

With more than $10 billion in annual sales (most of it representing pure profit), a willingness to share that money with these knowledgeable people around the globe, and more than 500 million unique visitors per month to rely on, Google seems to be on the right track to achieve what Wikipedia will most likely fail at.

Otherwise, Wikipedia is a greater idea than Google itself, but anything of the size and ambition of Wikipedia today requires an enormous amount of resources to keep it alive, under control and working effectively for the future. Wikipedia has been trying to raise money for a long time now with no real success. Google, on the other hand, already has these resources in place.

Google has already said that Knol results will be in Google’s index, presumably on the first page, and very possibly at the top: “Our job in Search Quality will be to rank the knols appropriately when they appear in Google search results.” Google wants Knol to be an authoritative page: “A knol on a particular topic is meant to be the first thing someone who searches for this topic for the first time will want to read” and that’s already a direct challenge to Wikipedia.

If Wikipedia is replaced in the top results on Google by pages from Knol, Wikipedia’s traffic will definitely decrease, and possibly, as a consequence, so will broader participation on Wikipedia.

Will Knol be the answer to the Web of Knowledge everybody is looking for? We do not know, but one thing is for sure today: it is going to hurt Wikipedia, and not the ordinary user of the aggregated knowledge base that Wikipedia is. The entire army of both users and contributors will possibly move to Knol for the long term, or at least for as long as Google finds ways to pay for the knowledge aggregation and its contributors.

Other companies that will eventually get hurt are as follows: Freebase, About.com, Wikia, Mahalo and Squidoo.

Below is a screenshot of Knol’s reference page and how it would eventually look:


More

[ http://www.google.com/help/knol_screenshot.html ]
[ http://googleblog.blogspot.com/2007/12/encouraging-people-to-contribute.html ]
[ http://www.techcrunch.com/2007/12/13/google-preparing-to-launch-game-changing-wikipedia-meets-squidoo-project/ ]
[ http://www.techcrunch.com/2007/12/14/google-knol-a-step-too-far/ ]
[ http://www.readwriteweb.com/archives/knol_project_google_experiment.php ]
[ http://www.webware.com/8301-1_109-9834175-2.html?part=rss&tag=feed&subj=Webware ]
[ http://searchengineland.com/071213-213400.php ]
[ http://www.news.com/Google-develops-Wikipedia-rival/2100-1038_3-6222872.html ]
[ http://www.micropersuasion.com/2007/12/wikipedia-and-w.html ]
 

Hakia takes on major search engines backed up by a small army of international investors

In our planned series of publications about the Semantic Web and its Apps today Hakia is our 3rd featured company.

Hakia.com, just like Freebase and Powerset, also relies heavily on Semantic technologies to produce and deliver hopefully better and more meaningful results to its users.

Hakia is building the Web’s new “meaning-based” (semantic) search engine with the sole purpose of improving search relevancy and interactivity, pushing the current boundaries of Web search. The benefits to the end user are search efficiency, richness of information, and time savings. The basic promise is to bring search results by meaning match – similar to the human brain’s cognitive skills – rather than by the mere occurrence (or popularity) of search terms. Hakia’s new technology is a radical departure from the conventional indexing approach, because indexing has severe limitations to handle full-scale semantic search.

Hakia’s capabilities will appeal to all Web searchers – especially those engaged in research on knowledge intensive subjects, such as medicine, law, finance, science, and literature. The mission of hakia is the commitment to search for better search.

Here are the technological differences of hakia in comparison to conventional search engines.

QDEX Infrastructure

  • hakia’s designers broke from the decades-old indexing method and built a more advanced system called QDEX (which stands for Query Detection and Extraction) to enable semantic analysis of Web pages and “meaning-based” search.
  • QDEX analyzes each Web page much more intensely, dissecting it to its knowledge bits, then storing them as gateways to all possible queries one can ask.
  • The information density in the QDEX system is significantly higher than that of a typical index table, which is a basic requirement for undertaking full semantic analysis.
  • The QDEX data resides on a distributed network of fast servers using a mosaic-like data storage structure.
  • QDEX has superior scalability properties because data segments are independent of each other.

SemanticRank Algorithm

  • hakia’s SemanticRank algorithm comprises innovative solutions from the disciplines of Ontological Semantics, Fuzzy Logic, Computational Linguistics, and Mathematics.
  • Designed for the express purpose of higher relevancy.
  • Sets the stage for search based on meaning of content rather than the mere presence or popularity of keywords.
  • Deploys a layer of on-the-fly analysis with superb scalability properties.
  • Takes into account the credibility of sources among equally meaningful results.
  • Evolves its capacity of understanding text from BETA operation onward.

In our tests we asked Hakia three questions in plain English:

Why did the stock market crash? [ http://www.hakia.com/search.aspx?q=why+did+the+stock+market+crash%3F ]
Where do I get good bagels in Brooklyn? [ http://www.hakia.com/search.aspx?q=where+can+i+find+good+bagels+in+brooklyn ]
Who invented the Internet? [ http://www.hakia.com/search.aspx?q=who+invented+the+internet ]

It returned sensible results for all three. For example, Hakia understood that, when we asked “why,” we would be interested in results containing the words “reason for”, and produced some relevant ones.

Hakia is one of the few promising alternative search engines closely watched by Charles Knight at his blog AltSearchEngines.com. It focuses on natural language processing methods to try to deliver “meaningful” search results. Hakia attempts to analyze the concept behind a search query, in particular by doing sentence analysis, whereas most major search engines, including Google, analyze keywords. The company believes that the future of search engines will go beyond keyword analysis: search engines will talk back to you and in effect become your search assistant. One point worth noting is that Hakia currently still has some human post-editing going on, so it is not 100% computer powered at this point and is closer to a human-powered search engine, or a combination of the two.

They hope to provide better search results with complex queries than Google currently offers, but they have a long way to catch up, considering Google’s vast lead in the search market, sophisticated technology, and rich coffers. Hakia’s semantic search technology aims to understand the meaning of search queries to improve the relevancy of the search results.

Instead of relying on indexing the web or on the popularity of particular web pages, as many search engines do, hakia tries to match the meaning of the search terms to mimic the cognitive processes of the human brain.

“We’re mainly focusing on the relevancy problem in the whole search experience,” said Dr. Berkan in an interview Friday. “You enter a question and get better relevancy and better results.”

Dr. Berkan contends that search engines that use indexing and popularity algorithms are not as reliable with combinations of four or more words since there are not enough statistics available on which to base the most relevant results.

“What we are doing is an ultimate approach, doing meaning-based searches so we understand the query and the text, and make an association between them by semantic analysis,” he said.

Analyzing whole sentences instead of keywords would undoubtedly increase the company’s cost of indexing and processing the world’s information. The case is much the same with Powerset, which also does deep contextual analysis of every sentence on every web page and is publicly known to have higher indexing and analysis costs than Google. Considering that Google has more than 450,000 servers in several major data centers, and that hakia’s indexing and storage costs might be even higher, the approach could cost its investors a fortune to keep the company alive.

It would be interesting to find out whether hakia is also building its architecture on the Hbase/Hadoop environment, just as Powerset does.

In the context of indexing and storing the world’s information, it is worth mentioning that there is yet another start-up search engine, Cuill, that claims to have invented a technology for cheaper and faster indexing than Google’s. Cuill claims that its indexing costs will be 1/10th of Google’s, based on new search architectures and relevance methods.

On the subject of semantic textual analysis and the presentation of meaningful results, NosyJoe.com is a good example of both. It does not seem to aim at indexing and storing the world’s information and then applying contextual analysis to it; rather, it focuses on what is of quality and importance to the people participating in its social search engine.

A few months ago Hakia launched a new social feature called “Meet Others.” It gives you the option, from a search results page, to jump to a page on the service where everyone who searches for the topic can communicate.

For some idealized types of searching, it could be great. For example, suppose you were searching for information on a medical condition. Meet Others could connect you with other people looking for info about the condition, making an ad-hoc support group. On the Meet Others page, you’re able to add comments, or connect directly with the people on the page via anonymous e-mail or by Skype or instant messaging.

On the other hand, a service that implements social recommendations and relies on social elements like Hakia’s Meet Others feature needs huge traffic to turn that interesting feature into an effective information discovery tool. For example, Google, with its more than 500 million unique searchers per month, could easily beat such social attempts by the smaller players if it decided to employ its users, in one way or another, to find, judge the relevancy of, share and recommend results that others also search for. Such attempts by Google are already under way, as one can read here: Is Google trying to become a social search engine.

Reach

According to Quantcast, Hakia is not a very popular site and reaches fewer than 150,000 unique visitors per month. Compete reports much better numbers, slightly below 1 million uniques per month. Considering that the search engine is still in its beta stage, these numbers are more than good. Looking further at the traffic curves on both measurement sites, it appears that hakia’s traffic is campaign based; in other words, it is generated by advertising, promotion or PR activity rather than being permanent organic traffic from heavy usage of the site.

The People

Founded in 2004, hakia is a privately held company with headquarters in downtown Manhattan. hakia operates globally with teams in the United States, Turkey, England, Germany, and Poland.

The founder of hakia is Dr. Berkan, a nuclear scientist with a specialization in artificial intelligence and fuzzy logic. He is the author of several articles in this area, as well as the book Fuzzy Systems Design Principles, published by IEEE in 1997. Before launching hakia, Dr. Berkan worked for the U.S. Government for a decade with an emphasis on information handling, criticality safety and safeguards. He holds a Ph.D. in Nuclear Engineering from the University of Tennessee and a B.S. in Physics from Hacettepe University, Turkey. He has been developing the company’s semantic search technology with help from Professor Victor Raskin of Purdue University, who specializes in computational linguistics and ontological semantics and is the company’s chief scientific advisor.

Dr. Berkan resisted VC firms because he worried they would demand too much control and push development too fast to get the technology to the product phase so they could earn back their investment.

When he met Dr. Raskin, he discovered they had similar ideas about search and semantic analysis, and by 2004 they had laid out their plans.

They currently have 20 programmers working on building the system in New York, and another 20 to 30 contractors working remotely from different locations around the world, including Turkey, Armenia, Russia, Germany, and Poland.
The programmers are developing the search engine so it can better handle complex queries and maybe surpass some of its larger competitors.

Management

  • Dr. Riza C. Berkan, Chief Executive Officer
  • Melek Pulatkonak, Chief Operating Officer
  • Tim McGuinness, Vice President, Search
  • Stacy Schinder, Director of Business Intelligence
  • Dr. Christian F. Hempelmann, Chief Scientific Officer
  • John Grzymala, Chief Financial Officer

Board of Directors

  • Dr. Pentti Kouri, Chairman
  •  Dr. Riza C. Berkan, CEO
  • John Grzymala
  • Anuj Mathur, Alexandra Global Fund
  • Bill Bradley, former U.S. Senator
  • Murat Vargi, KVK
  • Ryszard Krauze, Prokom Investments

Advisory Board

  • Prof. Victor Raskin (Purdue University)
  • Prof. Yorick Wilks (Sheffield University, UK)
  • Mark Hughes

Investors

Hakia is known to have raised $11 million in its first round of funding from a panoply of investors scattered across the globe who were attracted by the company’s semantic search technology.

The New York-based company said it decided to snub the usual players in the venture capital community lining Silicon Valley’s Sand Hill Road and opted for its international connections instead, including financial firms, angel investors, and a telecommunications company.

Poland

Among them were Poland’s Prokom Investments, an investment group active in the oil, real estate, IT, financial, and biotech sectors.

Turkey

Another investor, Turkey’s KVK, distributes mobile telecom services and products in Turkey. Also from Turkey, angel investor Murat Vargi pitched in some funding. He is one of the founding shareholders in Turkcell, a mobile operator and the only Turkish company listed on the New York Stock Exchange.

Malaysia

In Malaysia, hakia secured funding from angel investor Lu Pat Ng, who represented his family, which has substantial investments in companies worldwide.

Finland

From Finland, hakia turned to Dr. Pentti Kouri, an economist and VC who was a member of the Nokia board in the 1980s. He has taught at Stanford, Yale, New York University, and Helsinki University, and worked as an economist at the International Monetary Fund. He is currently based in New York.

United States

In the United States, hakia received funding from Alexandra Investment Management, an investment advisory firm that manages a global hedge fund. Also from the U.S., former Senator and New York Knicks basketball player Bill Bradley has joined the company’s board, along with Dr. Kouri, Mr. Vargi, Anuj Mathur of Alexandra Investment Management, and hakia CEO Riza Berkan.

Hakia was one of the first alternative search engines to make the home page of Web 2.0 Innovations in the past year: http://web2innovations.com/hakia.com.php

Hakia.com is the 3rd Semantic App being featured by Web2Innovations in its series of planned publications, where we will try to discover, highlight and feature the next generation of web-based semantic applications, engines, platforms, mash-ups, machines, products, services, mixtures, parsers, approaches and far beyond.

The purpose of these publications is to discover and showcase today’s Semantic Web Apps and projects. We’re not going to rank them, because there is no way to rank these apps at this time – many are still in alpha and private beta.

Via

[ http://www.hakia.com/ ]
[ http://blog.hakia.com/ ]
[ http://www.hakia.com/about.html ]
[ http://www.readwriteweb.com/archives/hakia_takes_on_google_semantic_search.php ]
[ http://www.readwriteweb.com/archives/hakia_meaning-based_search.php ]
[ http://siteanalytics.compete.com/hakia.com/?metric=uv ]
[ http://www.internetoutsider.com/2007/07/the-big-problem.html ]
[ http://www.quantcast.com/search/hakia.com ]
[ http://www.redherring.com/Home/19789 ]
[ http://web2innovations.com/hakia.com.php ]
[ http://www.pandia.com/sew/507-hakia.html ]
[ http://www.searchenginejournal.com/hakias-semantic-search-the-answer-to-poor-keyword-based-relevancy/5246/ ]
[ http://arstechnica.com/articles/culture/hakia-semantic-search-set-to-music.ars ]
[ http://www.news.com/8301-10784_3-9800141-7.html ]
[ http://searchforbettersearch.com/ ]
[ https://web2innovations.com/money/2007/12/01/is-google-trying-to-become-a-social-search-engine/ ]
[ http://www.web2summit.com/cs/web2006/view/e_spkr/3008 ]
 

Microsoft acquires discount shopping search Jellyfish

A couple of months ago Microsoft made an interesting move. It acquired Jellyfish.com, the Internet’s first buying [search] engine, as the company calls itself. Simply put, it is an online discount shopping website that shares the fees it earns from merchants with you, through a cash back program, when you buy from them.

As is typical of acquisitions by the major companies, the price was not disclosed, nor were further business details given. Under the terms of the deal, Jellyfish.com will maintain its standalone identity and its 26 employees will remain in Wisconsin.

Jellyfish.com had raised about $6 million in funding in October 2006 from investors that included company executives, Kegonsa Capital Partners (based in Fitchburg, Wisconsin) and Clyde Street.

Jellyfish.com was co-founded by Chief Executive Brian Wiegand and President Mark McGuire, who previously collaborated on NameProtect, a vertical search engine that provides trademark research. Venture-backed NameProtect was acquired by Corporation Services Company in April 2007.

What is Jellyfish.com anyway?

Jellyfish is a new kind of search engine. They call it the Internet’s first buying engine. Search engines are great for finding information, but they think you also need a search engine that is perfect for when you want to buy something online.

They try to make it simple for you to find the right product from a trusted merchant. But they also do something really different too: sharing their revenue with you. The guys there think of themselves as a Robin-Hood-like search engine that takes a percentage of the revenue you generate through your buying activity and redistributes it to you.
You use Jellyfish.com just like you would any other shopping search engine to find the right product at the best price. But when you actually buy something from a store in their engine, they share at least half of what they earn by connecting you to that store. All you need to do is sign up for an account to earn cash back. There are no fees or hidden charges.

This is Jellyfish.com’s cash back promise: to share at least half of every $1 they earn when you shop and buy products using Jellyfish.com. Not all merchants in their database allow them to share with shoppers, but such cases are clearly indicated.
At Jellyfish you will never get hidden fees, secret agendas, or annoying advertising. You will get an easy to use, transparent service that puts you in control.

Like eBay in Reverse

In reality, Jellyfish.com is one big marketplace of stores competing for your attention. But instead of annoying you with advertising, it allows stores to use their advertising dollars to lower your end price. If you like pretty pictures, the Jellyfish site offers a picture of how this works. Jellyfish is not eBay, but the company thinks of its patent-pending marketplace as eBay in reverse. Instead of bidding for deals, all you have to do is search to uncover the stores that have already bid the most to create the best deal for you.

How can they do this? Or better yet, why are they giving away money?

They just think that advertising stinks. Instead of wasting lots of money interrupting and annoying you, they have invented a new marketplace where stores make their advertising dollars work directly for your benefit and on your terms. Current advertising gives too much value to search engines at the expense of you and the stores that pay to advertise. Instead of the search engine keeping all of the advertising money, they set up a system that rewards Jellyfish, you, and the advertiser fairly when you find the right product to buy online.

What they really hope to do is show you the value of your attention online. And they couldn’t think of a better way than paying you cold hard cash. Technology has given you incredible control of what you pay attention to. You may not know it yet, but you are now in control. Companies in this new world will have to provide you with a maximum return on the value of your attention or they will die. And the value of your attention at Jellyfish is measured in extra dollars in your cash back account.

At Jellyfish, they want to pioneer a new form of search advertising that they call Value Per Action. Instead of charging fees when you click, they charge their advertisers only when you actually buy, and they share at least half of this fee back to you as cash back. In other words, they connect you directly to the value of the advertising. Instead of measuring how much money they make when you click, they measure how much value the advertiser is willing to pay YOU for your sale. With VPA, the advertising value of your attention becomes transparent (you can see it in the form of cash back) and changes from annoying advertising into something that actually lowers your end price.
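The Value Per Action arithmetic described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the example fee and the rounding to whole cents are hypothetical, and only the "at least half of the fee goes back to the shopper" rule comes from the description above.

```python
from decimal import Decimal, ROUND_HALF_UP

MIN_SHOPPER_SHARE = Decimal("0.5")  # "at least half", per the cash back promise


def split_vpa_fee(advertiser_fee, shopper_share=MIN_SHOPPER_SHARE):
    """Split a per-sale advertiser fee into (cash_back, engine_revenue).

    The fee is charged only when a purchase happens (cost per action),
    not when the shopper merely clicks.
    """
    fee = Decimal(str(advertiser_fee))
    share = Decimal(str(shopper_share))
    if fee < 0:
        raise ValueError("advertiser fee cannot be negative")
    if not MIN_SHOPPER_SHARE <= share <= 1:
        raise ValueError("shopper share must be between 0.5 and 1")
    # Rounding to cents is an assumption made for this sketch.
    cash_back = (fee * share).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return cash_back, fee - cash_back


def effective_price(list_price, advertiser_fee, shopper_share=MIN_SHOPPER_SHARE):
    """Price the shopper effectively pays once cash back is credited."""
    cash_back, _ = split_vpa_fee(advertiser_fee, shopper_share)
    return Decimal(str(list_price)) - cash_back
```

For a hypothetical $200.00 product on which the store pays a $10.00 fee per sale, `split_vpa_fee("10.00")` gives the shopper $5.00 and the engine $5.00, so the effective price is $195.00.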

Jellyfish.com’s platform is a sort of reverse auction where buyers bid on reducing prices, betting on when to place an order without knowing quantity at the given price.

This type of auction is known as a Dutch auction, a format famously used to sell Dutch tulips.
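A Dutch (descending-price) auction is easy to sketch: the price starts high and keeps dropping until a buyer decides it is low enough. The snippet below is a toy model, not Jellyfish's actual mechanism; the buyer names, the prices in whole dollars and the alphabetical tie-break are assumptions made for illustration.

```python
def run_dutch_auction(start_price, floor_price, step, buyer_limits):
    """Simulate a descending-price auction.

    buyer_limits maps a buyer name to the highest price that buyer is
    willing to pay. The price falls by `step` each round; the first
    buyer whose limit is reached wins at the current price.
    Returns (winner, price), or None if the floor is hit with no taker.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    price = start_price
    while price >= floor_price:
        takers = [name for name, limit in buyer_limits.items() if limit >= price]
        if takers:
            # Tie-break alphabetically (an arbitrary choice for this sketch).
            return min(takers), price
        price -= step
    return None


buyers = {"ann": 82, "bob": 90, "cy": 70}
print(run_dutch_auction(100, 60, 5, buyers))  # ('bob', 90)
```

Waiting longer means a lower price but a higher risk that someone else buys first, which is the bet described above.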

The Microsoft Live Search team said  they “think the technology has some interesting potential applications as we continue to invest heavily in shopping and commerce as a key component of Live Search.”

Another potential reason could be Google, again.

Google understands that the game of pay per click is about to change and is moving. Microsoft is paying attention to this and is locking up intellectual property with this move, one that combines multiple successful and innovative digital shopping models.

Jellyfish takes a best of breed approach and “mashes them up” to the amusement of consumers: Ebates + Woot.com and on the advertiser-side, eBay’s Shopping.com + Google’s AdWords auction environment + Commission Junction’s (VCLK) performance-based cost model (cost-per-action) with a twist of Google (auctioning off ads).

It all adds up to valuable IP that Google, in theory, cannot access.

According to Jellyfish’s zeitgeist, pay per click advertising “fails to align incentives properly between the consumer, the advertiser, and the search engine intermediary connecting them.” It’s certainly an interesting take on sponsored links, but it will most likely be a complicated stance to maintain after being acquired by one of the larger players in the pay per click game.

Similar, and older, companies include Shopping.com (eBay), Bizrate.com, Epinions and Overstock.com.

Via

[ http://www.jellyfish.com/about ]
[ http://www.techcrunch.com/2007/10/02/microsoft-acquires-discount-shopping-site-jellyfishcom/ ]
[ http://www.jeffmolander.com/ ]
[ http://www.techcrunch.com/2006/10/27/cpa-shopping-search-jellyfishcom-closes-5-million-round/ ]
[ http://www.jellyfish.com/blog ]
[ http://www.jellyfish.com/howToUseJellyfish ]
[ http://www.jellyfish.com/blog/2007/10/02/microsoft-acquires-jellyfish/ ]
[ http://blog.wired.com/business/2007/10/microsoft-acqui.html ]
[ http://www.jellyfish.com/ourVision ]
[ http://www.marketingpilgrim.com/2007/10/microsoft-acquires-jellyfish-apparently-shuns-peanutbutterfish.html ]
[ http://blogs.msdn.com/livesearch/archive/2007/10/01/microsoft-acquires-jellyfish-com.aspx ]
[ http://www.redherring.com/Home/22913 ]
[ http://www.jellyfish.com/founders ]

Powerset – the natural language processing search engine empowered by Hbase in Hadoop

In our planned series of publications about the Semantic Web and its apps, Powerset is the second company we feature, after Freebase.

Powerset is a Silicon Valley based company building a transformative consumer search engine based on natural language processing. Their unique innovations in search are rooted in breakthrough technologies that take advantage of the structure and nuances of natural language. Using these advanced techniques, Powerset is building a large-scale search engine that breaks the confines of keyword search. By making search more natural and intuitive, Powerset is fundamentally changing how we search the web, and delivering higher quality results.

Powerset’s search engine is currently under development and is closed to the general public. You can always keep an eye on them to learn more about their technology and approach.

Despite all the press attention Powerset is getting, too few details about the search engine are publicly available. In fact, Powerset has lately been one of the most talked-about companies in Silicon Valley, for good or bad.

“Power set” is a term from mathematics: given a set S, the power set (or powerset) of S, written P(S) or 2^S, is the set of all subsets of S. In axiomatic set theory (as developed, e.g., in the ZFC axioms), the existence of the power set of any set is postulated by the axiom of power set. Any subset F of P(S) is called a family of sets over S.

From the latest publicly available information about Powerset we learn that, just like some other start-up search engines, they are using Hbase in a Hadoop environment to process vast amounts of data.
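To make the definition concrete, here is a minimal sketch that enumerates the power set of a small finite set. The function name is our own; it relies only on `itertools.combinations` from the Python standard library.

```python
from itertools import combinations


def power_set(items):
    """Return all subsets of `items` as a list of frozensets.

    A set with n elements has 2**n subsets, which is why the power
    set is also written 2^S.
    """
    elems = list(dict.fromkeys(items))  # drop duplicates, keep order
    return [
        frozenset(combo)
        for size in range(len(elems) + 1)
        for combo in combinations(elems, size)
    ]


subsets = power_set({"a", "b", "c"})
print(len(subsets))  # 8
```

The empty set and the full set are always among the subsets, so even the power set of an empty set has one element.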

It also appears that Powerset relies on a number of proprietary technologies such as the XLE, licensed from PARC, ranking algorithms, and the ever-important onomasticon (a list of proper nouns naming persons or places).

  

For any other component, Powerset tries to use open source software whenever available. One of the unsung heroes that form the foundation for all of these components is the ability to process insane amounts of data. This is especially true for a Natural Language search engine. A typical keyword search engine will gather hundreds of terabytes of raw data to index the Web. Then, that raw data is analyzed to create a similar amount of secondary data, which is used to rank search results. Since Powerset’s technology creates a massive amount of secondary data through its deep language analysis, Powerset will be generating far more data than a typical search engine, eventually ranging up to petabytes of data.
Powerset has already benefited greatly from the use of Hadoop: their index build process is entirely based on a Hadoop cluster running the Hadoop Distributed File System (HDFS) and makes use of Hadoop’s map/reduce features.

In fact Google also uses a number of well-known components to fulfill their enormous data processing needs: a distributed file system (GFS) ( http://labs.google.com/papers/gfs.html ), Map/Reduce ( http://labs.google.com/papers/mapreduce.html ), and BigTable ( http://labs.google.com/papers/bigtable.html ).

Hbase is the open-source equivalent of Google’s Bigtable and, as far as we understand the matter, a great technological achievement of the people behind Powerset. Jim Kellerman and Michael Stack, both from Powerset, are the initial contributors to Hbase.

Hbase could be the panacea for Powerset in scaling its index up to Google’s level, yet copying Google’s approach is perhaps not the right direction for a small technology company like Powerset. We wonder whether Cuill, yet another start-up search engine that claims to have invented a technology for cheaper and faster indexing than Google’s, has built its architecture on the Hbase/Hadoop environment. Cuill claims that its indexing costs will be 1/10th of Google’s, based on new search architectures and relevance methods. If that is true, what would Powerset’s costs be, considering that Powerset probably has higher indexing costs than even Google because it does deep contextual analysis of every sentence on every web page? Considering that Google has more than 450,000 servers in several major data centers, and that Powerset’s indexing and storage costs might be even higher, the approach Powerset is taking might be a costly business for its investors.

Unless Hbase and Hadoop are the secret answer Powerset relies on to significantly reduce the costs. 

Hadoop is an interesting software platform that lets one easily write and run applications that process vast amounts of data.

Here’s what makes Hadoop especially useful:

  • Scalable: Hadoop can reliably store and process petabytes.
  • Economical: It distributes the data and processing across clusters of commonly available computers. These clusters can number into the thousands of nodes.
  • Efficient: By distributing the data, Hadoop can process it in parallel on the nodes where the data is located. This makes it extremely rapid.
  • Reliable: Hadoop automatically maintains multiple copies of data and automatically redeploys computing tasks based on failures.

Hadoop implements MapReduce, using the Hadoop Distributed File System (HDFS). MapReduce divides applications into many small blocks of work. HDFS creates multiple replicas of data blocks for reliability, placing them on compute nodes around the cluster. MapReduce can then process the data where it is located.
Hadoop has been demonstrated on clusters with 2000 nodes. The current design target is 10,000 node clusters.
Hadoop is a Lucene sub-project that contains the distributed computing platform that was formerly a part of Nutch.
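The MapReduce idea itself can be illustrated without a cluster. The sketch below is a single-process toy in plain Python, not Hadoop's API: the function names are our own, and the "blocks" are just strings standing in for the data blocks that HDFS would spread across nodes.

```python
from collections import defaultdict


def map_words(block):
    """Map step: emit a (word, 1) pair for every word in one data block."""
    for word in block.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(word, counts):
    """Reduce step: collapse all values for one key into a single result."""
    return word, sum(counts)


def run_word_count(blocks):
    """Run map, shuffle and reduce over a list of text blocks."""
    # On a real cluster each block would be mapped on the node that stores it.
    mapped = [pair for block in blocks for pair in map_words(block)]
    grouped = shuffle(mapped)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


blocks = ["hadoop stores data", "hadoop processes data"]
print(run_word_count(blocks))
# {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

Because each block is mapped independently and each key is reduced independently, both steps can be spread over many machines, which is where the scalability comes from.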

Hbase’s background

Google’s  Bigtable, a distributed storage system for structured data, is a very effective mechanism for storing very large amounts of data in a distributed environment.  Just as Bigtable leverages the distributed data storage provided by the Google File System, Hbase will provide Bigtable-like capabilities on top of Hadoop. Data is organized into tables, rows and columns, but a query language like SQL is not supported. Instead, an Iterator-like interface is available for scanning through a row range (and of course there is an ability to retrieve a column value for a specific key). Any particular column may have multiple values for the same row key. A secondary key can be provided to select a particular value or an Iterator can be set up to scan through the key-value pairs for that column given a specific row key.
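The data model described above can be mimicked with nested dictionaries. This is a toy, in-memory sketch and not the Hbase client API: the class and method names are hypothetical, and using an integer as the secondary key (much like a version or timestamp) is an assumption made for illustration.

```python
class ToyTable:
    """In-memory imitation of a Bigtable-style table.

    Layout: row key -> column name -> secondary key -> value.
    """

    def __init__(self):
        self._rows = {}

    def put(self, row, column, secondary_key, value):
        """Store one value; a column may hold many values per row key."""
        self._rows.setdefault(row, {}).setdefault(column, {})[secondary_key] = value

    def get(self, row, column, secondary_key=None):
        """Fetch one value; without a secondary key, return the highest-keyed one."""
        versions = self._rows.get(row, {}).get(column, {})
        if not versions:
            return None
        if secondary_key is None:
            secondary_key = max(versions)
        return versions.get(secondary_key)

    def scan_rows(self, start_row, stop_row):
        """Iterate over rows with start_row <= key < stop_row, in key order."""
        for row in sorted(self._rows):
            if start_row <= row < stop_row:
                yield row, self._rows[row]

    def scan_values(self, row, column):
        """Iterate over (secondary_key, value) pairs of one column in one row."""
        versions = self._rows.get(row, {}).get(column, {})
        for key in sorted(versions):
            yield key, versions[key]


table = ToyTable()
table.put("com.hakia", "title", 1, "old title")
table.put("com.hakia", "title", 2, "new title")
table.put("com.powerset", "title", 1, "powerset")
print(table.get("com.hakia", "title"))  # new title
```

There is no SQL here: everything is either a lookup by key or an iterator over a sorted key range, which is the trade-off that lets such a store be split across many machines.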

Reach

According to Quantcast, Powerset is not a popular site and reaches fewer than 20,000 unique visitors per month, around 10,000 of them Americans. Compete reports much the same: slightly more than 20,000 uniques per month. Considering that the search engine is still in its alpha stage, these numbers are not that bad.

The People

Powerset has assembled a star team of talented engineers, researchers, product innovators and entrepreneurs to realize an ambitious vision for the future of search. Its team comprises industry leaders from a diverse set of companies including: Altavista, Apple, Ask.com, BBN, Digital, IDEO, IBM, Microsoft, NASA, PARC, Promptu, SRI, Tellme, Whizbang! Labs, and Yahoo!.

The founders of Powerset are Barney Pell and Lorenzo Thione, and the company is headquartered in San Francisco. Barney Pell recently stepped down from the CEO spot and is now the company’s CTO.

Barney Pell, Ph.D. (CTO) For over 15 years Barney Pell (Ph.D. Computer science, Cambridge University 1993) has pursued groundbreaking technical and commercial innovation in A.I. and Natural Language understanding at research institutions including NASA, SRI, Stanford University and Cambridge University. In startup companies, Dr. Pell was Chief Strategist and VP of Business Development at StockMaster.com (acquired by Red Herring in March, 2000) and later had the same role at Whizbang! Labs. Just prior to Powerset, Pell was an Entrepreneur in Residence at Mayfield, one of the top VC firms in Silicon Valley.

Lorenzo Thione (Product Architect) Mr. Thione brings to Powerset years of research experience in computational linguistics and search from Research Scientist positions at the CommerceNet consortium and the Fuji-Xerox Palo Alto Laboratory. His main research focus has been discourse parsing and document analysis, automatic summarization, question answering and natural language search, and information retrieval. He has co-authored publications in the field of computational linguistics and is a named inventor on 13 worldwide patent applications spanning the fields of computational linguistics, mobile user interfaces, search and information retrieval, speech technology, security and distributed computing. A native of Milan, Italy, Mr. Thione holds a Masters in Software Engineering from the University of Texas at Austin.

Board of Directors

Aside from Barney Pell, who also serves on the company’s board of directors, the other board members are:

Charles Moldow (BOD) is a general partner at Foundation Capital. He joined Foundation on the heels of successfully building two companies from early start-up through greater than $100 million in sales. Most notably, Charles led Tellme Networks in raising one of the largest private financing rounds in the country post Internet bubble, adding $125 million in cash to the company balance sheet during tough market conditions in August, 2000. Prior to Tellme, Charles was a member of the founding team of Internet access provider @Home Network. In 1998, Charles assisted in the $7 billion acquisition of Excite Network. After the merger, Charles became General Manager of Matchlogic, the $80 million division focused on interactive advertising.

Peter Thiel (BOD) is a partner at Founders Fund VC Firm in San Francisco. In 1998, Peter co-founded PayPal and served as its Chairman and CEO until the company’s sale to eBay in October 2002 for $1.5 billion. Peter’s experience in finance includes managing a successful hedge fund, trading derivatives at CS Financial Products, and practicing securities law at Sullivan & Cromwell. Peter received his BA in Philosophy and his JD from Stanford.

Investors

In June 2007 Powerset raised $12.5M in a series A round of funding from Foundation Capital and Founders Fund. Early investors include Eric Tilenius and Peter Thiel, who is also an early investor in Facebook.com. Other early investors are as follows:

CommerceNet is an entrepreneurial research institute focused on making the world a better place by fulfilling the promise of the Internet. CommerceNet invests in exceptional people with bold ideas, freeing them to pursue visions outside the comfort zone of research labs and venture funds and share in their success.

Dr. Tenenbaum is a world-renowned Internet commerce pioneer and visionary. He was founder and CEO of Enterprise Integration Technologies, the first company to conduct a commercial Internet transaction (1992), secure Web transaction (1993) and Internet auction (1993). In 1994, he founded CommerceNet to accelerate business use of the Internet. In 1997, he co-founded Veo Systems, the company that pioneered the use of XML for automating business-to-business transactions. Dr. Tenenbaum joined Commerce One in January 1999, when it acquired Veo Systems. As Chief Scientist, he was instrumental in shaping the company’s business and technology strategies for the Global Trading Web. Earlier in his career, Dr. Tenenbaum was a prominent AI researcher, and led AI research groups at SRI International and Schlumberger Ltd. Dr. Tenenbaum is a Fellow and former board member of the American Association for Artificial Intelligence, and a former Consulting Professor of Computer Science at Stanford. He currently serves as an officer and director of Webify Solutions and Medstory Inc., and is a Consulting Professor of Information Technology at Carnegie Mellon’s new West Coast campus. Dr. Tenenbaum holds B.S. and M.S. degrees in Electrical Engineering from MIT, and a Ph.D. from Stanford. 

Allan Schiffman was CTO and founder of Terisa Systems, a pioneer in communications security technology for the Web software industry. Earlier, Mr. Schiffman was Chief Technology Officer at Enterprise Integration Technologies, a pioneer in the development of key security protocols for electronic commerce over the Internet. In these roles, Mr. Schiffman raised industry awareness of the role of security and public key cryptography in ecommerce by giving more than thirty public lectures and tutorials. Mr. Schiffman was also a member of the team that designed the Secure Electronic Transactions (SET) payment card protocol commissioned by MasterCard and Visa. Mr. Schiffman co-designed the first security protocol for the Web, the Secure HyperText Transfer Protocol (S-HTTP). Mr. Schiffman led the development of the first secure Web browser, Secure Mosaic, which was fielded to CommerceNet members for ecommerce trials in 1994. Earlier in his career, Mr. Schiffman led the development of a family of high-performance Smalltalk implementations that gained both academic recognition and commercial success. These systems included several innovations widely adopted by other object-oriented language implementers, such as the “just-in-time compilation” technique universally used by current Java virtual machines. Mr. Schiffman holds an M.S. in Computer Science from Stanford University.

Rob Rodin is the Chairman and CEO of RDN Group, strategic advisors focused on corporate transitions, customer interface, sales and marketing, distribution and supply chain management. Additionally, he serves as Vice Chairman, Executive Director and Chairman of the Investment Committee of CommerceNet, which researches and funds open platform, interoperable business services to advance commerce. Prior to these positions, Mr. Rodin served as CEO and President of Marshall Industries, where he engineered the reinvention of the company, turning a conventionally successful $500 million distributor into a web-enabled $2 billion global competitor. “Free, Perfect and Now: Connecting to the Three Insatiable Customer Demands”, Mr. Rodin’s bestselling book, chronicles the radical transformation of Marshall Industries.

The Founders Fund – The Founders Fund, L.P. is a San Francisco-based venture capital fund that focuses primarily on early-stage, high-growth investment opportunities in the technology sector. The Fund’s management team is composed of investors and entrepreneurs with relevant expertise in venture capital, finance, and Internet technology. Members of the management team previously led PayPal, Inc. through several rounds of private financing, a private merger, an initial public offering, a secondary offering, and its eventual sale to eBay, Inc. The Founders Fund possesses the four key attributes that position it well for success: access to elite research universities, contacts with entrepreneurs, operational and financial expertise, and the ability to pick winners. Currently, the Founders Fund is invested in over 20 companies, including Facebook, Ironport, Koders, Engage, and the newly-acquired CipherTrust.

Amidzad – Amidzad is a seed and early-stage venture capital firm focused on investing in emerging growth companies on the West Coast, with over 50 years of combined entrepreneurial experience in building profitable, global enterprises from the ground up and over 25 years of combined investing experience in successful information technology and life science companies. Over the years, Amidzad has assembled a world-class network of serial entrepreneurs, strategic investors, and industry leaders who actively assist portfolio companies as Entrepreneur Partners and Advisors. Amidzad has invested in companies like Danger, BIX, Songbird, Melodis, Freewebs, Agitar, Affinity Circles, Litescape and Picaboo.

Eric Tilenius brings a two-decade track record that combines venture capital, startup, and industry-leading technology company experience. Eric has made over a dozen investments in early-stage technology, internet, and consumer start-ups around the globe through his investment firm, Tilenius Ventures. Prior to forming Tilenius Ventures, Eric was CEO of Answers Corporation (NASDAQ: ANSW), which runs Answers.com, one of the leading information sites on the internet. He previously was an entrepreneur-in-residence at venture firm Mayfield. Prior to Mayfield, Eric was co-founder, CEO, and Chairman of Netcentives Inc., a leading loyalty, direct, and promotional internet marketing firm. Eric holds an MBA from the Stanford University Graduate School of Business, where he graduated as an Arjay Miller scholar, and an undergraduate degree in economics, summa cum laude, from Princeton University.

Esther Dyson does business as EDventure, the reclaimed name of the company she owned for 20-odd years before selling it to CNET Networks in 2004. Her primary activity is investing in start-ups and guiding many of them as a board member. Her board seats include Boxbe, CVO Group (Hungary), Eventful.com, Evernote, IBS Group (Russia, advisory board), Meetup, Midentity (UK), NewspaperDirect, Voxiva, Yandex (Russia)… and WPP Group (not a start-up). Some of her other past IT investments include Flickr and Del.icio.us (sold to Yahoo!), BrightMail (sold to Symantec), Medstory (sold to Microsoft), Orbitz (sold to Cendant and later re-IPOed). Her current holdings include ActiveWeave, BlogAds, ChoiceStream, Democracy Machine, Dotomi, Linkstorm, Ovusoft, Plazes, Powerset, Resilient, Tacit, Technorati, Visible Path, Vizu.com and Zedo. On the non-profit side, Dyson sits on the boards of the Eurasia Foundation, the National Endowment for Democracy, the Santa Fe Institute and the Sunlight Foundation. She also blogs occasionally for the Huffington Post, as Release 0.9.

Adrian Weller – Adrian graduated in 1991 with first class honours in mathematics from Trinity College, Cambridge, where he met Barney. He moved to NY, ran Goldman Sachs’ US Treasury options trading desk and then joined the fixed income arbitrage trading group at Salomon Brothers. He went on to run US and European interest rate trading at Citadel Investment Group in Chicago and London. Recently, Adrian has been traveling, studying and managing private investments. He resides in Dublin with his wife, Laura and baby daughter, Rachel.

Azeem Azhar – Azeem is currently a technology executive focussed on corporate innovation at a large multinational. He began his career as a technology writer, first at The Guardian and then The Economist. While at The Economist, he launched Economist.com. Since then, he has been involved with several internet and technology businesses including launching BBC Online and founding esouk.com, an incubator. He was Chief Marketing Officer for Albert-Inc, a Swiss AI/natural language processing search company and UK MD of 20six, a blogging service. He has advised several internet start-ups including Mondus, Uvine and Planet Out Partners, where he sat on the board. He has a degree in Philosophy, Politics and Economics from Oxford University. He currently sits on the board of Inuk Networks, which operates an IPTV broadcast platform. Azeem lives in London with his wife and son.

Todd Parker – Since 2002, Mr. Parker has been a Managing Director at Hidden River, LLC, a firm specializing in Mergers and Acquisitions consulting services to the wireless and communications industry. Previously, from 2000 to 2002, Mr. Parker was the founder and CEO of HR One, a human resources solutions provider and software company. Mr. Parker has also held senior executive and general manager positions with AirTouch Corporation, where he managed over 15 corporate transactions and joint venture formations with a total value of over $6 billion. Prior to AirTouch, Mr. Parker worked for Arthur D. Little as a consultant. Mr. Parker earned a BS from Babson College in Entrepreneurial Studies and Communications.

Powerset.com is the 2nd Semantic App featured by Web2Innovations in its series of planned publications, in which we will try to discover, highlight and feature the next generation of web-based semantic applications, engines, platforms, mash-ups, machines, products, services, mixtures, parsers, approaches and far beyond.

The purpose of these publications is to discover and showcase today’s Semantic Web Apps and projects. We’re not going to rank them, because there is no way to rank these apps at this time – many are still in alpha and private beta.

Via

[ http://www.powerset.com ]
[ http://www.powerset.com/about ]
[ http://en.wikipedia.org/wiki/Power_set ]
[ http://en.wikipedia.org/wiki/Powerset ]
[ http://blog.powerset.com/ ]
[ http://lucene.apache.org/hadoop/index.html ]
[ http://wiki.apache.org/lucene-hadoop/Hbase ]
[ http://blog.powerset.com/2007/10/16/powerset-empowered-by-hadoop ]
[ http://www.techcrunch.com/2007/09/04/cuill-super-stealth-search-engine-google-has-definitely-noticed/ ]
[ http://www.barneypell.com/ ]
[ http://valleywag.com/tech/rumormonger/hanky+panky-ousts-pell-as-powerset-ceo-318396.php ]
[ http://www.crunchbase.com/company/powerset ]

Is Google trying to become a Social Search Engine?

Based on what we are seeing the answer is close to yes. Google is now experimenting with new social features aimed at improving the users’ search experience.

This experiment lets you influence your search experience by adding, moving, and removing search results. When you search for the same keywords again, you’ll continue to see those changes. If you later want to revert your changes, you can undo any modifications you’ve made. Note that Google claims this is an experimental feature and may be available for only a few weeks.
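
The behaviour described above (edits remembered per query, replayed on later searches, and reversible) can be sketched roughly as follows. This is only an illustration of the idea, not Google's implementation: the `PersonalizedResults` class, its method names and the in-memory storage are all hypothetical.

```python
from collections import defaultdict


class PersonalizedResults:
    """Per-user edits to a result list, remembered per query (hypothetical model)."""

    def __init__(self):
        # (user, query) -> ordered list of edits, oldest first
        self._edits = defaultdict(list)

    def add(self, user, query, url):
        self._edits[(user, query)].append(("add", url))

    def move(self, user, query, url, position):
        self._edits[(user, query)].append(("move", url, position))

    def remove(self, user, query, url):
        self._edits[(user, query)].append(("remove", url))

    def undo(self, user, query):
        """Revert the most recent edit for this user and query, if any."""
        edits = self._edits[(user, query)]
        if edits:
            edits.pop()

    def apply(self, user, query, base_results):
        """Replay the stored edits on top of the engine's own ranking."""
        results = list(base_results)
        for edit in self._edits.get((user, query), []):
            kind, url = edit[0], edit[1]
            if kind == "add":
                # Assumption: a user-added page is shown at the top.
                if url not in results:
                    results.insert(0, url)
            elif kind == "remove":
                if url in results:
                    results.remove(url)
            elif kind == "move":
                if url in results:
                    results.remove(url)
                    results.insert(min(edit[2], len(results)), url)
        return results
```

Because the edits are keyed by user as well as by query, one user's changes never reach anyone else's results, and that is why the feature needs you to be signed in.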

There seem to be features like “Like it”, “Don’t like it?” and “Know of a better web page”. Of course, to take full advantage of these extras, and to have your recommendations associated with your searches when you return, you have to be signed in.

There is nothing new here. Many of the smaller social search engines already deploy some of the features Google is only now testing. But having more than 500 million unique visitors per month, the vast majority of whom use Google’s search engine heavily, is a huge advantage for anyone who wants to add social elements to finding information on the web. Even Marissa Mayer, Google’s leading executive in search, said in August that Google would be well positioned to compete in social search. With this experiment in particular, however, it appears your vote only affects the Google search results you yourself will see, so it is hard to call it “social” at this point. This may prove valuable as a stand-alone service. Also, Daniel Russell of Google made it pretty clear some time ago that they use user behavior to affect search results. Effectively, that is implicit voting rather than explicit voting.

We think, however, the only reason Google is trying to deal with these social features, relying on humans to determine the relevancy, is their inability to effectively fight the spam their SERPs are flooded with. 

Manipulating algorithm-based results, in one way or another, is in our understanding not much harder than manipulating results in Google that depend on social recommendations. Look at Digg, for example.

We think employing humans to determine which results are best is basically an effective pathway to corruption, which is arguably worse than having an algorithm to blame for the spam and the low quality of the results. Again, take a look at Digg, dmoz.org and, above all, Wikipedia. Wikipedia, once a good idea, became a battlefield for corporate, brand, political and social wars. That being said, we think Google’s problem with spam results comes down to how they reach the information, or more concretely, the methods they use to crawl and index the vast Web. By contrast, having people, instead of robots, gather the quality and important information (from everyone’s point of view) from around the web is in our understanding a much better and more effective approach than loading all the spam results onto the servers and then letting people sort them out.

This is not the first time Google has tried new features with its search results. We remember searchmash.com. Searchmash.com is yet another of Google’s toys in the search arena, which was quietly started a year ago because Google did not want the public to know about the project and did not want the Google brand name to influence its beta testers (read: the common users). The project, however, quickly became popular once many people discovered who the actual owner of the beta project was.

Google is no doubt getting all the press attention it needs, no matter what it does, and sometimes even more than it actually needs. On the other hand, things seem to be slowly changing today, and influential media like the New York Times, Newsweek, CNN and many others are on a quest for the next search engine, the next Google. This simply could not have happened from 2001 and 2002 up to 2004, a period characterized by solid media comfort for Google’s search engine business.

So, is Google the first one to experiment with social search approaches, features, methods and extras? No, definitely not, as you are going to see for yourself from the companies and projects listed below.

As for crediting a Digg-like system with the idea of sorting content out based on community voting, they definitely weren’t the first. The earliest implementation of this we are aware of is Kuro5hin.org (http://en.wikipedia.org/wiki/Kuro5hin), which, we think, was founded back in 1999.

Eurekster

One of the first and oldest companies to be called social search engines on the Web is Eurekster.
Eurekster launched its community-powered social search platform “swicki”, as far as we know, in 2004, and explicit voting functionality in 2006. To date, over 100,000 swickis have been built, each serving a community of users passionate about a specific topic. Eurekster processes over 25,000,000 searches a month. The key to Eurekster’s success in improving relevancy has been leveraging explicit (and implicit) user behavior at the group or community level rather than at the individual or general level. On the other hand, Eurekster never made it to mainstream users, and the company somehow lost momentum and slowly faded away.

Wikia Social Search

Wikia was founded by Jimmy Wales (Wikipedia’s founder) and Angela Beesley in 2004. The company is incorporated in Delaware. Gil Penchina became Wikia’s CEO in June 2006, at the same time the company moved its headquarters from St. Petersburg, Florida, to Menlo Park and later to San Mateo in California. Wikia has offices in San Mateo and New York in the US, and in Poznań in Poland. Remote staff is also located in Chile, England, Germany, Japan, Taiwan, and also in other locations in Poland and the US. Wikia has received two rounds of investment; in March 2006 from Bessemer Venture Partners and in December 2006 from Amazon.com.

According to Wikia Search, the future of Internet search must be based on:

  1. Transparency – Openness in how the systems and algorithms operate, both in the form of open source licenses and open content + APIs.
  2. Community – Everyone is able to contribute in some way (as individuals or entire organizations), strong social and community focus.
  3. Quality – Significantly improve the relevancy and accuracy of search results and the searching experience.
  4. Privacy – Must be protected, do not store or transmit any identifying data.

Other active areas of focus include:

  1. Social Lab – sources for URL social reputation, experiments in wiki-style social ranking.
  2. Distributed Lab – projects focused on distributed computing, crawling, and indexing. Grub!
  3. Semantic Lab – Natural Language Processing, Text Categorization.
  4. Standards Lab – formats and protocols to build interoperable search technologies.

Given who Jimmy Wales is, the success he achieved with Wikipedia, and therefore the resources he might have access to, Wikia Search stands a good chance of surviving serious competition from Google.

NosyJoe.com

NosyJoe is yet another great example of a social search engine that employs intelligent tagging technologies and runs on a semantic platform.

NosyJoe is a social search engine that relies on you to sniff for and submit the web’s interesting content, and offers meaningful search results in the form of readable complete sentences and smart tags. NosyJoe is built upon the fundamental belief that people are better than robots at finding interesting, important and quality content around the Web. Crawling the entire Web to build a massive index of information is an enormous technological task, requires a huge amount of resources, is time-consuming, and would also load lots of unnecessary information people don’t want. Instead, NosyJoe focuses just on those parts of the Web people think are important and find interesting enough to submit and share with others.

NosyJoe is a hybrid of a social search engine that relies on you to sniff for and submit the web’s interesting content, an intelligent content tagging engine on the back end, and a basic semantic platform on its web-visible part. NosyJoe then applies a semantic-based textual analysis and intelligently extracts meaningful structures like sentences, phrases, words and names from the content in order to make it a bit more meaningfully searchable. This helps NosyJoe present the search results in meaningful formats like readable complete sentences and smart phrasal, word and name tags.

The information is then clustered and published across NosyJoe’s platform into contextual channels and time and source categories. Semantic phrasal, name and word tags are also applied to meaningfully connect the pieces together, which makes even the smallest content component web-visible, indexable and findable. Finally, a set of algorithms and user patterns is applied to further rank, organize and share the information.
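
To make the idea of sentence-level tagging more concrete, here is a minimal sketch of an index that maps word tags to the complete sentences they occur in. It assumes nothing about NosyJoe’s real analysis, which is not public: the regex-based sentence splitter, the tiny stop-word list and every function name below are our own simplifications.

```python
import re
from collections import defaultdict

# Tiny stop-word list, purely illustrative.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "is", "are", "for", "with"}


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def extract_tags(sentence):
    """Return lower-cased word tags, skipping stop words and very short tokens."""
    words = re.findall(r"[A-Za-z][A-Za-z0-9'-]*", sentence)
    return {w.lower() for w in words if len(w) > 2 and w.lower() not in STOP_WORDS}


def build_tag_index(documents):
    """Map each tag to the complete sentences it appears in."""
    index = defaultdict(list)
    for text in documents:
        for sentence in split_sentences(text):
            for tag in extract_tags(sentence):
                index[tag].append(sentence)
    return index


def search(index, keyword):
    """Return whole sentences for a keyword instead of clipped excerpts."""
    return index.get(keyword.lower(), [])
```

A lookup then returns whole sentences rather than keyword-centred snippets, which is the presentation difference the site emphasises. A real system would need proper phrase and name detection on top of this.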

In our quick tests on the site, the search results were presented in the form of meaningful sentences and semantic phrasal tags (as an option). This turns the search results into something we have never seen on the web so far: neat content components and readable, easily understandable sentences, unlike what we are all used to, namely excerpts from the content in which the keyword is found. When compared to other search engines’ results, NosyJoe.com’s SERPs appear truly meaningful.

As of today, just 6 or 7 months since going online, NosyJoe already has more than 500,000 semantic tags that connect tens of thousands of meaningful sentences across its platform.

We have no information as to who stands behind NosyJoe, but the project seems very serious and promising in many respects, from how they gather the information to how they present the results to the way they offset low-quality results. Of all the newcomer social search engines, NosyJoe stands the best chance of making it. As far as we know, NosyJoe is also based in Silicon Valley.

Sproose

Sproose says it is developing search technology that lets users obtain personalized results, which can be shared among a social network, using the Nutch open-source search engine and building applications on top. Their search appears to use third-party search feeds and to rank the results based on users’ votes.
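
Ranking feed results by user votes can be illustrated with a few lines of code. The sketch below is an assumption about how such a re-ranking could work, not Sproose’s actual algorithm: the `rerank_by_votes` function and the simple “upvotes minus downvotes” score are hypothetical.

```python
def rerank_by_votes(feed_results, votes):
    """Order results by net user votes, keeping feed order for ties.

    feed_results: URLs in the order the upstream feed returned them.
    votes: dict mapping URL -> (upvotes, downvotes); missing URLs count as (0, 0).
    """
    def net_score(url):
        up, down = votes.get(url, (0, 0))
        return up - down

    # sorted() is stable, so equal scores stay in their original feed order.
    return sorted(feed_results, key=net_score, reverse=True)
```

Results nobody has voted on keep the order the upstream feed gave them, so the votes only reshuffle what the community has actually judged.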

Sproose is said to have raised nearly $1 million in seed funding. It is based in Danville, a town on the east side of the SF Bay Area. Sproose said Roger Smith, founder and former president and chief executive of Silicon Valley Bank, was one of the angel investors and is joining Sproose’s board.

Other start-up search engines of great variety are listed below:

  • Hakia – Relies on natural language processing. These guys are also experimenting with social elements through a feature called “meet others who asked the same query”.
  • Quintura – A visual engine based today in Virginia, US. The company was founded by Russians and was earlier headquartered in Moscow.
  • Mahalo – A search engine that looks more like a directory, with quality content handpicked by editors. Jason Calacanis is the founder of the company.
  • ChaCha – Real humans try to help you in your quest for information, via chat. The company is based in Indiana and has been criticized a lot by Silicon Valley’s IT community. Despite this criticism, they have recently raised $10m in a Series A round of funding.
  • Powerset – Still in closed beta and also relying on understanding natural language. See our Powerset review.
  • Clusty – founded in 2000 by three Carnegie Mellon University scientists.
  • Lexxe – Sydney based engine featuring natural language processing technologies.
  • Accoona – The company has recently filed for an IPO in the US, planning to raise $80M from the public.
  • Squidoo – Started in October 2005 by Seth Godin, it looks more like a wiki site, à la Wikia or Wikipedia, where anyone can create articles on different topics.
  • Spock – A people search engine that focuses on information about people.

One thing is for sure today: Google is now bringing solid credentials to the social search approach and is in a way legitimizing it, which by the way helps the many smaller so-called social search engines.

Perhaps it is about time for consolidation in the social search sector. Some of the smaller but more promising social search engines could now merge in order to compete with Google and prevent its dominance within the social search sector too, a dominance like the one it achieved over the algorithmic search engines. Is Google also interested in acquisitions? Has anyone heard of recent interest in, or already closed acquisition deals for, start-up social search engines?

On the other hand, more and more IT experts, evangelists and web professionals agree that taking Google down is a challenge that will most likely be met by a concept that is anything but a search engine in our traditional understanding. Such concepts include, but are not limited to, Wikipedia, Del.icio.us and LinkedWords. In other words, finding information on the web doesn’t necessarily mean searching for it.

Via:
[ http://www.google.com/experimental/a840e102.html ]
[ http://www.blueverse.com/2007/12/01/google-the-social-…]
[ http://www.adesblog.com/2007/11/30/google-experimenting-social… ]
[ http://www.techcrunch.com/2007/11/28/straight-out-of-left-field-google-experimenting-with-digg-style-voting-on-search-results ]
[ http://www.blogforward.com/money/2007/11/29/google… ]
[ http://nextnetnews.blogspot.com/2007/09/is-nosyjoecom-… ]
[ http://www.newsweek.com/id/62254/page/1 ]
[ http://altsearchengines.com/2007/10/05/the-top-10-stealth-… ]
[ http://www.nytimes.com/2007/06/24/business/yourmoney/…  ]
[ http://dondodge.typepad.com/the_next_big_thing/2007/05… ]
[ http://search.wikia.com/wiki/Search_Wikia ]
[ http://nosyjoe.com/about.com ]
[ http://www.siliconbeat.com/entries/2005/11/08/sproose_up_your… ]
[ http://nextnetnews.blogspot.com/2007/10/quest-for-3rd-generation… ]
[ http://www.sproose.com ]