Category Archives: Semantic Web

2008’s Most Popular Web 2.0 Sites

Today we are living in Web 2.0 times more than ever before. PR, press coverage, buzz, evangelism, lobbying, who knows who, who blogs about whom, who talks about whom, mainstream media and beyond: all of these words are found in the dictionary of almost every new web site that coins itself as Web 2.0. But with the global economic crisis rising upon us, promising a very depressed business environment with little to no liquidity events for the next few years, the real question is: who are the real winners in today’s Web 2.0 space, judged by the real people who have used their web properties since 2005, when the term Web 2.0 was first coined? Since then we have witnessed hundreds of millions of US dollars poured into different Web 2.0 sites, applications and technologies, and perhaps now is the time to find out which of those web sites have worked things out.

We took the time necessary to discover today’s most popular Web 2.0 sites based on real traffic and site usage, not on buzz or size of funding. Sites are ranked by their estimated traffic figures. After spending years assessing Web 2.0 sites, applying tens of different criteria from economic and technological to media-related, we came to the conclusion that there is only one criterion worth our attention: the real people who use a given site (the traffic, the site usage, etc.), on the basis of which the web site can successfully be monetized. Of course, there are a few exceptions to the general rule, such as sites with extremely valuable technologies and no traffic at all, but as we said, they are exceptions.

Ad networks, web networks, hosted networks and groups of sites that report consolidated traffic numbers as their own, or that rely on the traffic of other sites to boost their own figures (e.g. various ad networks, Quantcast, WordPress, etc.), are not taken into consideration; the sites within those networks and groups have been ranked separately. International traffic is, of course, taken into consideration. Add-ons, social network apps and widget usage are not. Sub-domains, as well as international TLDs that are part of the principal business of the main domain/web site, are included. Media sites, including those covering the Web 2.0 space, are also included. Old players from the dot-com era are not considered and are ranked accordingly.

Disclaimer: some of the data on which the sites below are ranked may be incomplete or incorrect due to the lack of publicly available traffic data for the respective sites. Please also note that the data taken into consideration for the ranking may have changed in the meantime and may no longer be the same at the time you are reading the list. Data was gathered during the months of July, August, September and December 2008.

Today’s most popular Web 2.0 sites, based on the traffic they get as measured during the months of July, August and September 2008.

Priority is given to direct traffic measurement methods wherever applicable. Panel data as well as toolbar traffic figures are not taken into consideration. Traffic details are taken from Quantcast, Google Analytics*, Nielsen Site Audit, Nielsen NetRatings, comScore Media Metrix, internal server log files*, Compete and Alexa. Traffic and usage figures from press releases, public relations and buzz, as they have appeared in the mainstream and specialized media, are given lower priority unless supported by direct traffic measurement methods.

*wherever applicable

Web Property / Unique visitors per month

  1. WordPress.com ~ 100M
  2. YouTube.com ~ 73M
  3. MySpace.com ~ 72M
  4. Wikipedia.org ~ 69M
  5. Hi5.com ~ 54M
  6. Facebook.com ~ 43M
  7. BlogSpot.com ~ 43M
  8. PhotoBucket.com ~ 34M
  9. MetaCafe.com ~ 30M
  10. Blogger.com ~ 27M
  11. Flickr.com ~ 23M
  12. Scribd.com ~ 23M
  13. Digg.com ~ 21M
  14. Typepad.com ~ 17M
  15. Imeem.com ~ 17M
  16. Snap.com ~ 15.7M
  17. Fotolog.com ~ 15.6M
  18. RockYou.com ~ 15M
  19. Veoh.com ~ 12M
  20. Wikihow.com ~ 12M
  21. Topix.com ~ 11.5M
  22. Blinkx.com ~ 11M
  23. HuffingtonPost.com ~ 11M
  24. Wikia.com ~ 10.8M
  25. Technorati.com ~ 10.6M
  26. Zimbio.com ~ 10.3M
  27. SpyFu.com ~ 10.1M
  28. Heavy.com ~ 9.3M
  29. Yelp.com ~ 8.9M
  30. Slide.com ~ 8.5M
  31. SimplyHired.com ~ 8.5M
  32. Squidoo.com ~ 8.1M
  33. LinkedIn.com ~ 7.5M
  34. HubPages.com ~ 7.2M
  35. Hulu.com ~ 7.1M
  36. AssociatedContent.com ~ 7M
  37. Indeed.com ~ 5.4M
  38. LiveJournal.com ~ 5.2M
  39. Bebo.com ~ 5.1M
  40. Habbo.com ~ 4.9M
  41. Fixya.com ~ 4.5M
  42. RapidShare.com ~ 4.5M
  43. AnswerBag.com ~ 4.4M
  44. Metafilter.com ~ 4.3M
  45. Crackle (Grouper) ~ 4M
  46. Ning.com ~ 3.8M
  47. Breitbart.com ~ 3.8M
  48. BookingBuddy.com ~ 3.7M
  49. Kayak.com ~ 3.6M
  50. Blurtit.com ~ 3.2M
  51. Kaboodle.com ~ 3M
  52. Meebo.com ~ 2.9M
  53. WowWiki.com ~ 2.8M
  54. Friendster.com ~ 2.7M
  55. Truveo.com ~ 2.7M
  56. Trulia.com ~ 2.7M
  57. Twitter.com ~ 2.5M
  58. BoingBoing.net ~ 2.4M
  59. Techcrunch.com ~ 2.2M
  60. Zillow.com ~ 2.2M
  61. MyNewPlace.com ~ 2.2M
  62. Mahalo.com ~ 2.1M
  63. Vox.com ~ 2M
  64. Last.fm ~ 2M
  65. Glam.com ~ 1.9M
  66. Multiply.com ~ 1.9M
  67. Popsugar.com ~ 1.6M
  68. Addthis.com ~ 1.5M
  69. Pandora.com ~ 1.4M
  70. Brightcove.com ~ 1.4M
  71. LinkedWords.com ~ 1.3M
  72. Devshed.com ~ 1.3M
  73. AppleInsider.com ~ 1.3M
  74. Newsvine.com ~ 1.3M
  75. Fark.com ~ 1.2M
  76. BleacherReport.com ~ 1.2M
  77. Mashable.com ~ 1.2M
  78. Zwinky.com ~ 1.2M
  79. Quantcast.com ~ 1.2M
  80. StumbleUpon.com ~ 1.1M
  81. SecondLife.com ~ 1.1M
  82. Magnify.net ~ 1.1M
  83. Uncyclopedia.org ~ 1M
  84. Weblo.com ~ 1M
  85. Del.icio.us ~ 1M
  86. Reddit.com < 1M
  87. Pbwiki.com < 1M
  88. AggregateKnowledge.com < 1M
  89. Eventful.com < 1M
  90. Dizzler.com < 1M
  91. Synthasite.com < 1M
  92. Vimeo.com < 1M
  93. Zibb.com < 1M

Web 2.0 sites with fewer than 1M unique visitors per month, even though popular in one way or another, are not the subject of this list and are not taken into consideration. We know of at least 100 other really good Web 2.0 sites, apps and technologies of today, but since they are getting fewer than 1M uniques per month they were not able to make our list. However, sites that are almost there (850K-950K/mo) and believed to be in a position to reach the 1M monthly mark in the next few months are included at the bottom of the list. Those sites are marked with “<“, which means close to 1M, but not yet there. No hard feelings :).

If we’ve omitted a site that you know is getting at least 1M uniques per month and you are not seeing it above, drop us a note at info[at]web2innovations.com and we’ll have it included. Please note that the proposed site should have had steady traffic for at least 3 months prior to submission to the list above. Sites like Powerset and Cuil, for example, may not qualify for inclusion due to temporary traffic leaps caused by the buzz they have gotten, a factor we try to offset. For other corrections and omissions please write to the same email address. Requests for corrections of the traffic figures the sites are ranked on can only be justified by providing us with accurate traffic numbers from reliable direct measurement sources (Quantified at Quantcast, Google Analytics, Nielsen Site Audit, Nielsen NetRatings, comScore Media Metrix, internal server log files, or other third-party traffic measurement services that use the direct method; no panel data, no Alexa, no Compete, etc. will be taken into consideration).

* Note that ranks given to sites at w2i reflect only our own vision for and understanding of the site usage, traffic and unique visitors of the sites being ranked, and do not necessarily reflect the opinions of other industry experts, professionals, journalists and bloggers. You acknowledge that any ranking available on web2innovations.com (The Site) is for informational purposes only and should not be construed as investment advice or a recommendation that you, or anyone you advise, should buy, acquire or invest in any of the companies being analyzed and ranked on the Site, or undertake any investment strategy, based on rankings seen on the Site. Moreover, if a company is described or mentioned on our Site, you acknowledge that such description or mention does not constitute a recommendation by web2innovations.com that you engage with or otherwise use such web site.

The full list

LinkedWords.com – the consolidated traffic for the entire 2008 is expected to be in the 10 Million range

Launched back in the middle of 2006, LinkedWords has proven over the past years to be a very effective vehicle for helping web sites get contextually linked on a content-area level, so that Internet users and smart robots discover their information in context. Since then the contextual platform has grown rapidly, from 30,000 uniques per month in its early days in 2006 to over 1 million unique visitors per month in recent months of 2008.

The successful formula seems to be simple yet very effective: the higher the number of small- to mid-level sites’ content areas contextually linked into LW’s platform, the higher the number of contextually targeted unique visitors shared among the web sites linked in.

Both Google Analytics and Quantcast traffic measurements now report over 1,000,000 unique visitors per month.

Some interesting facts to note about the site’s traffic and usage:

1) The 400,000 unique visitors per month mark was surpassed for the first time in April 2007;

2) For the entire 2007 LW ended up with more than 4,500,000 unique visitors to its contextual platform;

3) For the period of 12 months between Apr 2007 and Apr 2008 LW ended up with more than 7,700,000 unique visitors to its contextual platform;

4) The highest number of monthly visitors so far came during April 2008, when the platform had more than 1,300,000 uniques;

5) 47,564 is the highest number of daily unique visitors recorded so far, reached on April 7, 2008.

This year (2008) was, however, not all glorious for LinkedWords. During April ’08 the platform experienced unprecedented traffic growth, reaching over 1.3M unique visitors, which resulted in a failure of one of the servers in their cluster and caused major downtime. The affected period was from Friday, April 25 to Friday, May 16, 2008. Millions of unique visitors to LW were said to have been affected. It took them more than 4 months to completely recover both their platform and their reach.

Despite the major downtime during the entire month of May ’08, which considerably slowed traffic for several months from May through August ’08, the anticipated consolidated traffic for the whole of 2008 is still expected to be in the 10M range, double the 2007 figure.

About LinkedWords

LinkedWords (LW) is an innovative contextual platform built upon millions of English words and phrases organized into contextual categories, paths, and semantic URLs whose mission is to maximize contextual linking among web sites across the Web.

 

Via EPR Network

Via LinkedWords’ Blog

A new way to build your Google Maps

We recently came across a small France-based company called Click2Map that provides an interesting editor for creating mash-ups with Google Maps. With it you can create fully customizable, interactive, professional online maps from existing data, and the editor also offers database and template functionality. They have just added a powerful template system coupled with a highly versatile database engine that allows professional users to store data and use it wherever they need in fully customizable templates.

Based in Metz, France, Click2Map is a powerful online mapping application published by the company of the same name. Click2Map puts all the power of Web 2.0 at the service of its users: its familiar point-and-click interface makes creating and sharing interactive online maps a snap. Everyone can now create rich, customized online maps without writing a single line of code!

Click2Map’s editor allows users to create markers and POIs using a familiar application environment and provides convenient access to existing markers. Advanced users appreciate the possibility to create an unlimited number of maps including unlimited numbers of markers and optional groups.

Importing groups and markers takes on another dimension with the possibility to use variables extensively: all the information stored in your personal database can now be inserted wherever needed in each and every marker, thanks to Click2Map’s dynamic variable engine. No matter how many personal data categories and fields you define, Click2Map automatically generates the corresponding variables that you can instantly use: creating large quantities of personalized markers has never been easier.
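To make the idea concrete, here is a minimal sketch of how such a variable-substitution scheme might work. The field names, template syntax and data below are invented for illustration; they are not Click2Map’s actual API.

```python
# Hypothetical sketch of a Click2Map-style variable engine: each row of a
# personal database fills the placeholders in a marker template. Field
# names and template syntax are illustrative, not Click2Map's real API.
from string import Template

marker_template = Template("<h3>$name</h3><p>$address</p><p>Phone: $phone</p>")

# A tiny stand-in for the user's personal database.
rows = [
    {"name": "Office Metz", "address": "1 Place d'Armes, Metz", "phone": "+33 3 00 00 00 00"},
    {"name": "Office Paris", "address": "10 Rue de Rivoli, Paris", "phone": "+33 1 00 00 00 00"},
]

# One personalized marker description per database row.
for row in rows:
    print(marker_template.substitute(row))
```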

Click2Map’s enhanced import/export system provides an efficient means to integrate existing data into online maps and to exchange information with third-party applications. The recent addition of an exclusive statistics engine helps professionals track how their maps are consulted and used by visitors: the popularity of each map and marker can now be tracked in real time.

By allowing companies to create fully customized online Google Maps based on their existing data, Click2Map provides them with unprecedented means of promoting their business online.

Click2Map SARL is a leading French provider of GeoWeb Solutions. Click2Map is its flagship product, an easy to use online application to create, manage and publish online professional maps without any knowledge of programming. Click2Map SARL also provides full technical support and customization of its Click2Map Editor and Maps Generator.

Story was picked up from EPR Network.

More

http://www.click2map.com/
http://blog.click2map.com/
http://wiki.click2map.com/
http://express-press-release.com/47/Click2Map%20Adds%20Template%20and%20Database%20Features%20to%20Google%20Maps.php

What is the real reason Automattic bought Gravatar?

As some of you already know, w2i (web2innovations.com) keeps an internal archive of almost all funding and acquisition deals that have happened on the web over the past years. While we have the ambition to report on all of them, the deals are so many that we end up writing about only the most interesting ones. The same is the case with Automattic’s purchase of Gravatar some months ago. We kept the news in our archive for quite a long time, trying to figure out the real motive behind the acquisition of Gravatar, and since we came up with no particular synergy or reason, we have decided today to simply write about it.

First off, Automattic is the company behind the popular blog software WordPress. The site is among the most popular on the web, with more than 90M uniques per month. When Matt Mullenweg announced the deal on Gravatar’s blog, he wrote about the many improvements Gravatar would see under its new owner. To scale things up, they transferred the Rails application and most of the avatar serving to WordPress.com’s infrastructure and servers. Avatar serving was said to already be more than three times as fast, and to work every time. They’ve also moved Gravatar’s blog from Mephisto to WordPress, of course.

He further said: “Basically, we did the bare minimum required to stabilize and accelerate the Gravatar service, focusing a lot on making the gravatars highly available and fast. However our plans are much bigger than that.” Among those plans: all of the premium features have gone free, with refunds offered to anyone who bought them in the last 60 days; gravatar serving has moved to a Content Delivery Network (CDN), so not only will gravatars be fast, they’ll be low latency and not slow down a page load; merging the million avatars WordPress had with the 115,000 or so Gravatar brought to the table, and making them all available through the Gravatar API; integrating and improving templates and bringing features like multiple avatars over from WordPress.com; bringing the bigger sizes (128px) over and making them available for any Gravatar (Gravatars are currently only available up to 80px); adding microformat support, such as XFN rel=”me” and hCard, to all avatar profile pages (a particularly interesting move); developing a new API with cleaner URLs that allows Gravatars to be addressed by things like URL in addition to (or instead of) email addresses; and, not least, rewriting the entire application to fit directly into WordPress.com’s grid, for internet-scale performance and reliability.
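One concrete piece of the Gravatar API worth noting here is the avatar URL itself: an MD5 hash of the trimmed, lowercased email address, plus a size parameter that covers the 80px and 128px sizes mentioned above. A minimal sketch:

```python
# Build a Gravatar image URL: an MD5 hash of the trimmed, lowercased email
# address identifies the avatar; the "s" parameter requests a pixel size.
import hashlib

def gravatar_url(email: str, size: int = 80) -> str:
    digest = hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()
    return f"http://www.gravatar.com/avatar/{digest}?s={size}"

# The 128px size is the larger one WordPress.com plans to bring over.
print(gravatar_url("someone@example.com", size=128))
```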

Yahoo’s recent announcement of big plans to move toward web semantics, adopting some of the microformats and hinting to LinkedIn that its data set could enjoy better treatment if it adopted them too, is a clear signal that the web is slowly moving toward semantic linking of data. Automattic is obviously looking forward to that time as well, with its plans to add microformats like XFN (XHTML Friends Network) and hCard (a simple, open, distributed format for representing people, companies, organizations, and places, using a 1:1 representation of vCard (RFC 2426) properties and values in semantic HTML or XHTML). An interesting example of contextually and semantically linked web data is LinkedWords and, as you can see, the way we use them to semantically and contextually link words across our texts and connect them to their contextual platform.
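For readers unfamiliar with these microformats, here is a tiny sketch that renders an hCard block containing an XFN rel=”me” link. The class names (vcard, fn, url) come from the hCard spec and rel=”me” is the XFN relation; the person and URLs are made up.

```python
# Render a minimal hCard (class names per the hCard spec) with an XFN
# rel="me" link asserting that another URL belongs to the same person.
def hcard(name: str, profile_url: str, other_url: str) -> str:
    return (
        f'<div class="vcard">'
        f'<a class="fn url" href="{profile_url}">{name}</a> '
        f'(<a href="{other_url}" rel="me">my other site</a>)'
        f'</div>'
    )

print(hcard("Jane Blogger", "http://example.com/jane", "http://jane.example.org/"))
```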

So far so good, but nothing in the above indicates the reason Automattic bought the site called Gravatar. It is definitely not because of the user base (only 115K), and obviously not because of the technology either. Employment through acquisition? Not really: Tom Werner, the founder of Gravatar, is said to be a big Ruby guy, and considering that Matt seems to be moving Gravatar toward PHP, it seems highly unlikely that Tom would stay with Automattic.

From everything said publicly, it turns out that Automattic decided to help the small site work better, but no clear benefits for the company are visible from this deal, at least not to us.

We do believe Matt when he says “our plans are much bigger than that,” but what are those plans? Building a social network upon the avatars and the profile data associated with them, or perhaps an online identity service built upon them? Or, perhaps, simply building a global avatar service (with in-depth profiles) makes more sense for a company that commands over 100M uniques per month than for a tiny web site like Gravatar.

Whatever the case is, congratulations to those involved. Terms of the deal were not publicly disclosed.

More about Gravatar

The web is no longer about anonymous content generated by faceless corporations. It is about real people and the real content that they provide.

It is about you.

But as powerful as the web has become, it still lacks the personal touch that comes from a handshake. The vast majority of content you come across on the web will still be near-anonymous even though it may have a name attached. Without knowing the author behind the words, the words cannot be trusted. This is where Gravatar comes in.

Gravatar aims to put a face behind the name. This is the beginning of trust. In the future, Gravatar will be a way to establish trust between producers and consumers on the internet. It will be the next best thing to meeting in person.

Today, an avatar. Tomorrow, Your Identity–Online.

More

http://gravatar.com/
http://site.gravatar.com/site/about
http://automattic.com/
http://blog.gravatar.com/2007/10/18/automattic-gravatar/
http://www.readwriteweb.com/archives/automattic_acquires_gravatar.php
http://www.quantcast.com/p-18-mFEk4J448M
http://microformats.org/wiki/Main_Page
http://rubyisawesome.com/

ETech, the O’Reilly Emerging Technology Conference is coming

One of the most important technology conferences of the year will be held March 3-6 in San Diego, California. ETech, the O’Reilly Emerging Technology Conference, now in its seventh year, will take a wide-eyed look at the brand-new tech that’s tweaking how we are seen as individuals, how we choose to channel and divert our energy and attention, and what influences our perspective on the world around us. How does technology help you perceive things that you never noticed before? How does it help you be found, or draw attention to issues, objects, ideas, and projects that are important, no matter their size or location?

Below is what the 2008 version of ETech, the O’Reilly Emerging Technology Conference, will look at.

Body Hacking. Genomics Hacking. Brain Hacking. Sex Hacking. Food Hacking. iPhone Hacking.
If you can’t open it, you don’t own it. Take over the everyday aspects of your life and take your senses to the next level.

DIY Aerial Drones. DIY Talking Things. DIY Spectrum. DIY Apocalypse Survival.
As technology becomes more accessible you’ll get to do it all on your own. Self-empowerment starts here.

Emerging Tech of India, Cuba, and Africa. International Political Dissidents.
Different environments incubate new ideas and technologies. What these societies bring out will shake up your cultural assumptions and provide a wider world view.

Visualize Data and Crowds. Ambient Data Streaming.
Dynamic systems require new methods of data capture and interaction. Open a window on the methods experts use to interpret and harness collective intelligence.

Good Policy. Energy Policy. Defense Policy. Genetic Policy. Corruption.
Policy inevitably lags behind technology advances. Learn about some areas where it’s catching up, where it’s not, and how these boundaries shape our creativity and freedom.

Alternate Reality Games. Emotions of Games. Sensor Games.
Games provide a platform for experimentation on so many levels. The ones we’ll see engage their players in new and unexpected ways.

ETech 2008 will cover all of these topics and more. We put on stage the speakers and the ideas that help our attendees prepare for and create the future, whatever it might be. Great speakers are going to pull us forward with them to see what technology can do… and sometimes shouldn’t do. From robotics and gaming to defense and geolocation, we’ll explore promising technologies that are just that–still promises–and renew our sense of wonder at the way technology is influencing and altering our everyday lives.

“There’s more good stuff here, more new directions, than we’ve had at ETech in years, which is only to be expected, as the market starts to digest the innovations of Web 2.0 and we are now featuring the next wave of hacker-led surprises.” Read more of Tim O’Reilly’s thoughts on why ETech is our most important conference.

Registered Speakers

Below are listed all confirmed speakers to date.

Dan Albritton (MegaPhone)
Chris Anderson (Wired Magazine)
W. James Au (The Making of Second Life)
Trevor Baca (Jaduka)
Tucker Balch (Georgia Tech)
Kevin Bankston (Electronic Frontier Foundation)
Andrew Bell (Barbarian Group LLC)
Emily Berger (Electronic Frontier Foundation)
Violet Blue (Violet Blue)
Ed Boyden (MIT Media Lab & Dept. of Biological Engineering)
Gary Bradski (Stanford and Willow Garage)
Tom Carden (Stamen Design)
Liam Casey (PCH International)
Elizabeth Churchill (Yahoo! Research)
Cindy Cohn (Electronic Frontier Foundation)
Steve Cousins (Willow Garage)
Bo Cowgill (Google Economics Group)
Mike Culver (Amazon)
Jason Davis (Disney Online)
Regine Debatty (We Make Money Not Art)
Danielle Deibler (Adobe Systems)
Michael Dory (NYU Interactive Telecommunications Program (ITP))
Nathan Eagle (MIT)
Alvaro Fernandez (SharpBrains.com)
Timothy Ferriss (The 4-hour Workweek)
Eric Freeman (Disney Online)
Limor Fried (Adafruit Industries)
Johannes Grenzfurthner (monochrom, and University of Applied Sciences Graz)
Saul Griffith (Makani Power/Squid Labs)
Karl Haberl (Sun Microsystems, Inc.)
Jury Hahn (MegaPhone)
Justin Hall (GameLayers)
Jeff Han (Perceptive Pixel, Inc.)
Timo Hannay (Nature Publishing Group)
Marc Hedlund (Wesabe)
J. C. Herz (Batchtags LLC)
Todd Holloway (Ingenuity Systems)
Pablos Holman (Komposite)
Tom Igoe (Interactive Telecommunications Program, NYU)
Alex Iskold (AdaptiveBlue)
Brian Jepson (O’Reilly Media, Inc.)
Natalie Jeremijenko (NYU)
Jeff Jonas (IBM)
Tim Jones (Electronic Frontier Foundation)
Terry Jones (Fluidinfo)
Damien Katz (IBM – CouchDB)
Nicole Lazzaro (XEODesign, Inc.)
Elan Lee (Fourth Wall Studios)
Jan Lehnardt (Freisatz)
Lawrence Lessig (Creative Commons)
Kati London (area/code)
Kyle Machulis (Nonpolynomial Labs)
Daniel Marcus (Washington University School of Medicine)
Mikel Maron (Mapufacture)
John McCarthy (Stanford University)
Ryan McManus (Barbarian Group LLC)
Roger Meike (Sun Microsystems, Inc.)
Chris Melissinos (Sun Microsystems, Inc.)
Dan Morrill (Google)
Pauline Ng (J. Craig Venter Institute)
Quinn Norton
Peter Norvig (Google, Inc.)
Nicolas Nova (Media and Design Lab)
Danny O’Brien (Electronic Frontier Foundation)
Tim O’Reilly (O’Reilly Media, Inc.)
David Pescovitz (BoingBoing.net, Institute for the Future, MAKE:)
Bre Pettis (I Make Things)
Arshan Poursohi (Sun Microsystems, Inc.)
Marc Powell (Food Hacking)
Jay Ridgeway (Nextumi)
Hugh Rienhoff (MyDaughtersDNA.org)
Jesse Robbins (O’Reilly Radar)
Eric Rodenbeck (Stamen Design)
David Rose (Ambient Devices)
Dan Saffer (Adaptive Path)
Joel Selanikio (DataDyne.org)
Peter Semmelhack (Bug Labs)
Noah Shachtman (Wired Magazine)
Michael Shiloh (OpenMoko)
Kathy Sierra (Creating Passionate Users)
Micah Sifry (Personal Democracy Forum)
Adam Simon (NYU Interactive Telecommunications Program (ITP))
Michael J. Staggs (FireEye, Inc.)
Gavin Starks (d::gen network)
Alex Steffen (Worldchanging)
John Storm (ind)
Stewart Tansley (Microsoft Research)
Paul Torrens (Arizona State University)
Phillip Torrone (Maker Media)
Kentaro Toyama (Microsoft Research India)
Gina Trapani (Lifehacker)
Nate True (Nate True)
Lew Tucker (Radar Networks)
Andrea Vaccari (Senseable City Lab, MIT)
Scott Varland (NYU Interactive Telecommunications Program (ITP))
Merci Victoria Grace (GameLayers)
Mike Walsh (Tomorrow)
Stan Williams (Hewlett-Packard Labs)
Ethan Zuckerman (Global Voices)

Attendee Registration

You can register as an attendee online or by Mail/Fax at the following address:

O’Reilly Media, Inc.
Attn: ETech Registration
1005 Gravenstein Hwy North
Sebastopol, CA 95472
Fax: (707) 829-1342

The conference fees are as follows (through Jan 29 – Mar 2):
Sessions plus Tutorials $1,690.00
Sessions Only $1,390.00
Tutorials Day Only $595.00

Walk-ins: Standard registration closes March 2, 2008. The onsite registration fee is an additional $100 on top of the standard price above.

More about ETech

Now in its seventh year, the O’Reilly Emerging Technology Conference homes in on the ideas, projects, and technologies that the alpha geeks are thinking about, hacking on, and inventing right now, creating a space for all participants to connect and be inspired. ETechs past have covered everything from peer-to-peer networks to person-to-person mobile messaging, web services to weblogs, big-screen digital media to small-screen mobile gaming, and hardware hacking to content remixing. We’ve hacked, blogged, ripped, remixed, tracked back, and tagged to the nth. Expect much of what you see in early form here to show up in the products and services you’re taking for granted in the not-too-distant future.

ETech balances blue-sky theorizing with practical, real-world information and conversation. Tutorials and breakout sessions will help you inject inspiration into your own projects, while keynotes and hallway conversation will spark enough unconventional thinking to change how you see your world.

More than 1,200 technology enthusiasts are expected to attend ETech 2008, including:

  • Technologists
  • CxOs and IT managers
  • Hackers and grassroots developers
  • Researchers and academics
  • Thought leaders
  • Business managers and strategists
  • Artists and fringe technologists
  • Entrepreneurs
  • Business developers and venture capitalists

  • Representatives from companies and organizations tracking emerging technologies

In the past, ETech has brought together people from such diverse companies, organizations, and projects as: 37signals, Adaptive Path, Amazon.com, Attensa, August Capital, BBC, Boeing, CBS.com, Comcast, Department of Defense, Disney, E*Trade, Fairfax County Library, Fidelity Investments, Fotango, France Telecom, General Motors, Honda, IEEE, Intel, Macromedia, Meetup, Microsoft, Morgan Stanley, Mozilla, National Security Agency, New Statesman, Nielsen Media Research, Nokia, NYU, Oracle, Orbitz, Platial, Salesforce.com, Sony, Starwood Hotels, Symantec, The Motley Fool, UC Santa Barbara Kavli Institute, Zend, and many more.

Some of ETech’s past sponsors and exhibitors include: Adobe, Aggregate Knowledge, Apple, AT&T, Attensa, eBay, Foldera, Google, IBM, Intuit, iNetWord, Laszlo, MapQuest, mFoundry, Root, RSSBus, Salesforce.com, Sxip, TechSmith, Tibco, Windows Live, Yahoo!, and Zimbra.

The conference is expected to gather some of the brightest minds of today’s technology world, and the Web in particular.

More

http://conferences.oreilly.com/etech/
http://en.oreilly.com/et2008/public/content/home
http://radar.oreilly.com/archives/2008/01/why-etech-is-oreillys-most-imp.html
 

The Founders Fund creates Founders Fund II

Founders Fund, a non-traditional investment group, has raised an institutional fund in the amount of $220 million. The new fund, Founders Fund II, will allow this team of four managing partners, who themselves are founders and entrepreneurs, to leverage their individual expertise and deliver their unique business model, which puts the entrepreneurs first. Founders Fund has developed a comprehensive package designed to create near perfect alignment of interests between founders and their investors.

Founders Fund II will be invested in approximately 15-20 innovative early-stage start-up companies. This is the first institutional money raised for the Founders Fund, representing a significant increase over the original fund of $50 million, which was raised from personal investments by the managing partners and select outside investors.

San Francisco-based Founders Fund launched in 2005 with a $50 million venture fund. They’ve had two liquidity events since then, and a number of other very high-profile participations like Facebook, Powerset, Ooma, Quantcast, Slide, Geni and Causes.

“We believe entrepreneurs are looking for people like themselves, people who also have taken ideas and made them a reality. This second fund allows us to invest in areas for which we have deep insight, personal experience and passion for seeing the companies succeed,” said Luke Nosek, a Founders Fund managing partner. “Our collective experience starting companies and funding innovative start-ups positions the Founders Fund as a unique, valuable resource at the early investment stage.”

The Founders Fund will continue to offer Series FF stock, which is being adopted across the industry, adding to its unique approach to funding entrepreneurs. The stock is offered to start-up founders, who can convert Series FF stock to preferred stock during subsequent rounds of funding. This allows Series FF stockholders to sell a portion of their stock and aligns their interests with their investors’.

“The traditional venture capital model is broken,” said Sean Parker, a Founders Fund managing partner. “By offering tools like the Series FF stock, we are helping create a new model of investment and alignment of interests, confirming our commitment to the founders of our companies. This fund is truly for founders by founders.”

A couple of investments have been made out of the new fund, they say, but have not yet been disclosed.

The four managing partners have all started their own companies and between them have seen the process from inception to start up to IPO.

“Founders Fund was started to make a difference for companies looking for funding to execute on their big ideas. We believe the alignment of interests with our portfolio companies is the next step in the evolution of collaborative investments,” said Ken Howery, a Founders Fund managing partner. “Founders Fund II will give us the opportunity to continue to invest in the people and ideas that are truly bringing innovation to the Internet industry.”

Peter Thiel, one of four managing partners for The Founders Fund and an early backer and board member of the social network Facebook said, “This is one of the most innovative venture teams ever assembled. Our unique skill set, expertise and perspective support our shared desire to build and invest in great companies from the ground up.”

Parker says he learned a powerful lesson about the importance of taking time to build a business from observing the trajectories of some of the valley’s most successful businesses. What would have happened if the founders had sold those companies before fine-tuning them? PayPal started out as an encryption product that beamed money between mobile devices before hitting on the online payment business that it ultimately sold to eBay for $1.5 billion. Google didn’t strike Internet ore until the paid search market had time to fully develop.

“Largely because we were all founders ourselves, we’re inherently more interested in helping new entrepreneurs develop into successful leaders than we are in getting rich,” Parker said. “As someone who has started and run a few companies myself, my primary interest is in helping creative people build companies and run those companies over the long-term. I also happen to believe that this is the best way to create value for my limited partners, and by extension, for myself.”

However, some institutional investors were skeptical of the partners and passed on the opportunity to put in money. Parker confirmed that the fund-raising process turned out to be more time consuming than the firm had expected. But he also said limited partners had invested because their model — namely, a venture firm run by founders with experience — was needed in the industry. The firm originally sought to raise $150 million, but ended up raising $220 million.

More about The Founders Fund

Based in San Francisco, Calif. and founded in 2005, Founders Fund is a group of four proven entrepreneurs with a shared vision: to change the way venture investments are made. Founders Fund seeks to provide the capital, insights and support required to build a company from the ground up and sustain successful enterprises with a non-traditional, founder-focused approach. Their current portfolio includes Facebook, Geni, Powerset, Ooma, Quantcast, Slide and others.

The Managing Partners

Peter Thiel
Peter’s experience with venture finance began in the 1990s, when he ran Thiel Capital Management, a Menlo Park-based hedge fund that also made private equity investments. In 1998, Peter co-founded PayPal and served as its Chairman and CEO until the company’s sale to eBay in October 2002 for $1.5 billion. Peter’s experience in finance includes managing a successful hedge fund, trading derivatives at CS Financial Products, and practicing securities law at Sullivan & Cromwell. Peter sits on the Board of Directors of the Pacific Research Institute and on the Board of Visitors of Stanford Law School. Peter received his BA in Philosophy and his JD from Stanford.

Peter Thiel is a 39-year-old maverick money manager who in the past four years has turned his $60 million payout from the sale of the PayPal online payment service he co-founded into a growing financial fiefdom. He runs Clarium Capital Management LLC, one of the nation’s most successful and daring hedge funds with $3 billion in assets, and The Founders Fund, a tiny but increasingly influential venture capital firm with a laser-beam focus on consumer Internet startups.

In late 2004, Peter Thiel made a $500,000 angel investment in Facebook. Microsoft recently purchased 1.6 percent of the company for $240 million, which values Facebook at roughly $15 billion and Thiel’s stake at roughly $1 billion.

Ken Howery
Ken is a co-founder of PayPal and served as the company’s first CFO. While at PayPal, Ken helped raise over $200 million in private financing, worked on the company’s public offerings, and assisted in the company’s $1.5 billion sale to eBay. Ken has also been a member of the research and trading teams at Clarium Capital Management, a global macro hedge fund based in San Francisco with over $3 billion under management, and at Thiel Capital Management, a multistrategy investment fund, where Ken made venture investments beginning in 1998. Ken received a BA in Economics from Stanford.

Luke Nosek
Luke Nosek is a co-founder of PayPal and served as the company’s Vice President of Marketing and Strategy. While at PayPal, Luke oversaw the company’s marketing efforts at launch, growing the user base to 1 million customers in the first six months. Luke also created “Instant Transfer,” PayPal’s most profitable product. Prior to PayPal, Luke was an evangelist at Netscape. Luke has also co-founded two other consumer Internet companies, including the web’s first advertising network, and has made a number of venture investments since 2000. Luke received a B.S. in Computer Science from the University of Illinois, Urbana-Champaign.

Sean Parker
Sean Parker is the co-founder and Chairman of “Project Agape,” a new network that aims to enable large-scale political and social activism on the Internet. Previously, Sean was the co-founder of the category-defining Web ventures Napster, Plaxo, and Facebook. At Napster, Sean helped to design the Napster client software and led the company’s initial financing and strategy. Under Sean’s leadership, Napster became the fastest adopted client software application in history. Following Napster, Sean co-founded and served as President of Plaxo, where he pioneered the viral engineering techniques used to deploy Plaxo’s flagship smart address book product, ultimately acquiring more than 15 million users. In 2004, Sean left Plaxo to become the founding President of Facebook, one of the most rapidly growing sites on the Internet today. Sean sits on the boards of several private companies.

More

http://www.foundersfund.com/
http://www.sfgate.com/cgi-bin/article.cgi?file=/c/a/2006/12/13/MNGECMUMRE1.DTL
http://www.techcrunch.com/2007/12/17/founders-fund-closes-220-million-second-fund/
http://www.businesswire.com/news/google/20071217006220/en
http://en.wikipedia.org/wiki/Peter_Thiel
http://www.latimes.com/business/investing/la-fi-founders18dec18,1,6840237.story?coll=la-headlines-business-invest&ctrack=2&cset=true
http://venturebeat.com/2007/12/18/founders-fund-raises-new-fund-aims-for-more-vc-disruption/

Inform receives $15 Million investment from Spark Capital

Inform Technologies, a technology solution for established media brands, has received a $15 million investment from Spark Capital, a Boston-based venture fund focused on the intersection of the media, entertainment and technology industries.

The company said in its press release that it will use the funds to accelerate growth. It also claims nearly 100 media brands use Inform’s journalistic technology to enhance their sites.

Founded in 2004, Inform currently works with nearly 100 major media brands, helping to ensure that their sites are content destinations and offering editorial-quality features that keep readers engaged on their sites longer – and that increase page views and revenue potential.

Inform’s key offering is a technology solution that acts as an extra editor. It starts with a page of text and then, with editorial precision, automatically creates and organizes links to relevant content from the media property’s site, its archives, affiliate sites and/or anywhere else on the Web. As a result, each page on a site becomes a richer multimedia experience.

Said James Satloff, CEO of Inform, “Media companies face significant challenges online. They need to attract new unique visitors, create an experience that compels those readers to spend more time consuming more pages, and then turn those page views and time on site into revenue. We believe that the Inform solution enables them to do exactly that.”

Longstanding Inform clients include Conde Nast, Crain Communications, IDG, The New York Sun and Washingtonpost.Newsweek Interactive. In recent months, 30 additional media properties have engaged Inform – many already running Inform’s technology on their sites.

Inform uses artificial intelligence and proprietary rules and algorithms to scan millions of pages of text and read the way a journalist does – identifying key “entities,” such as people, places, companies and products, and recognizing how they connect, even in subtle and context-specific ways. The software continually teaches itself – in real time – how information is related and automatically updates links and topics as the context changes.
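Inform’s algorithms are proprietary, but the general technique can be illustrated with a toy sketch: recognize known “entities” in a text and turn the first mention of each into a link to a topic page. The entity dictionary and URL scheme below are invented for illustration and are nothing like Inform’s actual AI.

```python
# Naive dictionary-based entity linking: wrap the first whole-word mention
# of each known entity in a link to an (invented) topic page URL.
import re

ENTITY_PAGES = {
    "Spark Capital": "/topics/spark-capital",
    "Inform": "/topics/inform-technologies",
    "Conde Nast": "/topics/conde-nast",
}

def link_entities(text: str) -> str:
    for name, url in ENTITY_PAGES.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        text = re.sub(pattern, f'<a href="{url}">{name}</a>', text, count=1)
    return text

print(link_entities("Inform received a $15 million investment from Spark Capital."))
```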

Santo Politi, Founder and Partner at Spark Capital, commented: “Established media brands need cost-effective ways to compete with each other and, importantly, with other online presences, such as search. They need depth and richness in their content so they’re true destinations and so readers spend more time on the sites and click through more pages. Inform provides a truly elegant – and so far very successful – solution for that. While allowing the publication to remain in full control of its content and editorial integrity, Inform automatically enriches a site by enabling it to leverage its own content, its archives, archives of affiliates and the web overall. In effect, it enables a publication to expand its editorial capabilities without expanding its staff. We believe the potential for Inform’s growth is substantial.”

“We’re delighted that our new investor understands how effectively we partner with media companies and how our technology serves their business and editorial objectives. We will use the capital to expand our operations and implement our approach to accelerating our growth,” said Joseph Einhorn, Co-Founder and CTO of Inform.

We went over the Web and researched the company a bit. It turns out the company has shifted its focus quite often over the past several years. In 2005 the company said it was around to provide a useful news interface – both blog and non-blog – and to show the interconnectedness of all of the content. Later the same year a major re-launch and re-design struck the company: they gave up on the Ajax-based pop-up and also added video and audio, which hardly fits into the concept of contextual connection between two content areas/texts based on semantic textual analysis, unless they have figured out how to read inside and understand image and video files. Google, by contrast, seems to have come up with technology that claims to recognize text in images. In late 2006 the company brought to market its so-called Inform Publisher Services, aimed at big web publishers and designed to help them increase page views by adding relevant links to other, hopefully related, content in their archives.

The new service was meant to automatically create links in existing articles, pointing to a results page with relevant content from the site as well as from the web, including blogs and audio/video content. Sounds like Sphere and LinkedWords. Basically, that latest offering comes closest to what Inform.com is today.

Critics of the service have published the following doubts about Inform on a few blogs we checked out.

Isn’t this the opposite of semantic web, since they’re sucking in unstructured data? How does their relatedness stuff compare to Sphere and how do their topic pages compare to Topix?

Marshall Kirkpatrick from RWW put it this way when the question of standards and openness was raised:

“Inform crunches straight text and outputs HTML. I asked whether they publish content with any standards based semantic markup and they said that actual publishing is up to publishers. That’s a shame, I don’t see any reason why Inform wouldn’t participate in the larger semantic web to make its publishers’ content more discoverable. Perhaps when you’ve got 100 live clients and now $15m in the bank, it feels like there’s no reason to open up and play nice with a movement of dreamers having trouble getting other apps out of academia.”

Competitors include Sphere, Proximic, Lijit, AdaptiveBlue, LinkedWords, in some ways NosyJoe, and Jiglu, among others. Other, though more remote, players in this space include Attendi, Diigo, Twine and Freebase.

More about Inform

Inform Technologies is a new technology solution for established media brands that automatically searches, organizes and links content to provide a rich, compelling experience that attracts and retains readers.

With editorial-quality precision, the technology understands textual content and recognizes subtle differences in meaning. Further, the technology automatically creates links – in articles and on instantly generated topic pages – to relevant content. This deepens a site and engages readers.

Inform’s Essential Technology platform is an artificial intelligence and natural language-based solution that serves almost as an “extra editor” using rules and algorithms to “read” millions of pages of content, identify entities, such as people, places, companies, organizations and products, and topics, to create intelligent links to other closely related information. The technology is also able to recognize subtle differences in meaning and distinguish people, places and things based on local geographies or unique identities.

Inform’s Connected Content Solution and Essential Technology Platform are used by major media brands including CNN.com, Washingtonpost.Newsweek Interactive, Conde Nast, Meredith, IDG and Crain Communications.

Founded in 2004, the company is privately held and has approximately 60 employees, including mathematicians, linguists, programmers, taxonomists, library scientists and other professionals based in New York and India.

About Spark Capital

Spark Capital is a venture capital fund focused on building businesses that transform the distribution, management and monetization of media and content, with experience in identifying and actively building market-leading companies in sectors including infrastructure (Qtera, RiverDelta, Aether Systems, Broadbus and BigBand), networks (College Sports Television, TVONE and XCOM) and services (Akamai and the Platform). Spark Capital has over $600 million under management, and is based in Boston, Massachusetts. Spark has committed to investing $20 million in CNET equity.

More

http://www.inform.com/ 
http://www.inform.com/pr.012308.html
http://www.readwriteweb.com/archives/inform_funding.php
http://www.micropersuasion.com/2005/10/a_new_rss_reade.html
http://www.paidcontent.org/pc/arch/2005_10_16.shtml#051884
http://www.techcrunch.com/tag/inform.com/
http://blog.express-press-release.com/2007/10/19/a-bunch-of-intelligent-and-smart-content-tagging-engines/
http://www.techcrunch.com/2007/10/19/twine-launches-a-smarter-way-to-organize-your-online-life/
http://blog.nosyjoe.com/2007/09/06/nosyjoecom-is-now-searching-for-tags/
http://nextnetnews.blogspot.com/2007/09/is-nosyjoecom-next-clustycom.html
http://kalsey.com/2007/10/jiglu_tags_that_think/
http://mashable.com/2007/10/15/jiglu/
http://www.nytimes.com/2005/10/17/technology/17ecom.html
http://www.techcrunch.com/2005/10/16/informcom-doesnt/
http://www.techcrunch.com/2005/10/24/a-second-look-at-informcom/
http://www.techcrunch.com/2005/12/05/informcom-re-launches-with-major-feature-changes/
http://business2.blogs.com/business2blog/2006/07/scoop_inform_re.html
http://www.techcrunch.com/2006/07/30/informcoms-latest-offering/
http://www.quantcast.com/inform.com
http://bits.blogs.nytimes.com/2007/07/04/when-search-results-include-more-search-results/

Massive second round of funding for Freebase – $42 Million

Freebase, the open, shared database of the world’s knowledge, has raised a whopping $42 million in its Series B round of funding, a round that included Benchmark Capital and Goldman Sachs. Total funding to date is $57 million.

The investment is considerable, and comes at a time when a number of experts are betting that a more powerful, “semantic” Web is about to emerge, where data about information is much more structured than it is today.

In March 2006, Freebase received $15 million in funding from investors including Benchmark Capital, Millennium Technology Ventures and Omidyar Network.

Freebase, created by Metaweb Technologies, is an open database of the world’s information. It’s built by the community and for the community – free for anyone to query, contribute to, build applications on top of, or integrate into their websites.

Already, Freebase covers millions of topics in hundreds of categories. Drawing from large open data sets like Wikipedia, MusicBrainz, and the SEC archives, it contains structured information on many popular topics, including movies, music, people and locations – all reconciled and freely available via an open API. This information is supplemented by the efforts of a passionate global community of users who are working together to add structured information on everything from philosophy to European railway stations to the chemical properties of common food ingredients.

By structuring the world’s data in this manner, the Freebase community is creating a global resource that will one day allow people and machines everywhere to access information far more easily and quickly than they can today.

Freebase aims to “open up the silos of data and the connections between them”, according to founder Danny Hillis at the Web 2.0 Summit. Freebase is a database that holds all kinds of data and offers an API. Because it’s an open database, anyone can enter new data into Freebase. An example page in the Freebase db looks pretty similar to a Wikipedia page. When you enter new data, the app can make suggestions about content. The topics in Freebase are organized by type, and you can connect pages with links and semantic tagging. So, in summary, Freebase is all about shared data and what you can do with it.

Here’s a video tour of how Freebase works. Freebase categorizes knowledge according to thousands of “types” of information, such as film, director or city. Those are the highest order of categorization. Underneath those types you have “topics,” which are individual examples of the types – such as Annie Hall and Woody Allen. It boasts two million topics to date. This lets Freebase represent information in a structured way, to support queries from web developers wanting to build applications around it. It also solicits people to contribute their knowledge to the database, governed by a community of editors. The data is offered under a Creative Commons license so that it can be used to power applications, via an open API.
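To give a flavor of that open API, below is a sketch of an MQL read query matching the Annie Hall example above. It assumes Metaweb’s mqlread service endpoint and JSON envelope as we understand them, so treat the details as approximate rather than authoritative.

```python
# Sketch of a Freebase MQL read query: null values mark the fields we want
# the service to fill in. Endpoint and envelope are assumptions based on
# Metaweb's documentation of the mqlread service.
import json
import urllib.parse
import urllib.request

envelope = {"query": {
    "type": "/film/film",
    "name": "Annie Hall",
    "directed_by": None,  # expected answer: "Woody Allen"
}}

url = ("http://api.freebase.com/api/service/mqlread?query="
       + urllib.parse.quote(json.dumps(envelope)))
with urllib.request.urlopen(url) as response:
    print(json.load(response)["result"])
```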

This is one of the biggest Series B rounds of the past 12 months. And what Google is trying to do to Wikipedia with Knol is probably what Freebase is trying to achieve as well: replicate and commercialize the huge success of the non-profit Wikipedia.

Other semantic applications and projects include Powerset, Twine, AdaptiveBlue, Hakia, Talis, LinkedWords, NosyJoe, TrueKnowledge, among others.

Peter Rip, an investor in Twine, quickly reacted to the comparison between Freebase and Twine made by VentureBeat’s Matt Marshall:

As an investor in Twine, allow me to correct you about Twine and Metaweb’s positioning. You correctly point out that Metaweb is building a database about concepts and things on the Web. Twine is not. Twine is really more of an application than a database. It is a way for persons to share information about their interests. So they are complementary, not competitive.

What’s most important is that Twine will be able to use all the structure in something like Metaweb (and other content sources) to enrich the user’s ability to track and manage information. Think of Metaweb as a content repository and Twine as the app that uses content for specific purposes.

Twine is still in closed beta. So the confusion is understandable, especially with all the hype surrounding the category.

Nova Spivack, the founder of Twine, has also commented:

Freebase and Twine are not competitive. That should be corrected in the above article. In fact our products are very different and have different audiences. Twine is for helping people and groups share knowledge around their interests and activities. It is for managing personal and group knowledge, and ultimately for building smarter communities of interest and smarter teams.

Metaweb, by contrast, is a data source that Twine can use, but is not focused on individuals or on groups. Rather Metaweb is building a single public information database, that is similar to the Wikipedia in some respects. This is a major difference in focus and functionality. To use an analogy, Twine is more like a semantic Facebook, and Metaweb is more like a semantic Wikipedia.

Freebase is in alpha.

Freebase.com was the first Semantic App featured by Web2Innovations in its series of planned publications in which we will try to discover, highlight and feature the next generation of web-based semantic applications, engines, platforms, mash-ups, machines, products, services, mixtures, parsers, approaches and far beyond.

The purpose of these publications is to discover and showcase today’s Semantic Web apps and projects. We’re not going to rank them, because there is no way to rank these apps at this time – many are still in alpha and private beta.

More

http://www.metaweb.com/about/
http://freebase.com
http://roblog.freebase.com
http://venturebeat.com/2008/01/14/shared-database-metaweb-gets-42m-boost/
http://www.techcrunch.com/2008/01/16/freebase-takes-42-million/
http://www.dmwmedia.com/news/2008/01/15/freebase-developer-metaweb-technologies-gets-$42.4-million
http://www.crunchbase.com/company/freebase
http://www.readwriteweb.com/archives/10_semantic_apps_to_watch.php
http://en.wikipedia.org/wiki/Danny_Hillis
http://www.metaweb.com
http://en.wikipedia.org/wiki/Metaweb_Technologies
https://web2innovations.com/money/2007/11/30/freebase-open-shared-database-of-the-worlds-knowledge/
http://mashable.com/2007/07/17/freebase/
http://squio.nl/blog/2007/04/02/freebase-life-the-universe-and-everything/

Google files patent for recognizing text in images

Google filed a patent application in July 2007, which has only recently become public, claiming methods by which machines can read and understand text in images and video. The basic idea is for Google to be able to index videos and images and make them available and searchable by the text or keywords located inside the image or the video. The application was filed by Google Inc.; the inventors are Luc Vincent of Palo Alto, Calif. and Adrian Ulges of Delaware, US.

“Digital images can include a wide variety of content. For example, digital images can illustrate landscapes, people, urban scenes, and other objects. Digital images often include text. Digital images can be captured, for example, using cameras or digital video recorders. Image text (i.e., text in an image) typically includes text of varying size, orientation, and typeface. Text in a digital image derived, for example, from an urban scene (e.g., a city street scene) often provides information about the displayed scene or location. A typical street scene includes, for example, text as part of street signs, building names, address numbers, and window signs.”

If Google manages to implement this technology, consumer search will be taken to the next level, and Google will have access to a much wider array of information, far beyond the text-only search in which it already plays a leading role.

This, of course, raises some additional privacy issues, as properly noted by InformationWeek. Google has already had privacy issues with Google Maps Street View, and if this technology starts to index and recognize textual information from millions of videos and billions of pictures around the Web, things might get more complicated.
 
Nonetheless, if the technology bears the fruit it promises, it will represent a gigantic leap forward in the progression of general search technology.

There are open source solutions to the problem. They are perhaps not as scalable and effective as what Google might develop, yet they do exist.

Andrey Kucherenko from Ukraine is known to have built a very interesting project in this area. His classes can recognize text in monochrome graphical images after a training phase. The training phase is necessary to let the class build recognition data structures from images that contain known characters. Those data structures are then used during the recognition process to identify text in real images using the corner algorithm. The project is called PHPOCR, and more information can be found over here.

PHPOCR won the PHPClasses innovation award for March 2006, and it shows what can be implemented with PHP5. Certain types of applications require reading text from documents that are stored as graphical images, as is the case with scanned documents.

An OCR (Optical Character Recognition) tool can be used to recover the original text that is written in scanned documents. These are sophisticated tools that are trained to recognize text in graphical images.

This class provides a base implementation for an OCR tool. It can be trained to learn how to recognize each letter drawn in an image. Then it can be used to recognize longer texts in real documents.
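
The corner algorithm PHPOCR uses is not described in detail here, so the following Java sketch, under our own assumptions, only conveys the train-then-recognize shape described above: the training phase stores labeled glyph bitmaps, and recognition matches an unknown glyph against the nearest stored one by Hamming distance. All class and method names are hypothetical.

```java
import java.util.*;

// Toy trainable character recognizer: nearest neighbour over binary glyphs.
public class TinyOcr {
  // Training data: known character -> binary glyph bitmap (flattened row-major).
  private final Map<Character, boolean[]> glyphs = new HashMap<>();
  private final int w, h;

  public TinyOcr(int width, int height) { this.w = width; this.h = height; }

  // Training phase: remember what each known character looks like.
  public void train(char label, boolean[] bitmap) {
    glyphs.put(label, bitmap.clone());
  }

  // Recognition phase: pick the trained glyph with the fewest differing pixels.
  public char recognize(boolean[] bitmap) {
    char best = '?';
    int bestDist = Integer.MAX_VALUE;
    for (Map.Entry<Character, boolean[]> e : glyphs.entrySet()) {
      boolean[] g = e.getValue();
      int d = 0;
      for (int i = 0; i < w * h; i++) if (g[i] != bitmap[i]) d++;
      if (d < bestDist) { bestDist = d; best = e.getKey(); }
    }
    return best;
  }
}
```

A real OCR tool would first segment a scanned page into individual glyph regions and then classify each one; the sketch covers only the classification step.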

Another very interesting start-up, believed to be deploying text recognition inside videos heavily, is CastTV. The company is based in San Francisco and, with just $3M in funding, is trying to build one of the Web’s best video search engines. CastTV lets users find all their favorite online videos, from TV shows to movies to the latest celebrity, sports, news, and viral Internet videos. The company’s proprietary technology addresses two main video search challenges: finding and cataloging videos from the web, and delivering relevant video results to users.

CastTV was one of the presenters at TechCrunch40, where it was noticed by Marissa Mayer from Google. She asked CastTV the following question: “Would like to know more about your matching algo for the video search engines?”. CastTV replied: “We have been scaling as the video market grows – relevancy is a very tough problem – we are matching 3rd party sites and supplementing the meta data.”

Today, in the light of the patent application above, Marissa’s question reads quite differently, and CastTV’s answer did not address what Google was really asking about. Is CastTV working on something similar to what the patent is trying to cover for Google? We do not know, but time will tell. CastTV’s investors are Draper Fisher Jurvetson and Ron Conway. We hope they make a nice exit from CastTV.
 
Adobe has also made some advances in this particular area. You can use Acrobat to recognize text in previously scanned documents that have already been converted to PDF; its OCR (Optical Character Recognition) runs with header/footer/Bates numbering on image-based PDF files.

It is also interesting that Microsoft had, in fact, applied for a very similar patent (called “Visual and Multi-Dimensional Search”). Even more interesting is the fact that Microsoft beat Google to the punch by filing three days earlier – Microsoft filed on June 26, 2007, while Google filed on June 29.

The full abstract, description and claims can be read via the links below:

More

http://google.com
http://www.wipo.int/pctdb/en/ia.jsp?IA=US2007072578&DISPLAY=STATUS
http://www.techmeme.com/080104/p23
http://www.techcrunch.com/2008/01/04/google-lodges-patent-for-reading-text-in-images-and-video/
http://www.webmasterworld.com/goog/3540344.htm
http://enterprise.phpmagazine.net/2006/04/recognize_text_objects_in_grap.html
http://www.phpclasses.org/browse/package/2874.html
http://www.crunchbase.com/company/casttv
http://www.casttv.com/
http://www.google.com/corporate/execs.html
http://www.centernetworks.com/techcrunch40-search-and-discovery
http://www.setthings.com/2008/01/04/recognizing-text-in-images-patent-by-google/
http://help.adobe.com/en_US/Acrobat/8.0/Professional/help.html?content=WS2A3DD1FA-CFA5-4cf6-B993-159299574AB8.html
http://www.techcrunch40.com/
http://www.therottenword.com/2008/01/microsoft-beats-google-to-image-text.html

Hakia takes $5M more, totals $16M

In a new round of funding, Hakia, the natural language processing search engine, has raised an additional $5M. The money came from previous investors, among them Noble Grossart Investments, Alexandra Investment Management, Prokom Investments, KVK, and several angel investors. With plans to launch fully some time next year, Hakia has been working on improving its relevancy and adding social features like “Meet the Others” to the site. Hakia is known to have raised $11 million in its first round of funding in late 2006 from a panoply of investors scattered across the globe who were attracted by the company’s semantic search technology. As far as we know, the company’s total funding now stands at $16M.

We think that among all alternative search engines, excluding Ask.com and Clusty.com, Hakia seems to be one of the most trafficked, with almost 1M unique visitors when we last checked the site’s publicly available stats. If we were to rank the most popular search engines, we would put them in the following order: Google, Yahoo, Ask.com, MSN, Naver, some other regional leaders, Clusty, and perhaps somewhere in there hakia.

On the other hand, according to Quantcast, Hakia is basically not that popular a site, reaching fewer than 150,000 unique visitors per month. Compete reports much better numbers – slightly below 1 million uniques per month. Considering that the search engine is still in its beta stage, these numbers are more than great. However, analyzing the traffic curve on both measuring sites further, it appears that the traffic hakia gets is campaign-based – in other words, generated by advertising, promotion or PR activity – and is not permanent organic traffic due to heavy usage of the site.

In related news, Google’s head of research Peter Norvig said a few days ago that we should not expect to see natural-language search at Google anytime soon.

In a Q&A with Technology Review, he says:

We don’t think it’s a big advance to be able to type something as a question as opposed to keywords. Typing “What is the capital of France?” won’t get you better results than typing “capital of France.”

Yet he does acknowledge that there is some value in the technology:

We think (Google) what’s important about natural language is the mapping of words onto the concepts that users are looking for. To give some examples, “New York” is different from “York,” but “Vegas” is the same as “Las Vegas,” and “Jersey” may or may not be the same as “New Jersey.” That’s a natural-language aspect that we’re focusing on. Most of what we do is at the word and phrase level; we’re not concentrating on the sentence. We think it’s important to get the right results rather than change the interface.

In other words, a natural-language approach is useful on the back-end to create better results, but it does not present a better user experience. Most people are too lazy to type more than one or two words into a search box anyway. The folks at both Google and Yahoo know that is true for the majority of searchers. The natural-language search startups are going to find that out the hard way.
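
To make Norvig’s point concrete, here is a toy Java sketch of word-and-phrase-level concept mapping: a greedy longest-match canonicalizer that folds “Vegas” into “Las Vegas” while leaving “York” alone rather than merging it into “New York”. The dictionary entries are invented for the example; a production system would hold millions of learned mappings.

```java
import java.util.*;

public class PhraseCanonicalizer {
  // Invented sample entries mapping surface phrases onto concepts.
  private static final Map<String, String> CANON = new HashMap<>();
  static {
    CANON.put("vegas", "las vegas");    // "Vegas" means "Las Vegas"
    CANON.put("las vegas", "las vegas");
    CANON.put("new york", "new york");  // but "york" is NOT "new york"
    CANON.put("york", "york");
  }

  // Greedy longest match: prefer a two-word phrase over a single word.
  public static List<String> canonicalize(String query) {
    String[] words = query.toLowerCase().split("\\s+");
    List<String> concepts = new ArrayList<>();
    for (int i = 0; i < words.length; ) {
      if (i + 1 < words.length && CANON.containsKey(words[i] + " " + words[i + 1])) {
        concepts.add(CANON.get(words[i] + " " + words[i + 1]));
        i += 2;
      } else {
        concepts.add(CANON.getOrDefault(words[i], words[i]));
        i += 1;
      }
    }
    return concepts;
  }

  public static void main(String[] args) {
    System.out.println(canonicalize("hotels in vegas"));    // [hotels, in, las vegas]
    System.out.println(canonicalize("university of york")); // [university, of, york]
  }
}
```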

Founded in 2004, hakia is a privately held company with headquarters in downtown Manhattan. hakia operates globally with teams in the United States, Turkey, England, Germany, and Poland.

The founder of hakia is Dr. Riza C. Berkan, a nuclear scientist with a specialization in artificial intelligence and fuzzy logic. He is the author of several articles in this area, including the book Fuzzy Systems Design Principles, published by IEEE in 1997. Before launching hakia, Dr. Berkan worked for the U.S. Government for a decade, with emphasis on information handling, criticality safety and safeguards. He holds a Ph.D. in Nuclear Engineering from the University of Tennessee and a B.S. in Physics from Hacettepe University, Turkey.

More

[ http://venturebeat.com/2007/12/12/hakia-raising-5m-for-semantic-search/ ]
[ http://mashable.com/2007/12/12/hakia-funded/ ]
[ http://www.hakia.com/ ]
[ http://blog.hakia.com/ ]
[ http://www.hakia.com/about.html ]
[ http://www.readwriteweb.com/archives/hakia_takes_on_google_semantic_search.php ]
[ http://www.readwriteweb.com/archives/hakia_meaning-based_search.php ]
[ http://siteanalytics.compete.com/hakia.com/?metric=uv ]
[ http://www.internetoutsider.com/2007/07/the-big-problem.html ]
[ http://www.quantcast.com/search/hakia.com ]
[ http://www.redherring.com/Home/19789 ]
[ http://web2innovations.com/hakia.com.php ]
[ http://www.pandia.com/sew/507-hakia.html ]
[ http://www.searchenginejournal.com/hakias-semantic-search-the-answer-to-poor-keyword-based-relevancy/5246/ ]
[ http://arstechnica.com/articles/culture/hakia-semantic-search-set-to-music.ars ]
[ http://www.news.com/8301-10784_3-9800141-7.html ]
[ http://searchforbettersearch.com/ ]
[ https://web2innovations.com/money/2007/12/01/is-google-trying-to-become-a-social-search-engine/ ]
[ http://www.web2summit.com/cs/web2006/view/e_spkr/3008 ]
[ http://www.techcrunch.com/2007/12/18/googles-norvig-is-down-on-natural-language-search/ ]

Google is taking on Wikipedia

What was once known as one of the strongest and most beneficial friendships on the Web, between two hugely popular and recognized giants, is today turning into an Internet battle second to none.

It is no secret on the Web that Google has been in love with Wikipedia over the past years, turning the small, free encyclopedia project into one of the most visited sites on the Web today, with over 220 million unique visitors per month. It is believed that at least 85% of Wikipedia’s total monthly traffic is sent by Google. One solid argument in support of that thesis is the fact that every second Wikipedia article ranks among the first results, if not first, in Google’s SERPs, resulting in unprecedented organic traffic and participation.

It is also a well known fact that Google wished it had the chance to acquire Wikipedia, and it is believed it would have done so even years ago if that were possible. The non-profit policy and structure Wikipedia is built upon, however, provided no legal pathway for Google to snatch the site in its early days.

Basically, one can conclude that Google has always liked the idea and concept upon which Wikipedia is built, and since, for obvious reasons, it was not able to buy the site, it now seems to be pursuing an idea dangerously similar to Wikipedia’s and is obviously taking on the free encyclopedia.

News broke late yesterday that Google is preparing to launch a new site called Knol: a new user-generated, authoritative online knowledge base of virtually everything.

Normally we would not pay attention to this type of news, where a large corporation tries to copycat an already popular and established business model (concept) that did not itself turn into a large-scale company. This happens all the time and is part of modern capitalism, except that we found a couple of strategic facts that provoked us to express our opinion.

First of all, the mythical authority and popularity of Wikipedia seem to be under attack, and unlike any of the previous attempts, this time it is Google undermining Wikipedia – a company with a far better chance of making it happen – despite Wikipedia’s huge popularity and idealistic approach today.

A couple of weeks ago we wrote an in-depth analysis of how yet another mythical site, Dmoz.org, has fallen and is halfway to disintegrating completely, and the only reason we found behind this trend is the voluntary approach and principle the site has relied on throughout its almost 10 years of existence.

We think the same problem endangers Wikipedia too, and perhaps it is just a matter of time before we witness the hugely popular free encyclopedia start to disintegrate the way Dmoz.org did, and for the same reason: it relies hugely on, and is heavily dependent upon, the voluntary principle and the contribution of thousands of skilled and knowledgeable individuals. However, we all know there is no free lunch, at least not in America. Once Wikipedia’s mythical image – the one everyone wants to be associated with today – is lost, and the site no longer passes authority and respect on to its free, knowledgeable contributors, the free encyclopedia will most likely start disintegrating, and what is today known as an authoritative, high-quality knowledge base will become one of the Web’s biggest repositories of low-quality, link-rich articles full of controversial and objectionable information. Pretty much the same has already happened to Dmoz.org. The less interested Wikipedia’s volunteers become in contributing their time and knowledge to the free site while fighting an ever-growing army of spammers and corporate PRs, the more the low-quality and less authoritative information on Wikipedia will grow, and that process appears unavoidable.

This is what Google seems to be up to and looking to change. Google wants to compensate those knowledgeable contributors over the long run, and thereby avoid the potential crash that awaits every free, volunteer-based service that has had the luck to grow out of size.

With more than $10 billion in annual sales (most of it pure profit), a willingness to share that money with knowledgeable people around the globe, and more than 500 million unique visitors per month, Google seems to be on the right track to achieve what Wikipedia will most likely fail at.

Otherwise, Wikipedia is a greater idea than Google itself, but anything of Wikipedia’s size and ambition requires an enormous amount of resources to keep alive, under control and working effectively for the future. Wikipedia has been trying to raise money for a long time now with no viable success. Google, on the other hand, already has those resources in place.

Google has already said that Knol results will be in Google’s index, presumably on the first page, and very possibly at the top: “Our job in Search Quality will be to rank the knols appropriately when they appear in Google search results.” Google wants Knol to be an authoritative page: “A knol on a particular topic is meant to be the first thing someone who searches for this topic for the first time will want to read” and that’s already a direct challenge to Wikipedia.

If Wikipedia pages are replaced in the top results on Google by pages from Knol, Wikipedia’s traffic will definitely decrease, and possibly, as a consequence, so will broader participation on Wikipedia.

Will Knol be the answer to the Web of Knowledge everybody is looking for? We do not know, but one thing is for sure: today it is going to hurt Wikipedia, not the ordinary user of the aggregated knowledge base Wikipedia is. The entire army of both users and contributors may well move to Knol – for longer, or at least until Google finds ways to pay for the knowledge aggregation and its contributors.

Other companies that may eventually get hurt include Freebase, About.com, Wikia, Mahalo and Squidoo.

A screenshot of Knol’s reference page, showing how it might eventually look, is available at the first link below.


More

[ http://www.google.com/help/knol_screenshot.html ]
[ http://googleblog.blogspot.com/2007/12/encouraging-people-to-contribute.html ]
[ http://www.techcrunch.com/2007/12/13/google-preparing-to-launch-game-changing-wikipedia-meets-squidoo-project/ ]
[ http://www.techcrunch.com/2007/12/14/google-knol-a-step-too-far/ ]
[ http://www.readwriteweb.com/archives/knol_project_google_experiment.php ]
[ http://www.webware.com/8301-1_109-9834175-2.html?part=rss&tag=feed&subj=Webware ]
[ http://searchengineland.com/071213-213400.php ]
[ http://www.news.com/Google-develops-Wikipedia-rival/2100-1038_3-6222872.html ]
[ http://www.micropersuasion.com/2007/12/wikipedia-and-w.html ]
 

Hakia takes on major search engines backed up by a small army of international investors

In our planned series of publications about the Semantic Web and its Apps, Hakia is today our 3rd featured company.

Hakia.com, just like Freebase and Powerset, relies heavily on semantic technologies to produce and deliver better, more meaningful results to its users.

Hakia is building the Web’s new “meaning-based” (semantic) search engine with the sole purpose of improving search relevancy and interactivity, pushing the current boundaries of Web search. The benefits to the end user are search efficiency, richness of information, and time savings. The basic promise is to bring search results by meaning match – similar to the human brain’s cognitive skills – rather than by the mere occurrence (or popularity) of search terms. Hakia’s new technology is a radical departure from the conventional indexing approach, because indexing has severe limitations for handling full-scale semantic search.

Hakia’s capabilities will appeal to all Web searchers – especially those engaged in research on knowledge intensive subjects, such as medicine, law, finance, science, and literature. The mission of hakia is the commitment to search for better search.

Here are the technological differences of hakia in comparison to conventional search engines.

QDEX Infrastructure

  • hakia’s designers broke from the decades-old indexing method and built a more advanced system called QDEX (short for Query Detection and Extraction) to enable semantic analysis of Web pages and “meaning-based” search.
  • QDEX analyzes each Web page much more intensely, dissecting it into its knowledge bits and storing those as gateways to all possible queries one can ask (a toy sketch of this idea follows this list).
  • The information density in the QDEX system is significantly higher than that of a typical index table, which is a basic requirement for undertaking full semantic analysis.
  • The QDEX data resides on a distributed network of fast servers using a mosaic-like data storage structure.
  • QDEX has superior scalability properties because data segments are independent of each other.
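
hakia has not published QDEX’s internals, so the following Java sketch is only a toy illustration, under our own assumptions, of the general idea the bullets above describe: instead of building a term index, analysis time enumerates plausible query keys for each passage and stores the passage under every key, so query time becomes a direct lookup. The class name, the word-pair keying and the sample data are invented for the example.

```java
import java.util.*;

// Toy "query detection and extraction" store: query key -> passages.
public class QdexToy {
  private final Map<String, List<String>> gateways = new HashMap<>();

  // "Dissect" a passage into crude knowledge bits: here, every unordered
  // pair of words becomes a gateway key pointing back to the passage.
  // Real QDEX would derive far richer query forms via semantic analysis.
  public void analyze(String passage) {
    String[] w = passage.toLowerCase().replaceAll("[^a-z ]", "").split("\\s+");
    for (int i = 0; i < w.length; i++) {
      for (int j = i + 1; j < w.length; j++) {
        gateways.computeIfAbsent(key(w[i], w[j]), k -> new ArrayList<>()).add(passage);
      }
    }
  }

  // Query time is a single lookup, not an index intersection.
  public List<String> query(String a, String b) {
    return gateways.getOrDefault(key(a.toLowerCase(), b.toLowerCase()),
                                 Collections.emptyList());
  }

  private static String key(String a, String b) {
    return a.compareTo(b) < 0 ? a + " " + b : b + " " + a;
  }

  public static void main(String[] args) {
    QdexToy q = new QdexToy();
    q.analyze("The stock market crashed because of panic selling.");
    System.out.println(q.query("market", "crashed"));
  }
}
```

Note how storage grows much faster than a classic index table – the higher “information density” the list above mentions – in exchange for query-time work that is a simple lookup.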

SemanticRank Algorithm

  • hakia’s SemanticRank algorithm comprises innovative solutions from the disciplines of Ontological Semantics, Fuzzy Logic, Computational Linguistics, and Mathematics.
  • Designed for the express purpose of higher relevancy.
  • Sets the stage for search based on the meaning of content rather than the mere presence or popularity of keywords.
  • Deploys a layer of on-the-fly analysis with superb scalability properties.
  • Takes into account the credibility of sources among equally meaningful results (see the toy illustration after this list).
  • Evolves its capacity to understand text from BETA operation onward.
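
hakia has likewise not disclosed how SemanticRank actually weighs its signals, so here is a deliberately tiny, hypothetical Java illustration of just one of the bullets above: among (near-)equally meaningful results, prefer the more credible source. The scores, the 0.05 tie threshold and all names are invented for the example.

```java
import java.util.*;

public class SemanticRankToy {
  static final class Hit {
    final String url;
    final double meaningScore; // meaning-match score from semantic analysis (assumed given)
    final double credibility;  // source credibility in [0,1] (assumed given)
    Hit(String url, double meaningScore, double credibility) {
      this.url = url; this.meaningScore = meaningScore; this.credibility = credibility;
    }
  }

  // Order primarily by meaning match; among near-equally meaningful
  // results, prefer the more credible source.
  static void rank(List<Hit> hits) {
    hits.sort((a, b) -> {
      if (Math.abs(a.meaningScore - b.meaningScore) > 0.05) {
        return Double.compare(b.meaningScore, a.meaningScore);
      }
      return Double.compare(b.credibility, a.credibility);
    });
  }

  public static void main(String[] args) {
    List<Hit> hits = new ArrayList<>(Arrays.asList(
        new Hit("http://example.org/a", 0.91, 0.40),
        new Hit("http://example.org/b", 0.90, 0.95),
        new Hit("http://example.org/c", 0.60, 0.99)));
    rank(hits);
    for (Hit h : hits) System.out.println(h.url); // b, a, c
  }
}
```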

In our tests, we asked Hakia three English-language questions:

Why did the stock market crash? [ http://www.hakia.com/search.aspx?q=why+did+the+stock+market+crash%3F ]
Where do I get good bagels in Brooklyn? [ http://www.hakia.com/search.aspx?q=where+can+i+find+good+bagels+in+brooklyn ]
Who invented the Internet? [ http://www.hakia.com/search.aspx?q=who+invented+the+internet ]

It returned basically intelligent results for all three. For example, Hakia understood that when we asked “why,” we would be interested in results containing the words “reason for” – and produced some relevant ones.

Hakia is one of the few promising alternative search engines closely watched by Charles Knight at his blog AltSearchEngines.com, with a focus on natural language processing methods to try to deliver ‘meaningful’ search results. Hakia attempts to analyze the concept of a search query, in particular by doing sentence analysis. Most other major search engines, including Google, analyze keywords. The company believes that the future of search engines goes beyond keyword analysis – search engines will talk back to you and, in effect, become your search assistant. One point worth noting is that Hakia currently still has some human post-editing going on – so it isn’t 100% computer-powered at this point, and is closer to a human-powered search engine, or a combination of the two.

They hope to provide better search results for complex queries than Google currently offers, but they have a long way to go to catch up, considering Google’s vast lead in the search market, sophisticated technology, and rich coffers. Hakia’s semantic search technology aims to understand the meaning of search queries to improve the relevancy of the search results.

Instead of relying on indexing the web or on the popularity of particular web pages, as many search engines do, hakia tries to match the meaning of the search terms, mimicking the cognitive processes of the human brain.

“We’re mainly focusing on the relevancy problem in the whole search experience,” said Dr. Berkan in an interview Friday. “You enter a question and get better relevancy and better results.”

Dr. Berkan contends that search engines that use indexing and popularity algorithms are not as reliable with combinations of four or more words since there are not enough statistics available on which to base the most relevant results.

“What we are doing is an ultimate approach, doing meaning-based searches so we understand the query and the text, and make an association between them by semantic analysis,” he said.

Analyzing whole sentences instead of keywords inevitably increases the cost of indexing and processing the world’s information. The case is much the same with Powerset, which also does deep contextual analysis on every sentence of every web page, and it is publicly known that Powerset’s indexing and analysis costs are higher than Google’s. Considering that Google runs more than 450,000 servers in several major data centers, and that hakia’s indexing and storage costs might be even higher per page, the approach hakia is taking might cost its investors a fortune just to keep the company alive.

It would be interesting to find out whether hakia is also building its architecture upon the Hbase/Hadoop environment, just as Powerset does.

In the context of indexing and storing the world’s information, it is worth mentioning that there is yet another start-up search engine, called Cuill, claiming to have invented a technology for cheaper and faster indexation than Google’s. Cuill claims its indexing costs will be 1/10th of Google’s, based on new search architectures and relevance methods.

Speaking of semantic textual analysis and presentation of meaningful results, NosyJoe.com is a great example of both, yet it apparently is not going to index and store the world’s information and then apply contextual analysis to it; rather, it focuses on what the people participating in its social search engine find important and of high quality.

A few months ago Hakia launched a new social feature called “Meet Others”. From a search results page, it gives you the option to jump to a page on the service where everyone who searches for the topic can communicate.

For some idealized types of searching, it could be great. For example, suppose you were searching for information on a medical condition. Meet Others could connect you with other people looking for info about the condition, making an ad-hoc support group. On the Meet Others page, you’re able to add comments, or connect directly with the people on the page via anonymous e-mail or by Skype or instant messaging.

On the other hand, to implement social recommendations and rely on social elements like Hakia’s “Meet Others” feature, one needs huge traffic to turn an interesting social feature into an effective information discovery tool. Google, for example, with its more than 500 million unique searchers per month, can easily beat such social attempts by the smaller players if it simply decides to employ, in one way or another, its users to find, rank, share and recommend results others also search for. Such attempts by Google are already in place, as one can read here: Is Google trying to become a social search engine.

Reach

According to Quantcast, Hakia is basically not that popular a site, reaching fewer than 150,000 unique visitors per month. Compete reports much better numbers – slightly below 1 million uniques per month. Considering that the search engine is still in its beta stage, these numbers are more than great. Analyzing the traffic curve on both measuring sites further, it appears that the traffic hakia gets is campaign-based – in other words, generated by advertising, promotion or PR activity – and is not permanent organic traffic due to heavy usage of the site.

The People

Founded in 2004, hakia is a privately held company with headquarters in downtown Manhattan. hakia operates globally with teams in the United States, Turkey, England, Germany, and Poland.

The founder of hakia is Dr. Riza C. Berkan, a nuclear scientist with a specialization in artificial intelligence and fuzzy logic. He is the author of several articles in this area, including the book Fuzzy Systems Design Principles, published by IEEE in 1997. Before launching hakia, Dr. Berkan worked for the U.S. Government for a decade, with emphasis on information handling, criticality safety and safeguards. He holds a Ph.D. in Nuclear Engineering from the University of Tennessee and a B.S. in Physics from Hacettepe University, Turkey. He has been developing the company’s semantic search technology with help from Professor Victor Raskin of Purdue University, who specializes in computational linguistics and ontological semantics and is the company’s chief scientific advisor.

Dr. Berkan resisted VC firms because he worried they would demand too much control and push development too fast to get the technology to the product phase so they could earn back their investment.

When he met Dr. Raskin, he discovered they had similar ideas about search and semantic analysis, and by 2004 they had laid out their plans.

They currently have 20 programmers working on building the system in New York, and another 20 to 30 contractors working remotely from various locations around the world, including Turkey, Armenia, Russia, Germany, and Poland. The programmers are developing the search engine so it can better handle complex queries and maybe surpass some of its larger competitors.

Management

  • Dr. Riza C. Berkan, Chief Executive Officer
  • Melek Pulatkonak, Chief Operating Officer
  • Tim McGuinness, Vice President, Search
  • Stacy Schinder, Director of Business Intelligence
  • Dr. Christian F. Hempelmann, Chief Scientific Officer
  • John Grzymala, Chief Financial Officer

Board of Directors

  • Dr. Pentti Kouri, Chairman
  • Dr. Riza C. Berkan, CEO
  • John Grzymala
  • Anuj Mathur, Alexandra Global Fund
  • Bill Bradley, former U.S. Senator
  • Murat Vargi, KVK
  • Ryszard Krauze, Prokom Investments

Advisory Board

  • Prof. Victor Raskin (Purdue University)
  • Prof. Yorick Wilks, (Sheffield University, UK)
  • Mark Hughes

Investors

Hakia is known to have raised $11 million in its first round of funding from a panoply of investors scattered across the globe who were attracted by the company’s semantic search technology.

The New York-based company said it decided to snub the usual players in the venture capital community lining Silicon Valley’s Sand Hill Road and opted for its international connections instead, including financial firms, angel investors, and a telecommunications company.

Poland

Among them were Poland’s Prokom Investments, an investment group active in the oil, real estate, IT, financial, and biotech sectors.

Turkey

Another investor, Turkey’s KVK, distributes mobile telecom services and products in Turkey. Also from Turkey, angel investor Murat Vargi pitched in some funding. He is one of the founding shareholders in Turkcell, a mobile operator and the only Turkish company listed on the New York Stock Exchange.

Malaysia

In Malaysia, hakia secured funding from angel investor Lu Pat Ng, who represented his family, which has substantial investments in companies worldwide.

Finland

From Finland, hakia turned to Dr. Pentti Kouri, an economist and VC who was a member of the Nokia board in the 1980s. He has taught at Stanford, Yale, New York University, and Helsinki University, and worked as an economist at the International Monetary Fund. He is currently based in New York.

United States

In the United States, hakia received funding from Alexandra Investment Management, an investment advisory firm that manages a global hedge fund. Also from the U.S., former Senator and New York Knicks basketball player Bill Bradley has joined the company’s board, along with Dr. Kouri, Mr. Vargi, Anuj Mathur of Alexandra Investment Management, and hakia CEO Riza Berkan.

Hakia was one of the first alternative search engines to make the home page of Web2Innovations in the past year… http://web2innovations.com/hakia.com.php

Hakia.com is the 3rd Semantic App to be featured by Web2Innovations in its planned series of publications, in which we will try to discover, highlight and feature the next generation of web-based semantic applications, engines, platforms, mash-ups, machines, products, services, mixtures, parsers, approaches and beyond.

The purpose of these publications is to discover and showcase today’s Semantic Web Apps and projects. We’re not going to rank them, because there is no way to rank these apps at this time – many are still in alpha and private beta.

Via

[ http://www.hakia.com/ ]
[ http://blog.hakia.com/ ]
[ http://www.hakia.com/about.html ]
[ http://www.readwriteweb.com/archives/hakia_takes_on_google_semantic_search.php ]
[ http://www.readwriteweb.com/archives/hakia_meaning-based_search.php ]
[ http://siteanalytics.compete.com/hakia.com/?metric=uv ]
[ http://www.internetoutsider.com/2007/07/the-big-problem.html ]
[ http://www.quantcast.com/search/hakia.com ]
[ http://www.redherring.com/Home/19789 ]
[ http://web2innovations.com/hakia.com.php ]
[ http://www.pandia.com/sew/507-hakia.html ]
[ http://www.searchenginejournal.com/hakias-semantic-search-the-answer-to-poor-keyword-based-relevancy/5246/ ]
[ http://arstechnica.com/articles/culture/hakia-semantic-search-set-to-music.ars ]
[ http://www.news.com/8301-10784_3-9800141-7.html ]
[ http://searchforbettersearch.com/ ]
[ https://web2innovations.com/money/2007/12/01/is-google-trying-to-become-a-social-search-engine/ ]
[ http://www.web2summit.com/cs/web2006/view/e_spkr/3008 ]
 

Powerset – the natural language processing search engine empowered by Hbase in Hadoop

In our planned series of publications about the Semantic Web and its apps, Powerset is today our second featured company, after Freebase.

Powerset is a Silicon Valley based company building a transformative consumer search engine based on natural language processing. Their unique innovations in search are rooted in breakthrough technologies that take advantage of the structure and nuances of natural language. Using these advanced techniques, Powerset is building a large-scale search engine that breaks the confines of keyword search. By making search more natural and intuitive, Powerset is fundamentally changing how we search the web, and delivering higher quality results.

Powerset’s search engine is currently under development and closed to the general public. You can keep an eye on the company to learn more about its technology and approach.

Despite all the press attention Powerset is getting, too few details about the search engine are publicly available. In fact, Powerset has lately been one of the most buzzed-about companies in Silicon Valley, for good or bad.

Power set is a term from mathematics: given a set S, the power set (or powerset) of S, written P(S), ℘(S), or 2^S, is the set of all subsets of S. In axiomatic set theory (as developed, e.g., in the ZFC axioms), the existence of the power set of any set is postulated by the axiom of power set. Any subset F of P(S) is called a family of sets over S.
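
To make the definition concrete: for S = {x, y}, P(S) = { ∅, {x}, {y}, {x, y} }. In general, a set with n elements has 2^n subsets, which is where the 2^S notation comes from.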

From the latest information publicly available about Powerset we learn that, just like some other start-up search engines, it is using Hbase in a Hadoop environment to process vast amounts of data.

It also appears that Powerset relies on a number of proprietary technologies, such as the XLE (licensed from PARC), its ranking algorithms, and the ever-important onomasticon (a list of proper nouns naming persons or places).

  

For any other component, Powerset tries to use open source software whenever available. One of the unsung heroes that form the foundation for all of these components is the ability to process insane amounts of data. This is especially true for a Natural Language search engine. A typical keyword search engine will gather hundreds of terabytes of raw data to index the Web. Then, that raw data is analyzed to create a similar amount of secondary data, which is used to rank search results. Since Powerset’s technology creates a massive amount of secondary data through its deep language analysis, Powerset will be generating far more data than a typical search engine, eventually ranging up to petabytes of data.
Powerset has already benefited greatly from the use of Hadoop: their index build process is entirely based on a Hadoop cluster running the Hadoop Distributed File System (HDFS) and makes use of Hadoop’s map/reduce features.
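
To make the map/reduce part concrete, here is a minimal sketch of the kind of index-build job such a cluster runs: the map phase emits (term, docId) pairs and the reduce phase assembles posting lists. It is written against Hadoop’s newer org.apache.hadoop.mapreduce API for readability – Powerset’s 2007-era code would have used the older org.apache.hadoop.mapred interfaces – and the tab-separated “docId, text” input lines are our own assumption for the example.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class InvertedIndex {
  // Map: each input line is "docId<TAB>text"; emit (term, docId) per token.
  public static class TokenMapper extends Mapper<Object, Text, Text, Text> {
    @Override
    protected void map(Object key, Text value, Context ctx)
        throws IOException, InterruptedException {
      String[] parts = value.toString().split("\t", 2);
      if (parts.length < 2) return;
      for (String term : parts[1].toLowerCase().split("\\W+")) {
        if (!term.isEmpty()) ctx.write(new Text(term), new Text(parts[0]));
      }
    }
  }

  // Reduce: gather all docIds for a term into one comma-joined posting list.
  public static class PostingsReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text term, Iterable<Text> docs, Context ctx)
        throws IOException, InterruptedException {
      StringBuilder postings = new StringBuilder();
      for (Text d : docs) {
        if (postings.length() > 0) postings.append(',');
        postings.append(d);
      }
      ctx.write(term, new Text(postings.toString()));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "inverted-index");
    job.setJarByClass(InvertedIndex.class);
    job.setMapperClass(TokenMapper.class);
    job.setReducerClass(PostingsReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input dir on HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output dir on HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

A semantic engine like Powerset would of course emit far richer secondary data than bare (term, docId) pairs, which is exactly why its data volumes dwarf a keyword engine’s.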

In fact Google also uses a number of well-known components to fulfill their enormous data processing needs: a distributed file system (GFS) ( http://labs.google.com/papers/gfs.html ), Map/Reduce ( http://labs.google.com/papers/mapreduce.html ), and BigTable ( http://labs.google.com/papers/bigtable.html ).

Hbase is actually the open-source equivalent of Google’s Bigtable which, as far as we understand the matter, is a great technological achievement by the people behind Powerset: both Jim Kellerman and Michael Stack are from Powerset and are the initial contributors of Hbase.

Hbase could be the panacea that lets Powerset scale its index up to Google’s level, yet copying Google’s approach is perhaps not the right direction for a small technology company like Powerset. We wonder whether Cuill, yet another start-up search engine claiming to have invented a technology for cheaper and faster indexation than Google’s, has built its architecture upon the Hbase/Hadoop environment. Cuill claims its indexing costs will be 1/10th of Google’s, based on new search architectures and relevance methods. If that is true, what would Powerset’s costs be, considering that Powerset probably has higher indexing costs even than Google, because it does deep contextual analysis on every sentence of every web page? Given that Google runs more than 450,000 servers in several major data centers, and that Powerset’s indexing and storage costs might be even higher, the approach Powerset is taking might be a costly business for its investors.

Unless, that is, Hbase and Hadoop are the secret answer Powerset relies on to significantly reduce those costs.

Hadoop is an interesting software platform that lets one easily write and run applications that process vast amounts of data.

Here’s what makes Hadoop especially useful:

  • Scalable: Hadoop can reliably store and process petabytes.
  • Economical: It distributes the data and processing across clusters of commonly available computers. These clusters can number into the thousands of nodes.
  • Efficient: By distributing the data, Hadoop can process it in parallel on the nodes where the data is located. This makes it extremely rapid.
  • Reliable: Hadoop automatically maintains multiple copies of data and automatically redeploys computing tasks based on failures.

Hadoop implements MapReduce using the Hadoop Distributed File System (HDFS). MapReduce divides applications into many small blocks of work, while HDFS creates multiple replicas of data blocks for reliability, placing them on compute nodes around the cluster; MapReduce can then process the data where it is located.
Hadoop has been demonstrated on clusters of 2,000 nodes, and the current design target is 10,000-node clusters.
Hadoop is a Lucene sub-project that contains the distributed computing platform that was formerly a part of Nutch.

Hbase’s background

Google’s Bigtable, a distributed storage system for structured data, is a very effective mechanism for storing very large amounts of data in a distributed environment. Just as Bigtable leverages the distributed data storage provided by the Google File System, Hbase will provide Bigtable-like capabilities on top of Hadoop. Data is organized into tables, rows and columns, but a query language like SQL is not supported. Instead, an Iterator-like interface is available for scanning through a row range (and, of course, there is the ability to retrieve a column value for a specific key). Any particular column may have multiple values for the same row key. A secondary key can be provided to select a particular value, or an Iterator can be set up to scan through the key-value pairs for that column given a specific row key.
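
For a feel of that data model, here is a short sketch against the classic HBase client API, roughly as it looked in the 0.20–0.94 releases (the earliest 2007-era API differed, and today’s differs again). The “docs” table and its “text” column family are invented for the example and assumed to already exist.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class HbaseSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "docs"); // assumed pre-created table

    // Row key -> column family:qualifier -> value; no SQL involved.
    Put put = new Put(Bytes.toBytes("doc-0001"));
    put.add(Bytes.toBytes("text"), Bytes.toBytes("body"),
            Bytes.toBytes("A typical street scene includes text..."));
    table.put(put);

    // Point lookup: retrieve one column value for a specific row key.
    Get get = new Get(Bytes.toBytes("doc-0001"));
    Result row = table.get(get);
    byte[] body = row.getValue(Bytes.toBytes("text"), Bytes.toBytes("body"));
    System.out.println(Bytes.toString(body));

    // Iterator-like scan over a row range, keeping up to 3 versions per cell
    // (a column may hold multiple timestamped values for the same row key).
    Scan scan = new Scan(Bytes.toBytes("doc-0000"), Bytes.toBytes("doc-9999"));
    scan.setMaxVersions(3);
    ResultScanner scanner = table.getScanner(scan);
    for (Result r : scanner) {
      System.out.println(Bytes.toString(r.getRow()));
    }
    scanner.close();
    table.close();
  }
}
```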

Reach

According to Quantcast, Powerset is basically not a popular site, reaching fewer than 20,000 unique visitors per month, around 10,000 of them Americans. Compete reports the same – slightly more than 20,000 uniques per month. Considering that the search engine is still in its alpha stage, these numbers are not that bad.

The People

Powerset has assembled a star team of talented engineers, researchers, product innovators and entrepreneurs to realize an ambitious vision for the future of search. The team comprises industry leaders from a diverse set of companies, including Altavista, Apple, Ask.com, BBN, Digital, IDEO, IBM, Microsoft, NASA, PARC, Promptu, SRI, Tellme, Whizbang! Labs, and Yahoo!.

Powerset’s founders are Barney Pell and Lorenzo Thione, and the company is headquartered in San Francisco. Barney Pell recently stepped down from the CEO spot and is now the company’s CTO.

Barney Pell, Ph.D. (CTO) For over 15 years Barney Pell (Ph.D. Computer science, Cambridge University 1993) has pursued groundbreaking technical and commercial innovation in A.I. and Natural Language understanding at research institutions including NASA, SRI, Stanford University and Cambridge University. In startup companies, Dr. Pell was Chief Strategist and VP of Business Development at StockMaster.com (acquired by Red Herring in March, 2000) and later had the same role at Whizbang! Labs. Just prior to Powerset, Pell was an Entrepreneur in Residence at Mayfield, one of the top VC firms in Silicon Valley.

Lorenzo Thione (Product Architect) Mr. Thione brings to Powerset years of research experience in computational linguistics and search from Research Scientist positions at the CommerceNet consortium and the Fuji-Xerox Palo Alto Laboratory. His main research focus has been discourse parsing and document analysis, automatic summarization, question answering and natural language search, and information retrieval. He has co-authored publications in the field of computational linguistics and is a named inventor on 13 worldwide patent applications spanning the fields of computational linguistics, mobile user interfaces, search and information retrieval, speech technology, security and distributed computing. A native of Milan, Italy, Mr. Thione holds a Masters in Software Engineering from the University of Texas at Austin.

Board of Directors

Aside from Barney Pell, who also serves on the company’s board of directors, the other board members are:

Charles Moldow (BOD) is a general partner at Foundation Capital. He joined Foundation on the heels of successfully building two companies from early start-up through greater than $100 million in sales. Most notably, Charles led Tellme Networks in raising one of the largest private financing rounds in the country post Internet bubble, adding $125 million in cash to the company balance sheet during tough market conditions in August, 2000. Prior to Tellme, Charles was a member of the founding team of Internet access provider @Home Network. In 1998, Charles assisted in the $7 billion acquisition of Excite Network. After the merger, Charles became General Manager of Matchlogic, the $80 million division focused on interactive advertising.

Peter Thiel (BOD) is a partner at Founders Fund VC Firm in San Francisco. In 1998, Peter co-founded PayPal and served as its Chairman and CEO until the company’s sale to eBay in October 2002 for $1.5 billion. Peter’s experience in finance includes managing a successful hedge fund, trading derivatives at CS Financial Products, and practicing securities law at Sullivan & Cromwell. Peter received his BA in Philosophy and his JD from Stanford.

Investors

In June 2007, Powerset raised $12.5M in a Series A round of funding from Foundation Capital and The Founders Fund. Early investors include Eric Tilenius and Peter Thiel, who is also an early investor in Facebook.com. Other early investors are as follows:

CommerceNet is an entrepreneurial research institute focused on making the world a better place by fulfilling the promise of the Internet. CommerceNet invests in exceptional people with bold ideas, freeing them to pursue visions outside the comfort zone of research labs and venture funds and share in their success.

Dr. Tenenbaum is a world-renowned Internet commerce pioneer and visionary. He was founder and CEO of Enterprise Integration Technologies, the first company to conduct a commercial Internet transaction (1992), secure Web transaction (1993) and Internet auction (1993). In 1994, he founded CommerceNet to accelerate business use of the Internet. In 1997, he co-founded Veo Systems, the company that pioneered the use of XML for automating business-to-business transactions. Dr. Tenenbaum joined Commerce One in January 1999, when it acquired Veo Systems. As Chief Scientist, he was instrumental in shaping the company’s business and technology strategies for the Global Trading Web. Earlier in his career, Dr. Tenenbaum was a prominent AI researcher, and led AI research groups at SRI International and Schlumberger Ltd. Dr. Tenenbaum is a Fellow and former board member of the American Association for Artificial Intelligence, and a former Consulting Professor of Computer Science at Stanford. He currently serves as an officer and director of Webify Solutions and Medstory Inc., and is a Consulting Professor of Information Technology at Carnegie Mellon’s new West Coast campus. Dr. Tenenbaum holds B.S. and M.S. degrees in Electrical Engineering from MIT, and a Ph.D. from Stanford. 

Allan Schiffman was CTO and founder of Terisa Systems, a pioneer in bringing communications security technology to the Web software industry. Earlier, Mr. Schiffman was Chief Technology Officer at Enterprise Integration Technologies, a pioneer in the development of key security protocols for electronic commerce over the Internet. In these roles, Mr. Schiffman raised industry awareness of the role of security and public-key cryptography in ecommerce by giving more than thirty public lectures and tutorials. Mr. Schiffman was also a member of the team that designed the Secure Electronic Transactions (SET) payment card protocol commissioned by MasterCard and Visa. Mr. Schiffman co-designed the first security protocol for the Web, the Secure HyperText Transfer Protocol (S-HTTP). Mr. Schiffman led the development of the first secure Web browser, Secure Mosaic, which was fielded to CommerceNet members for ecommerce trials in 1994. Earlier in his career, Mr. Schiffman led the development of a family of high-performance Smalltalk implementations that gained both academic recognition and commercial success. These systems included several innovations widely adopted by other object-oriented language implementers, such as the “just-in-time compilation” technique universally used by current Java virtual machines. Mr. Schiffman holds an M.S. in Computer Science from Stanford University.

Rob Rodin is the Chairman and CEO of RDN Group, strategic advisors focused on corporate transitions, customer interface, sales and marketing, distribution and supply chain management. Additionally, he serves as Vice Chairman, Executive Director and Chairman of the Investment Committee of CommerceNet, which researches and funds open-platform, interoperable business services to advance commerce. Prior to these positions, Mr. Rodin served as CEO and President of Marshall Industries, where he engineered the reinvention of the company, turning a conventionally successful $500 million distributor into a web-enabled $2 billion global competitor. “Free, Perfect and Now: Connecting to the Three Insatiable Customer Demands”, Mr. Rodin’s bestselling book, chronicles the radical transformation of Marshall Industries.

The Founders Fund – The Founders Fund, L.P. is a San Francisco-based venture capital fund that focuses primarily on early-stage, high-growth investment opportunities in the technology sector. The Fund’s management team is composed of investors and entrepreneurs with relevant expertise in venture capital, finance, and Internet technology. Members of the management team previously led PayPal, Inc. through several rounds of private financing, a private merger, an initial public offering, a secondary offering, and its eventual sale to eBay, Inc. The Founders Fund possesses the four key attributes that well-position it for success: access to elite research universities, contact to entrepreneurs, operational and financial expertise, and the ability to pick winners. Currently, the Founders Fund is invested in over 20 companies, including Facebook, Ironport, Koders, Engage, and the newly-acquired CipherTrust. 

Amidzad – Amidzad is a seed and early-stage venture capital firm focused on investing in emerging growth companies on the West Coast, with over 50 years of combined entrepreneurial experience in building profitable, global enterprises from the ground up and over 25 years of combined investing experience in successful information technology and life science companies. Over the years, Amidzad has assembled a world-class network of serial entrepreneurs, strategic investors, and industry leaders who actively assist portfolio companies as Entrepreneur Partners and Advisors. Amidzad has invested in companies like Danger, BIX, Songbird, Melodis, Freewebs, Agitar, Affinity Circles, Litescape and Picaboo.

Eric Tilenius brings a two-decade track record that combines venture capital, startup, and industry-leading technology company experience. Eric has made over a dozen investments in early-stage technology, internet, and consumer start-ups around the globe through his investment firm, Tilenius Ventures. Prior to forming Tilenius Ventures, Eric was CEO of Answers Corporation (NASDAQ: ANSW), which runs Answers.com, one of the leading information sites on the internet. He previously was an entrepreneur-in-residence at venture firm Mayfield. Prior to Mayfield, Eric was co-founder, CEO, and Chairman of Netcentives Inc., a leading loyalty, direct, and promotional internet marketing firm. Eric holds an MBA from the Stanford University Graduate School of Business, where he graduated as an Arjay Miller scholar, and an undergraduate degree in economics, summa cum laude, from Princeton University.

Esther Dyson does business as EDventure, the reclaimed name of the company she owned for 20-odd years before selling it to CNET Networks in 2004. Her primary activity is investing in start-ups and guiding many of them as a board member. Her board seats include Boxbe, CVO Group (Hungary), Eventful.com, Evernote, IBS Group (Russia, advisory board), Meetup, Midentity (UK), NewspaperDirect, Voxiva, Yandex (Russia)… and WPP Group (not a start-up). Some of her other past IT investments include Flickr and Del.icio.us (sold to Yahoo!), BrightMail (sold to Symantec), Medstory (sold to Microsoft), Orbitz (sold to Cendant and later re-IPOed). Her current holdings include ActiveWeave, BlogAds, ChoiceStream, Democracy Machine, Dotomi, Linkstorm, Ovusoft, Plazes, Powerset, Resilient, Tacit, Technorati, Visible Path, Vizu.com and Zedo. On the non-profit side, Dyson sits on the boards of the Eurasia Foundation, the National Endowment for Democracy, the Santa Fe Institute and the Sunlight Foundation. She also blogs occasionally for the Huffington Post, as Release 0.9.

Adrian Weller – Adrian graduated in 1991 with first class honours in mathematics from Trinity College, Cambridge, where he met Barney. He moved to NY, ran Goldman Sachs’ US Treasury options trading desk and then joined the fixed income arbitrage trading group at Salomon Brothers. He went on to run US and European interest rate trading at Citadel Investment Group in Chicago and London. Recently, Adrian has been traveling, studying and managing private investments. He resides in Dublin with his wife, Laura and baby daughter, Rachel.

Azeem Azhar – Azeem is currently a technology executive focused on corporate innovation at a large multinational. He began his career as a technology writer, first at The Guardian and then The Economist. While at The Economist, he launched Economist.com. Since then, he has been involved with several internet and technology businesses, including launching BBC Online and founding esouk.com, an incubator. He was Chief Marketing Officer for Albert-Inc, a Swiss AI/natural-language-processing search company, and UK MD of 20six, a blogging service. He has advised several internet start-ups, including Mondus, Uvine and Planet Out Partners, where he sat on the board. He has a degree in Philosophy, Politics and Economics from Oxford University. He currently sits on the board of Inuk Networks, which operates an IPTV broadcast platform. Azeem lives in London with his wife and son.

Todd Parker – Since 2002, Mr. Parker has been a Managing Director at Hidden River, LLC, a firm specializing in Mergers and Acquisitions consulting services for the wireless and communications industry. Previously, from 2000 to 2002, Mr. Parker was the founder and CEO of HR One, a human resources solutions provider and software company. Mr. Parker has also held senior executive and general manager positions with AirTouch Corporation, where he managed over 15 corporate transactions and joint venture formations with a total value of over $6 billion. Prior to AirTouch, Mr. Parker worked for Arthur D. Little as a consultant. Mr. Parker earned a BS from Babson College in Entrepreneurial Studies and Communications.

Powerset.com is the 2nd Semantic App to be featured by Web2Innovations in its planned series of publications, in which we will try to discover, highlight and feature the next generation of web-based semantic applications, engines, platforms, mash-ups, machines, products, services, mixtures, parsers, approaches and beyond.

The purpose of these publications is to discover and showcase today’s Semantic Web Apps and projects. We’re not going to rank them, because there is no way to rank these apps at this time – many are still in alpha and private beta.

Via

[ http://www.powerset.com ]
[ http://www.powerset.com/about ]
[ http://en.wikipedia.org/wiki/Power_set ]
[ http://en.wikipedia.org/wiki/Powerset ]
[ http://blog.powerset.com/ ]
[ http://lucene.apache.org/hadoop/index.html ]
[ http://wiki.apache.org/lucene-hadoop/Hbase ]
[ http://blog.powerset.com/2007/10/16/powerset-empowered-by-hadoop ]
[ http://www.techcrunch.com/2007/09/04/cuill-super-stealth-search-engine-google-has-definitely-noticed/ ]
[ http://www.barneypell.com/ ]
[ http://valleywag.com/tech/rumormonger/hanky+panky-ousts-pell-as-powerset-ceo-318396.php ]
[ http://www.crunchbase.com/company/powerset ]

Is Google trying to become a Social Search Engine

Based on what we are seeing, the answer is close to yes. Google is now experimenting with new social features aimed at improving users’ search experience.

This experiment lets you influence your search experience by adding, moving, and removing search results. When you search for the same keywords again, you’ll continue to see those changes. If you later want to revert your changes, you can undo any modifications you’ve made. Note that Google says this is an experimental feature that may be available for only a few weeks.
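
Google has not said how the experiment is implemented, so the following is only a minimal Java sketch of the data structure such a feature implies: a per-user, per-query overlay of promoted and removed results applied on top of the organic list, with an undo that simply discards the stored edits. All names here are hypothetical.

```java
import java.util.*;

public class PersonalResultOverrides {
  static final class Overrides {
    final Set<String> removed = new LinkedHashSet<>(); // results the user removed
    final List<String> promoted = new ArrayList<>();   // results the user added/moved up
  }

  // userId -> (normalized query -> that user's edits for the query)
  private final Map<String, Map<String, Overrides>> store = new HashMap<>();

  private Overrides of(String user, String query) {
    return store.computeIfAbsent(user, u -> new HashMap<>())
                .computeIfAbsent(query.toLowerCase().trim(), q -> new Overrides());
  }

  public void promote(String user, String query, String url) { of(user, query).promoted.add(url); }
  public void remove(String user, String query, String url)  { of(user, query).removed.add(url); }

  // "Undo": revert all modifications for this user and query.
  public void undo(String user, String query) {
    Map<String, Overrides> m = store.get(user);
    if (m != null) m.remove(query.toLowerCase().trim());
  }

  // Apply stored edits on top of the organic results for the same keywords.
  public List<String> apply(String user, String query, List<String> organic) {
    Overrides o = of(user, query);
    List<String> out = new ArrayList<>(o.promoted);
    for (String url : organic) {
      if (!o.removed.contains(url) && !out.contains(url)) out.add(url);
    }
    return out;
  }
}
```

Note that in this shape the edits affect only the user who made them, which matches our observation below that the experiment is hard to call “social” yet.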

There seem to be features like “Like it”, “Don’t like it?” and “Know of a better web page”. Of course, to take full advantage of these extras, and to have your recommendations associated with your searches when you return, you have to be signed in.

There is nothing new here – many of the smaller social search engines already deploy some of the features Google is only now testing – but having more than 500 million unique visitors per month, the vast majority of whom heavily use Google’s search engine, is a huge advantage if one wants to implement social elements into finding information on the web easily. Even Marissa Mayer, Google’s leading executive in search, said in August that Google would be well positioned to compete in social search. Actually, in this particular experiment it appears your vote only applies to the Google search results you will see, so it is hard to call it “social” at this point, though it may prove valuable as a stand-alone service. Also, Daniel Russell of Google made it pretty clear some time ago that Google uses user behavior to affect search results – effectively implicit voting, rather than explicit voting.

We think, however, that the only reason Google is toying with these social features, relying on humans to determine relevancy, is its inability to effectively fight the spam its SERPs are flooded with.

Manipulating algorithm-based results, in one way or another, is in our understanding not much harder than what you would eventually be able to do to manipulate or influence results that rely and depend on social recommendations. Look at Digg, for example.

We think employing humans to determine which results are best is basically an effective pathway to corruption, which is in a sense worse than having an algorithm to blame for spam and low-quality results. Again, look at Digg, Dmoz.org and especially Wikipedia. Wikipedia, once a good idea, became a battlefield for corporate, brand, political and social wars. That being said, we think Google’s problem with spam results comes down to the way it reaches the information – more concretely, the methods it uses to crawl and index the vast Web. Conversely, having people, instead of robots, gather the quality, important information (from everyone’s point of view) from around the web is, in our understanding, a much better and more effective approach than loading all the spam results onto the servers and then letting people sort them out.

This is not the first time Google has tried new features with its search results. We remember searchmash.com. Searchmash.com is yet another of Google’s toys in the search arena, quietly started a year ago because Google did not want the public to know about the project and influence its beta testers (read: the common users) with the Google brand name. The project, however, quickly became popular once many people discovered who the actual owner of the beta project was.

Google is no doubt getting all the press attention it needs, no matter what it does – and sometimes even more than it actually needs. On the other hand, things seem to be slowly changing, and influential media like the New York Times, Newsweek, CNN and many others are on a quest for the next search engine, the next Google. This was simply impossible during 2001, 2002 and up to 2004, a period characterized by solid media comfort for Google’s search engine business.

So, is Google the first to experiment with social search approaches, features, methods and extras? No, definitely not, as you will see for yourself from the companies and projects listed below.

As for crediting a Digg-like system with the idea of sorting content based on community voting, Digg definitely wasn’t the first. The earliest implementation we are aware of is Kuro5hin.org (http://en.wikipedia.org/wiki/Kuro5hin), which, we think, was founded back in 1999.

Eurekster

One of the first and oldest companies described as a social search engine on the Web is Eurekster. Eurekster launched its community-powered social search platform, the “swicki”, as far as we know in 2004, and added explicit voting functionality in 2006. To date, over 100,000 swickis have been built, each serving a community of users passionate about a specific topic. Eurekster processes over 25,000,000 searches a month. The key to Eurekster’s success in improving relevancy has been leveraging explicit (and implicit) user behavior at the group or community level, not the individual or general level. On the other hand, Eurekster never made it to mainstream users, and somehow the company slowly faded away and lost momentum.

Wikia Social Search

Wikia was founded by Jimmy Wales (Wikipedia’s founder) and Angela Beesley in 2004. The company is incorporated in Delaware. Gil Penchina became Wikia’s CEO in June 2006, at the same time the company moved its headquarters from St. Petersburg, Florida, to Menlo Park and later to San Mateo, California. Wikia has offices in San Mateo and New York in the US, and in Poznań in Poland. Remote staff is also located in Chile, England, Germany, Japan, Taiwan, and other locations in Poland and the US. Wikia has received two rounds of investment: in March 2006 from Bessemer Venture Partners and in December 2006 from Amazon.com.

According to Wikia Search, the future of Internet search must be based on:

  1. Transparency – Openness in how the systems and algorithms operate, both in the form of open source licenses and open content + APIs.
  2. Community – Everyone is able to contribute in some way (as individuals or entire organizations), strong social and community focus.
  3. Quality – Significantly improve the relevancy and accuracy of search results and the searching experience.
  4. Privacy – Must be protected, do not store or transmit any identifying data.

Other active areas of focus include:

  1. Social Lab – sources for URL social reputation, experiments in wiki-style social ranking.
  2. Distributed Lab – projects focused on distributed computing, crawling, and indexing. Grub!
  3. Semantic Lab – Natural Language Processing, Text Categorization.
  4. Standards Lab – formats and protocols to build interoperable search technologies.

Given who Jimmy Wales is, the success he achieved with Wikipedia and, therefore, the resources he is likely to have access to, Wikia Search stands a good chance of surviving serious competition from Google.

NosyJoe.com

NosyJoe is yet another good example of a social search engine, one that employs intelligent tagging technologies and runs on a semantic platform.

NosyJoe is a social search engine that relies on you to sniff out and submit the web’s interesting content, and it offers meaningful search results in the form of readable, complete sentences and smart tags. NosyJoe is built upon the fundamental belief that people are better than robots at finding the interesting, important and quality content around the Web. Rather than crawling the entire Web to build a massive index of information (an enormous technological task that requires huge resources, takes a long time and would load lots of unnecessary information people don’t want), NosyJoe focuses just on those parts of the Web people think are important and find interesting enough to submit and share with others.

Under the hood, NosyJoe is a hybrid: a social search engine fed by user submissions, an intelligent content tagging engine on the back end and a basic semantic platform on its web-visible part. NosyJoe applies semantics-based textual analysis to intelligently extract meaningful structures like sentences, phrases, words and names from the content, making it that much more meaningfully searchable. This is what allows the search results to be presented in meaningful formats like readable, complete sentences and smart phrasal, word and name tags.

The information is then clustered and published across NosyJoe’s platform into contextual channels and time and source categories, and semantic phrasal, name and word tags are applied to meaningfully connect items together, which makes even the smallest content component web-visible, indexable and findable. Finally, a set of algorithms and user patterns is applied to further rank, organize and share the information. A rough sketch of such a sentence-and-tag pipeline is shown below.
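
NosyJoe has not published its technology, so what follows is only our own illustrative sketch of the pipeline described above (split submitted content into sentences, extract candidate name and phrase tags, and index full sentences under those tags); the function names and the naive regular expressions are our assumptions.

    # Illustrative sketch only; it mimics the pipeline described above,
    # not NosyJoe's actual (unpublished) implementation.
    import re
    from collections import defaultdict

    def split_sentences(text):
        # Naive splitter: break on terminal punctuation plus whitespace.
        return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    def extract_tags(sentence):
        # Treat runs of capitalized words as candidate name/phrase tags.
        return re.findall(r"(?:[A-Z][a-z]+)(?:\s+[A-Z][a-z]+)*", sentence)

    index = defaultdict(list)  # tag -> readable, complete sentences

    def submit(content):
        for sentence in split_sentences(content):
            for tag in extract_tags(sentence):
                index[tag].append(sentence)

    def search(tag):
        # Results are whole sentences rather than keyword excerpts.
        return index.get(tag, [])

    submit("Freebase is an open database. Metaweb Technologies built it.")
    print(search("Freebase"))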

In our quick tests on the site, the search results were presented in the form of meaningful sentences and semantic phrasal tags (as an option), which turns the results into something we have not seen on the web so far: neat content components, readable and easily understandable sentences, unlike the usual excerpts from the content where the keyword was found. Compared to other search engines’ results, NosyJoe.com’s SERPs appear truly meaningful.

As of today, just 6 or 7 months after going online, NosyJoe already has more than 500,000 semantic tags connecting tens of thousands of meaningful sentences across its platform.

We have no information as to who stands behind NosyJoe, but the project seems serious and promising in many respects, from how it gathers information to how it presents results to the way it offsets low-quality results. Of all the newcomer social search engines, NosyJoe stands the best chance of making it. As far as we know, NosyJoe is also based in Silicon Valley.

Sproose

Sproose says it is developing search technology that lets users obtain personalized results which can be shared within a social network, using the Nutch open-source search engine and building applications on top. Its search appears to use third-party search feeds and to rank the results based on users’ votes; a minimal sketch of what such vote-based re-ranking might look like follows.
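
Sproose’s actual scoring is not public, so this is only a hypothetical sketch of blending an upstream feed’s ordering with user votes; the names feed_results, vote_counts and the weight ALPHA are our own inventions.

    # Hypothetical sketch: blend a third-party feed's ordering with
    # user votes. Not Sproose's actual (unpublished) scoring.
    ALPHA = 0.25  # weight of one vote relative to rank-position credit

    def blend(feed_results, vote_counts):
        """feed_results: URLs in upstream order; vote_counts: {url: votes}."""
        def score(item):
            position, url = item
            # Reciprocal-rank credit from the feed plus a vote bonus.
            return 1.0 / (position + 1) + ALPHA * vote_counts.get(url, 0)
        ranked = sorted(enumerate(feed_results), key=score, reverse=True)
        return [url for _, url in ranked]

    print(blend(["http://example.org/a", "http://example.org/b"],
                {"http://example.org/b": 3}))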

Sproose is said to have raised nearly $1 million in seed funding. It is based in Danville, a town on the east side of the San Francisco Bay Area. Sproose said that Roger Smith, founder, former president and chief executive of Silicon Valley Bank, was one of the angel investors and is joining Sproose’s board.

A variety of other start-up search engines are listed below:

  • Hakia – Relies on natural language processing. These guys are also experimenting with social elements through a feature called “meet others who asked the same query”.
  • Quintura – A visual engine, based today in Virginia, US. The company was founded by Russians and was earlier headquartered in Moscow.
  • Mahalo – A search engine that looks more like a directory, with quality content handpicked by editors. Jason Calacanis is the founder of the company.
  • ChaCha – Real humans try to help you in your quest for information, via chat. The company is based in Indiana and has been criticized a lot by Silicon Valley’s IT community. Despite the criticism, it has recently raised $10M in a Series A round of funding.
  • Powerset – Still in closed beta and also relying on understanding natural language. See our Powerset review.
  • Clusty – Founded in 2000 by three Carnegie Mellon University scientists.
  • Lexxe – A Sydney-based engine featuring natural language processing technologies.
  • Accoona – The company has recently filed for an IPO in the US, planning to raise $80M from the public.
  • Squidoo – Started in October 2005 by Seth Godin; looks more like a wiki site, à la Wikia or Wikipedia, where anyone can create articles on different topics.
  • Spock – Focuses on people information; a people search engine.

One thing is for sure today: Google is now bringing solid credentials to, and somehow legitimizing, the social search approach, which, by the way, is helping the many smaller so-called social search engines.

Perhaps it is about time for consolidation in the social search sector. Some of the smaller but more promising social search engines could now merge in order to compete with Google and prevent its dominance within social search too, just as it came to dominate algorithmic search. Is Google itself interested? Has anyone heard of recent interest in, or already closed, acquisition deals for start-up social search engines?

On the contrary, more and more IT experts, evangelists and web professionals agree that taking Google down is a challenge that will most likely be met by a concept that is anything but a search engine in the traditional sense. Such concepts include, but are not limited to, Wikipedia, Del.icio.us and LinkedWords. In other words, finding information on the web doesn’t necessarily mean searching for it.

Via:
[ http://www.google.com/experimental/a840e102.html ]
[ http://www.blueverse.com/2007/12/01/google-the-social-…]
[ http://www.adesblog.com/2007/11/30/google-experimenting-social… ]
[ http://www.techcrunch.com/2007/11/28/straight-out-of-left-field-google-experimenting-with-digg-style-voting-on-search-results ]
[ http://www.blogforward.com/money/2007/11/29/google… ]
[ http://nextnetnews.blogspot.com/2007/09/is-nosyjoecom-… ]
[ http://www.newsweek.com/id/62254/page/1 ]
[ http://altsearchengines.com/2007/10/05/the-top-10-stealth-… ]
[ http://www.nytimes.com/2007/06/24/business/yourmoney/…  ]
[ http://dondodge.typepad.com/the_next_big_thing/2007/05… ]
[ http://search.wikia.com/wiki/Search_Wikia ]
[ http://nosyjoe.com/about.com ]
[ http://www.siliconbeat.com/entries/2005/11/08/sproose_up_your… ]
[ http://nextnetnews.blogspot.com/2007/10/quest-for-3rd-generation… ]
[ http://www.sproose.com ]

Freebase: open, shared database of the world’s knowledge

Freebase, created by Metaweb Technologies, is an open database of the world’s information. It’s built by the community and for the community – free for anyone to query, contribute to, build applications on top of, or integrate into their websites.

Already, Freebase covers millions of topics in hundreds of categories. Drawing from large open data sets like Wikipedia, MusicBrainz, and the SEC archives, it contains structured information on many popular topics, including movies, music, people and locations – all reconciled and freely available via an open API. This information is supplemented by the efforts of a passionate global community of users who are working together to add structured information on everything from philosophy to European railway stations to the chemical properties of common food ingredients.

By structuring the world’s data in this manner, the Freebase community is creating a global resource that will one day allow people and machines everywhere to access information far more easily and quickly than they can today.

Freebase aims to “open up the silos of data and the connections between them”, according to founder Danny Hillis at the Web 2.0 Summit. Freebase is a database that has all kinds of data in it, plus an API. Because it’s an open database, anyone can enter new data into Freebase. An example page in the Freebase db looks pretty similar to a Wikipedia page. When you enter new data, the app can make suggestions about content. The topics in Freebase are organized by type, and you can connect pages with links and semantic tagging. So in summary, Freebase is all about shared data and what you can do with it; a small sketch of querying it through the API follows.
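
As a taste of that API, here is a minimal sketch of a query against the mqlread service as publicly documented at the time; MQL is query-by-example, so empty values ask Freebase to fill them in (the artist chosen here is just an illustration).

    # Minimal sketch of an MQL read query, following the JSON query
    # envelope publicly documented for Freebase's mqlread service.
    import json
    import urllib.parse
    import urllib.request

    # Query-by-example: the empty list asks Freebase to fill in
    # all albums recorded by the named artist.
    query = {"type": "/music/artist", "name": "The Police", "album": []}
    envelope = json.dumps({"query": query})

    url = ("http://api.freebase.com/api/service/mqlread?query="
           + urllib.parse.quote(envelope))
    with urllib.request.urlopen(url) as response:
        result = json.load(response)

    # A successful response carries the filled-in query under "result".
    print(result["result"]["album"])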

The Company Behind It

Metaweb Technologies, Inc. is a company based in San Francisco that is developing Metaweb, a semantic data storage infrastructure for the web, and its first application built on that platform named Freebase, described as an “open, shared database of the world’s knowledge”. The company was founded by Danny Hillis and others as a spinoff of Applied Minds in July, 2005, and operated in stealth mode until 2007.

Reach

According to Quantcast, which we believe is very accurate, Freebase is basically not a popular site, despite the press attention it gets, and reaches fewer than 5,000 unique visitors per month. Compete reports slightly more than 8,000 uniques per month.

The People

William Daniel “Danny” Hillis (born September 25, 1956, in Baltimore, Maryland) is an American inventor, entrepreneur and author. He co-founded Thinking Machines Corporation, a company that developed the Connection Machine, a parallel supercomputer designed by Hillis at MIT. He is also a co-founder of the Long Now Foundation, Applied Minds and Metaweb Technologies, and the author of The Pattern on the Stone: The Simple Ideas That Make Computers Work.

Investors

In March 2006, Metaweb received $15 million in funding from investors including Benchmark Capital, Millennium Technology Ventures and Omidyar Network.

Freebase is in alpha.

Freebase.com is the first Semantic App featured by Web2Innovations in its series of planned publications in which we will try to discover, highlight and feature the next generation of web-based semantic applications, engines, platforms, mash-ups, machines, products, services, parsers and approaches, and far beyond.

The purpose of these publications is to discover and showcase today’s Semantic Web apps and projects. We are not going to rank them, because there is no way to rank these apps at this time; many are still in alpha or private beta.

[ http://freebase.com ]
[ http://roblog.freebase.com ]
[ http://www.crunchbase.com/company/freebase ]
[ http://www.readwriteweb.com/archives/10_semantic_apps_to_watch.php ]
[ http://en.wikipedia.org/wiki/Danny_Hillis ]
[ http://www.metaweb.com ]
[ http://en.wikipedia.org/wiki/Metaweb_Technologies ]