Archive for the ‘Big Data’ Category

Blunt Tools

While data is a game-changer, it’s also important to remember that half-assed data tools are not better than no data tools.  A good example is the horrible software used to sort through resumes over the last decade:

Algorithms and big data are powerful tools. Wisely used, they can help match the right people with the right jobs. But they must be designed and used by humans, so they can go horribly wrong. Peter Cappelli of the University of Pennsylvania’s Wharton School of Business recalls a case where the software rejected every one of many good applicants for a job because the firm in question had specified that they must have held a particular job title—one that existed at no other company.
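To make the failure mode concrete, here is a minimal sketch of how an exact-title rule can reject every applicant while a looser skills-based rule does not. Everything in it is hypothetical: the applicant records, the job title, the skill names, and the two filter functions are invented for illustration and are not taken from the case Cappelli describes or from any real screening product.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Applicant:
    name: str
    past_titles: list[str]
    skills: set[str] = field(default_factory=set)


def exact_title_filter(applicants, required_title):
    """Blunt rule: keep only applicants who have held exactly this title."""
    return [a for a in applicants if required_title in a.past_titles]


def skills_filter(applicants, required_skills, min_overlap=2):
    """Looser rule: keep applicants who share enough of the required skills."""
    return [a for a in applicants if len(a.skills & required_skills) >= min_overlap]


# Hypothetical applicant pool.
applicants = [
    Applicant("Ana", ["Data Analyst"], {"sql", "statistics", "reporting"}),
    Applicant("Ben", ["Operations Manager"], {"logistics", "sql", "budgeting"}),
    Applicant("Chi", ["Research Associate"], {"statistics", "python", "reporting"}),
]

# A made-up title that exists only inside the hiring firm.
title_matches = exact_title_filter(applicants, "Insight Delivery Ninja II")
skill_matches = skills_filter(applicants, {"sql", "statistics", "reporting"})

print(len(title_matches))               # 0: nobody outside the firm has held that title
print([a.name for a in skill_matches])  # ['Ana', 'Chi']
```

The tool is not broken in the second case and working in the first; both rules do exactly what they were told. The difference is whether the humans configuring it asked for something the outside world could actually supply.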

Read Full Post »

On Thursday, GSK announced an effort to open up its clinical trial data sets to outside researchers, perhaps leading the pharma industry to join the Kaggle trend of freeing data so that the larger community of smart people can mine innovative truths from it.

GSK announced:

GSK is fully committed to sharing information about its clinical trials. It posts summary information about each trial it begins and shares the summary results of all of its clinical trials – whether positive or negative – on a website accessible to all. Today this website includes almost 4,500 clinical trial result summaries and receives an average of almost 10,000 visitors each month. The company has also committed to seek publication of the results of all of its clinical trials that evaluate its medicines – regardless of what the results say – to peer-reviewed scientific journals.

Expanding further on its commitments to openness and transparency, GSK also announced today that the company will create a system that will enable researchers to access the detailed anonymised patient-level data that sit behind the results of clinical trials of its approved medicines and discontinued investigational medicines. To ensure that this information will be used for valid scientific endeavour, researchers will submit requests which will be reviewed for scientific merit by an independent panel of experts and, where approved, access will be granted via a secure web site. This will enable researchers to examine the data more closely or to combine data from different studies in order to conduct further research, to learn more about how medicines work in different patient populations and to help optimise the use of medicines with the aim of improving patient care.

This initiative is a step towards the ultimate aim of the clinical research community developing a broader system where researchers will be able to access data from clinical trials conducted by different sponsors. GSK hopes the experience gained through this initiative will be of value in developing and catalysing this wider approach.

Read Full Post »

Enabling More Shots on Goal

One of the metaphors I have used in connection with finding talent is that of Antonio Gates, the San Diego Chargers monster TE who never played college football.  Under too narrow a view of talent, Gates might have spent his life doing something else, and we football fans would never have gotten to appreciate his pass-catching talent.  As I commented then, we need to break down the limiting beliefs about the narrowness of talent that are embedded in talent recognition processes:

Despite this, the way the recruiting process often works is very conventional and it takes a narrow view of things and it can whittle unconventional candidates out of the talent pool.  My thesis is that this is causing a huge economic and human deadweight loss to our economy and society as talented folks don’t get to move around or it takes them too long to make a cross-functional move.

With Antonio Gates at the back of my mind, it was interesting this weekend to tie this to global football, or soccer as we Americans know it.  Manchester City, the reigning Premier League champions, are working with a private provider of soccer match data to release that data to fans (tying into another favorite topic of this blog: freeing data sets).  The expectation is that, in Kaggle-like fashion, a crowd of smart people and fans will attack that data, mining new truths out of it.  As a club official says, capturing the power of opening up the process to the crowd:

I want our industry to find a Bill James. Bill James needs data, and whoever the Bill James of football is, he doesn’t have the data because it costs money

Read Full Post »

The textbook instance of what I have dubbed the routine rite is Facebook’s “Like” button.  Justin Rosenstein describes the deceptively simple, and consequently powerful, nature of this gesture:

Following the lead of early Internet sharing services such as Delicious.com, Zuckerberg then created a button that would allow users to signal an endorsement on Facebook, and eventually on other websites, of a video, picture, article, or even a brand. Other engineers wanted to call it the “awesome” button. Zuckerberg decided to name it the “like” button. “It sounded bland and generic,” said Justin Rosenstein, an early Facebook engineer who went on to found Asana, an online collaboration tool, with Facebook co-founder Dustin Moskovitz. “I feel foolish in hindsight to have missed the genius: Facebook has managed to take concepts as basic as ‘friend,’ ‘event,’ and ‘like’ and co-opt them.”

Read Full Post »

Context and Presentation

Sometimes, especially where data is concerned, we immediately think complication: algorithms, data science, and equations.  There is a place for that, but we can outsmart ourselves.  Often, simplicity wins the marketplace.  A good reminder about data plays appeared on bryce.vc this week:

Here’s the thing. Data, big, medium or small, has no value in and of itself. The value of data is unlocked through context and presentation.

Context and presentation, and not only the data itself, make the difference.  Without them, the user is unlikely to see the data as useful, even if it could be.  And if the data is not useful and does not cause a change in behavior, the opportunity to create value is missed.

Read Full Post »

In his post today explaining his investment in DataSift, Mark Suster explores in insightful detail the value of Twitter for content creation and of tools like DataSift for helping users extract from that content what is useful to them. His analogy in the excerpt below is to transforming an overwhelming fire-hose blast into a manageable tap stream.

Our goal is to make the enormous volume of real-time information more manageable for the 99% of companies that lack the infrastructure to process these volumes in real time. Think of DataSift as turning the fire-hose into a cost-effective and manageable tap of running water.

To draft in the airstream of his post, I wanted to refer back to two recent posts on this blog addressing both subjects: here is a post on the value of Twitter in enabling users to create content and here is a post on the opportunity that lies in providing the filters required to make manageable the flood of content from Twitter and other content-creation sources.

Read Full Post »

In Mumbai, there are massive open-air laundromats called dhobi ghats. Somehow, clothing is picked up from a client’s home and placed in close proximity to, if not outright intermingled with, the clothes of thousands of other households, and yet it makes it back to the home of the correct owner. This ability to match the right clothing, picked out from a mountain of the wrong clothing, to the right household is critical to the value created by the dhobi ghats. Returning the wrong stuff would have no value and, indeed, would drain value through the cascading frustration it would cause clients.

So what does this have to do with anything?

In my most recent post, which discussed “enablement” as an Internet business model, I linked back to a post from roughly two months ago about Internet-enabled expression as a foundation of successful business models on the Internet and as an unprecedented historic enabler of human expression. I noted that one could trace the history of the Internet through sites that made self-expression easier and easier:

I can chart a chain of  tools from when I first started using the Internet: interest-based UseNet groups, listservs, GeoCities and other “create your own website” tools, blogging tools, YouTube, Facebook and social networking sites, Twitter, Tumblr and microblogging sites, photosharing tools etc.

Coincidentally, also yesterday, Fred Wilson posted on a similar topic. Using posts/day for WordPress, Tumblr, and Twitter, he noted that:

The frequency of posts in a service is inversely proportional to the size of the post. Said another way, the longer the post, the less frequently they will happen.

….

If you want to understand the power of Tumblr and Twitter, you need to look at how quick and how easy it is to post. There are of course many other factors at work, but brevity and ease is a big part of why these services work so well.

The point is simple: the easier you make it for people to express themselves by giving them a variety of simple tools to do so, the easier it is for the user to overcome inertia and “say” something, and the more total content is created.

This raises the obvious question: we already have too much content, so isn’t making it easier for users to create further content sort of pointless? Fred’s most illuminating point is in the comments to his post: “that’s why we need filters and curation. we want more posts and more filters”

Providing filters that sort through the ever-increasing content and present it on a silver platter to those for whom it is most relevant is now a critically important business model, and the other side of the coin of the expression thesis. Some system, like the Mumbai dhobi ghats, has to get the content to the right place in order for it to have optimal value. Ultimately, if the content is not read, it will not be created, nor will society benefit from users seeing the content most useful to them. For these reasons, creating “dhobi ghats” is just as critical as “building better soapboxes” if we are to keep advancing the enabling of human expression as we have been doing these last fifteen or so years.
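As a rough illustration of the “dhobi ghat” idea, the sketch below routes a pile of posts to the readers most likely to care about them. It is only one of many possible approaches and every detail is an assumption: scoring by Jaccard overlap between tags, the 0.3 threshold, and the sample posts, readers, and function names are all hypothetical, not a description of how any real service filters content.

```python
def relevance(post_tags, reader_interests):
    """Jaccard overlap between a post's tags and a reader's interests."""
    union = post_tags | reader_interests
    return len(post_tags & reader_interests) / len(union) if union else 0.0


def route(posts, readers, threshold=0.3):
    """Deliver each post only to readers whose relevance score clears the threshold."""
    feeds = {name: [] for name in readers}
    for title, tags in posts.items():
        for name, interests in readers.items():
            if relevance(tags, interests) >= threshold:
                feeds[name].append(title)
    return feeds


# Hypothetical posts and readers, each described by a set of tags.
posts = {
    "Open clinical trial data": {"pharma", "open-data"},
    "Premier League match stats": {"soccer", "open-data"},
    "Tumblr posting frequency": {"social", "blogging"},
}
readers = {
    "researcher": {"pharma", "open-data", "statistics"},
    "fan": {"soccer"},
}

feeds = route(posts, readers)
print(feeds)
```

With these made-up inputs, the researcher receives only the clinical trial post and the fan receives only the match stats post; the third post reaches no one, which is exactly the unread content the paragraph above warns about.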

Read Full Post »

Information is power, so the saying goes.  Today, decades into the digital age, a transition that enabled the collection and creation of data on a previously unknown scale, we have enormous data sets.  These sets are growing at a monstrous pace, even by the standards of the digital age, thanks to social networks, blogs and self-publishing, and the data being collected by mobile phones. We also, for the first time, have the ability to cheaply (“bootstrapped startup” levels of cheap) use technology to search and find patterns in these data sets, opening opportunities to create new markets by disrupting existing ones.  An article from the FT, excerpted below, provides an overview of these issues.  I am going to dive much deeper into this as it has personal relevance for my current project.

If information is power, harnessing the increased information available provides opportunities for unprecedented value creation/disruption/redistribution.

Excerpt below (bolding added):

“While “big data” has become the buzzword, a better description would be “messy data”, says Roger Ehrenberg of IA Ventures, an early-stage investor. Harvesting, cleaning up and organising raw data in a way that it can be processed is a large part of the battle, he says.

This has been complicated further by the big growth in unstructured data – information, such as text, that is not organised in a way that a computer can easily process. With the volume of user-generated text and video growing rapidly, this has become one of the main focuses of technological development.

Chief among the new tools are natural language processing, which enables a computer to extract meaning from text, and machine learning, the feedback loops through which computers can test their conclusions on large amounts of data in order progressively to refine their results.

Subjecting large data sets to analysis has also been made easier by two of the forces that have reshaped information technology more widely: the spread of low-cost, standardised computer hardware and the emergence of open-source software.

This has created a cheap computing platform for new technologies such as Hadoop – a piece of software architecture that is designed to handle massive amounts of data. The idea was based on breakthroughs at Google, which needed to find ways to conduct large volumes of intensive web searches simultaneously. It has since been taken up by companies including Facebook and Yahoo.

The rise of cloud computing – which centralises storage and processing power in larger data centres – has also brought big data within the reach of more companies. By tapping into the cloud computing services offered by Amazon, say, a company such as Color can get instant access to all the analytical power it needs without needing to take on the fixed costs of buying its own servers, says D.J. Patil, chief product officer at the IT start-up.

For business leaders, “the big skill in future will be to ask the right question”, says Tim O’Reilly, a technology commentator and publisher.

Besides smartphones, new sources of data include social networks, blogs and other sources of user-generated content; sensors collecting everything from traffic patterns to a user’s heart rhythm; and click streams generated by people spending an increasing amount of their lives online.

Much of the information is in unstructured form. It has never been collated in a traditional relational database, where it could be queried at will. Without techniques to harvest, verify and analyse it – often in real time – valuable commercial signals are lost in the noise.

It sometimes takes the analysis of massive data sets to detect useful patterns, says Michael Olson. His California start-up, Cloudera, is commercialising the type of technology used by companies such as Facebook and Yahoo to crunch through vast bodies of information. Retailers, for instance, might learn far more from the 10 years’ worth of customer data they can now analyse in one go than from the more limited runs to which they were once restricted, he says.”

Read Full Post »

A couple of links, to an article by a U Chicago professor and to the CTO of the US, Aneesh Chopra, that have appeared since my post about government data sets.  Still much more to be done on this front.  Way too early to pat ourselves on the back, especially from the perspective of federal government data.  Hopefully this means more focus on this issue, because it can be entrepreneurial fuel, as I have discussed.

Read Full Post »

Startup America feels a little gimmicky.  Many of the obvious things that entrepreneurs or companies might want are not going to happen through Startup America because these things are not under the President’s control.  For example, one idea would be funding for entrepreneurs.  While the Administration may be able to move funds around a little, Congress is not going to allocate any significant additional funds given our budget situation and divided government, and in any event, many reasonable people would disagree with the idea of, or need for, a government venture fund.  Other ideas, such as changes to immigration laws, are also a matter for Congress, and the high-tech community made its position known long before Startup America.

Here are some thoughts on how the Administration and Startup America can actually take concrete steps to stimulate entrepreneurship by providing entrepreneurs market intelligence and raw materials for entrepreneurial ideas.

First, take advantage of the reach of the federal government as an issue/trend spotting system.  The federal government reaches into every sector of our economy and every corner of our country.  Create a system to funnel up issues that government agencies are running into or trends they see.  This could be anything from difficulty in procuring a certain type of product, to issues with providing IT support to employees, to an explosion in data storage needs, to an increase in complaints about a certain type of business.  I’m limited by my lack of imagination, but I am sure that there is a range of funky issues that government employees and agencies run into. Chances are these will also reflect issues in the private sector.  A list of these trends creates an “X prize” type competitive atmosphere without the government having to put up prize money.  The opportunity itself is the prize.  The government’s experience can also validate a potential market opportunity, giving the entrepreneur something to work on and giving potential investors some evidence that the market is real.

Second, concentrate on and accelerate the release of government data sets.  This was an idea trumpeted by the Administration early on, but focus seems to have drifted and data.gov (the central repository) is not very impressive.  Data sets do two things.  One, they lend themselves to the creation of apps or other innovations to manipulate and present the data, which in themselves can be very valuable.  Two, and tied to the issue/trend spotting point, they help both entrepreneurs and investors find potential market opportunities and/or test hypotheses.

Third, there is already a lot of data on different agency websites that, if exposed through APIs, could empower developers to provide the data to the American public in a much more useful way.  For example, the DOJ or FTC website has critical information for lawyers and citizens regarding precedent in different areas, but it is too cumbersome to find and manipulate this information efficiently.  (It appears that some agencies like the FCC have already started doing this, which raises the question of what the excuse is for not doing it more comprehensively.)  There is utility regardless of the type of government agency.  For example, imagine what types of beautiful apps would spring up if developers could access the pictures and descriptions of the Smithsonian collection that are already on the Smithsonian website.  Startup America should work with agencies to implement these APIs, because this would release a great amount of source material for innovative entrepreneurial projects.

Read Full Post »
