Sunday, January 15, 2012
Big Data
Big Data: Big News
Facebook And Big Data
After reading this, you will appreciate your Facebook stream just a little more.
O'Reilly Radar: What is big data?
Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn't fit the strictures of your database architectures. ..... cost-effective approaches have emerged to tame the volume, velocity and variability of massive data. Within this data lie valuable patterns and information ...... Today's commodity hardware, cloud architectures and open source software bring big data processing into the reach of the less well-resourced. ...... analytical use, and enabling new products ...... Being able to process every item of data in reasonable time removes the troublesome need for sampling ...... by combining a large number of signals from a user's actions and those of their friends, Facebook has been able to craft a highly personalized user experience and create a new kind of advertising business. It's no coincidence that the lion's share of ideas and tools underpinning big data have emerged from Google, Yahoo, Amazon and Facebook. ....... The emergence of big data into the enterprise brings with it a necessary counterpart: agility. Successfully exploiting the value in big data requires experimentation and exploration. ........ Input data to big data systems could be chatter from social networks, web server logs, traffic flow sensors, satellite imagery, broadcast audio streams, banking transactions, MP3s of rock music, the content of web pages, scans of government documents, GPS trails, telemetry from automobiles, financial market data, the list goes on. ....... the three Vs of volume, velocity and variety are commonly used to characterize different aspects of big data. ........ Having more data beats out having better models ...... If you could run that forecast taking into account 300 factors rather than 6, could you predict demand better? ......... Many companies already have large amounts of archived data, perhaps in the form of logs, but not the capacity to process it. ...... 
data warehouses or databases such as Greenplum — and Apache Hadoop-based solutions ...... Apache Hadoop.. places no conditions on the structure of the data it can process. ...... First developed and released as open source by Yahoo, it implements the MapReduce approach pioneered by Google in compiling its search indexes. Hadoop's MapReduce involves distributing a dataset among multiple servers and operating on the data: the "map" stage. The partial results are then recombined: the "reduce" stage. ......... Hadoop is not itself a database or data warehouse solution, but can act as an analytical adjunct to one. ....... A MySQL database stores the core data. This is then reflected into Hadoop, where computations occur, such as creating recommendations for you based on your friends' interests. Facebook then transfers the results back into MySQL, for use in pages served to users. ............ the increasing rate at which data flows into an organization — has followed a similar pattern to that of volume. Problems previously restricted to segments of industry are now presenting themselves in a much broader setting. Specialized companies such as financial traders have long turned systems that cope with fast moving data to their advantage. Now it's our turn. ......... Online retailers are able to compile large histories of customers' every click and interaction: not just the final sales. Those who are able to quickly utilize that information, by recommending additional purchases, for instance, gain competitive advantage. The smartphone era increases again the rate of data inflow, as consumers carry with them a streaming source of geolocated imagery and audio data. ......... The importance lies in the speed of the feedback loop, taking data from input through to decision. ........ you wouldn't cross the road if all you had was a five-minute old snapshot of traffic location. ......... "streaming data," or "complex event processing." ...... 
when the input data are too fast to store in their entirety: in order to keep storage requirements practical some level of analysis must occur as the data streams in. ........ At the extreme end of the scale, the Large Hadron Collider at CERN generates so much data that scientists must discard the overwhelming majority of it — hoping hard they've not thrown away anything useful. The second reason to consider streaming is where the application mandates immediate response to the data. Thanks to the rise of mobile applications and online gaming this is an increasingly common situation. ........ The velocity of a system's outputs can matter too. The tighter the feedback loop, the greater the competitive advantage. ....... Rarely does data present itself in a form perfectly ordered and ready for processing. A common theme in big data systems is that the source data is diverse, and doesn't fall into neat relational structures. It could be text from social networks, image data, a raw feed directly from a sensor source. None of these things come ready for integration into an application. .......... the reality of data is messy. Different browsers send different data, users withhold information, they may be using differing software versions or vendors to communicate with you. And you can bet that if part of the process involves a human, there will be error and inconsistency. ....... Is this city London, England, or London, Texas? By the time your business logic gets to it, you don't want to be guessing. ...... a principle of big data: when you can, keep everything. There may well be useful signals in the bits you throw away. ....... documents encoded as XML are most versatile when stored in a dedicated XML store such as MarkLogic. Social network relations are graphs by nature, and graph databases such as Neo4J make operations on them simpler and more efficient. ....... a disadvantage of the relational database is the static nature of its schemas. 
In an agile, exploratory environment, the results of computations will evolve with the detection and extraction of more signals. Semi-structured NoSQL databases meet this need for flexibility: they provide enough structure to organize data, but do not require the exact schema of the data before storing it. ........ three forms: software-only, as an appliance or cloud-based. ...... IT is undergoing an inversion of priorities: it's the program that needs to move, not the data. .... Financial trading systems crowd into data centers to get the fastest connection to source data, because that millisecond difference in processing time equates to competitive advantage. ...... 80% of the effort involved in dealing with data is cleaning it up in the first place ...... data science, a discipline that combines math, programming and scientific instinct. ...... The art and practice of visualizing data is becoming ever more important in bridging the human-computer gap to mediate analytical insight in a meaningful way. ...... advice to businesses starting out with big data: first, decide what problem you want to solve.
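The "map" and "reduce" stages described in the excerpt above can be sketched in a few lines of Python. This is only a single-process illustration of the idea, not Hadoop's actual API: the function names (`split_dataset`, `map_stage`, `reduce_stage`) and the sample log lines are made up for the example, and each chunk stands in for the share of data a separate server would hold. A real MapReduce job also groups intermediate results by key between the two stages, which this sketch skips.

```python
from collections import Counter
from functools import reduce


def split_dataset(lines, n_chunks):
    """Distribute the lines round-robin into n_chunks pieces (hypothetical 'servers')."""
    return [lines[i::n_chunks] for i in range(n_chunks)]


def map_stage(chunk):
    """Operate on one chunk in isolation: count the words it contains."""
    return Counter(word.lower() for line in chunk for word in line.split())


def reduce_stage(partials):
    """Recombine the partial counts from every chunk into one result."""
    return reduce(lambda left, right: left + right, partials, Counter())


# Made-up input standing in for web server logs or social network chatter.
log_lines = [
    "big data big news",
    "facebook and big data",
    "data beats models",
]

chunks = split_dataset(log_lines, 2)
partials = [map_stage(chunk) for chunk in chunks]  # the "map" stage
totals = reduce_stage(partials)                    # the "reduce" stage

print(totals.most_common(2))
```

Because each call to `map_stage` only looks at its own chunk, the chunks could in principle be processed on different machines at the same time, which is the point of the approach.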
Facebook And Big Data
Big Data: Big News
ReadWriteWeb: Why Facebook's Data Sharing Matters
Facebook has cut a deal with political website Politico that allows the independent site machine-access to Facebook users' messages, both public and private, when a Republican Presidential candidate is mentioned by name. The data is being collected and analyzed for sentiment by Facebook's data team, then delivered to Politico to serve as the basis of data-driven political analysis and journalism. ..... Facebook could be the biggest, most dynamic census of human opinion and interaction in history. ....... Back in the middle of the last century, when US Census data and housing mortgage loan data were both made available for computer analysis and cross referencing for the first time, early data scientists were able to prove a pattern of racial discrimination by banks against people of color who wanted to buy houses in certain neighborhoods. The data illuminated the problem and made it undeniable, thus leading to legislation to prohibit such discrimination...... the relationship between data and knowledge generally in the emerging data-rich world....... David Weinberger .. "It's not simply that there are too many brickfacts [datapoints] and not enough edifice-theories. Rather, the creation of data galaxies has led us to science that sometimes is too rich and complex for reduction into theories. As science has gotten too big to know, we've adopted different ideas about what it means to know at all." ...... The world's largest social network, rich with far more signal than any of us could wrap our heads around, could help illuminate emergent qualities of the human experience that are only visible on the network level.
Google machine-reads all your Gmail emails. That is how it serves ads against them. I don't think that is a breach of privacy. I can imagine Facebook similarly machine-reading your private Facebook messages and updates. As long as individuals are not identified, that collective data is fair game. It has the potential to do tremendous good.
This also tells me Google is not the only major tech company trying to get on the Big Data train. Facebook is also well-positioned.
Curiously, Yahoo's new CEO has also said he will take Yahoo into the Big Data domain. He got the vocabulary right. I hope he can deliver. Yahoo also sits on some pretty Big Data.
Mark Cuban: Contrarian On The TV Business
I love following the VCs I follow in the blogosphere, but I wish my list was more tilted towards entrepreneurs. The problem is the top entrepreneurs don't blog. Mark Cuban is an exception. He does blog. And the guy sure is opinionated.
I think Mark Cuban just told me the people who added smarts to the phone are going to have a much harder time doing the same to TV. I don't think his stand is definitive. But his stand does give me a glimpse into the complexity of the landscape. Mark Cuban of Broadcast.com fame. I remember when they got bought by Yahoo. I was doing some preliminary work on a dot com that went on to do really well, for two years.
Mark Cuban: The TV Business Keeps Getting Stronger!
We had a policy that we never tried to create hits. That we were always going to go wide and create a reason for people to start watching video online. 17 years later. Yep, its been 17 years since we started Broadcast.com (as audionet.com first), Youtube and others are still doing the exact same thing. ...... Good for them ! Except they are making one huge fundamental mistake, they are trying to create hits. They don’t like the idea that beyond a steady stream of 1 hit wonders they haven’t been able to create a sustainable roadmap to content success. In other words, they have no idea how to drive an audience to specific content. Their hits come out of nowhere. ...... viewing for cable networks has skyrocketed and the amount of traditional tv watched has continued to increase. ..... used to be that only movie companies got output deals ..... Today, TV shows are getting output deals and generating lots of revenue across all the different platforms that show TV shows. Its not just syndication,but those online distributors want to make sure they get the best shows and they are committing up front to buy those shows. An output deal. Found money. ...... The TV business isn’t dead. It really isn’t even morphing. Sure people will watch video online. They will watch it on phones. They will download it. But the videos that online distributors pay the most for will be those that have done the best on traditional TV. Which in turn means more money for the production of shows. ...... Online video is to TV today like DVDs were to Movies in the past. A great revenue source that correlated to the movie’s boxoffice. ...... having to hit the internet button on the remote, or even worse, the input button on the remote will not be the path of least resistance for watching tv. Believe it or not, it will be far too much hassle for most people when compared to just turning on and watching TV the old fashioned way. 
And on top of that, distributors like Dish, Directv, Charter, Comcast, etc are working hard to improve their guide experiences which will be faster and easier than their online counterparts....... last but not least, MOCA, DLNA and good old fashioned wi fi is always going to be a hassle. No one has perfect wi fi at their apartment or house. It always screws up.
(1) TV shows are high quality stuff. Not just anyone can produce them. People like them.
(2) Video is content king. People like consuming content in video format. Much faster broadband might stand a chance but not the broadband we know. The Internet pipes just are not there yet.
(3) Ease of use is supreme. People want to be able to just turn on and watch. No browse and click.