Showing posts with label BigData. Show all posts

Saturday, February 23, 2013

Big Data, Big Confusion?

Big Data (Photo credit: Kevin Krejci)
The Problem with Our Data Obsession
however objective data may be, interpretation is subjective, and so is our choice about which data to record in the first place. While it might seem obvious that data, no matter how “big,” cannot perfectly represent life in all its complexity, information technology produces so much information that it is easy to forget just how much is missing..... life is messy, and not everything can be abstracted into data for computers to act upon
There are obvious limitations to Big Data, but overall it is a force for good. The solution to Big Data blind spots seems to be even more Big Data. No?

Tuesday, July 31, 2012

Big Money In Big Data

Big Data, The Moving Parts: Fast Data, Big Analytics, and Deep Insight (Photo credit: Dion Hinchcliffe)
I do think there is big money in Big Data. A lot of people do. But here is a dissenting view.

Is There Big Money in Big Data?
Peter Fader says a flood of consumer data collected from mobile devices may not help marketers as much as they think. ..... Few ideas hold more sway among entrepreneurs and investors these days than "Big Data." The idea is that we are now collecting so much information about people from their online behavior and, especially, through their mobile phones that we can make increasingly specific predictions about how they will behave and what they will buy. ..... what was going on 15 years ago with CRM (customer relationship management) .... ask anyone today what comes to mind when you say "CRM," and you'll hear "frustration," "disaster," "expensive," and "out of control." It turned out to be a great big IT wild-goose chase. And I'm afraid we're heading down the same road with Big Data ..... many "big data" people don't know what they don't know. ..... the still-powerful rubric of RFM: recency, frequency, monetary value. .... Ask anyone in direct marketing about RFM, and they'll say, "Tell me something I don't know." But ask anyone in e-commerce, and they probably won't know what you're talking about. ...... Chartists are looking at the data without developing fundamental explanations for why those movements are taking place ..... Among financial academics, chartists tend to be regarded as quacks. But a lot of the Big Data people are exactly like them. They say, "We are just going to stare at the data and look for patterns, and then act on them when we find them." In short, there is very little real science in what we call "data science," and that's a big problem. .... the more data we have, the more false confidence we will have
If his point is that collecting Big Data is not enough and that you also have to make sense of it, I agree. But by my definition, the whole idea behind Big Data is that of course you are going to make sense of it.

One part where I agree is that Big Data enthusiasm will have plenty of accompanying froth.

What he is saying is that making sense of data is going to be more important than collecting it. I agree. But that is what I thought Big Data was all about. To me it was never simply about collecting.
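
The RFM rubric Fader mentions (recency, frequency, monetary value) is simple enough to sketch in a few lines. This is only an illustration of the idea, not anything from the interview: the `Order` record, the `rfm` function and the sample orders are all made up for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Order:
    """Hypothetical order record; the field names are assumptions."""
    customer: str
    day: date
    amount: float


def rfm(orders, today):
    """Return {customer: (recency_days, frequency, monetary)}.

    recency_days: days since the customer's most recent order
    frequency:    number of orders
    monetary:     total amount spent
    """
    summary = {}
    for o in orders:
        last_day, count, total = summary.get(o.customer, (o.day, 0, 0.0))
        summary[o.customer] = (max(last_day, o.day), count + 1, total + o.amount)
    return {
        customer: ((today - last_day).days, count, total)
        for customer, (last_day, count, total) in summary.items()
    }


# Made-up sample data, purely for illustration.
orders = [
    Order("alice", date(2012, 7, 1), 40.0),
    Order("alice", date(2012, 7, 20), 60.0),
    Order("bob", date(2012, 3, 5), 15.0),
]
scores = rfm(orders, today=date(2012, 7, 31))
```

A customer who bought recently, often and in large amounts (low recency, high frequency, high monetary value) is the one direct marketers would rank first. That is the "tell me something I don't know" part.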

Sunday, February 12, 2012

Another Ode To Big Data


New York Times: The Age of Big Data
an explosion of data — Web traffic and social network comments, as well as software and sensors that monitor shipments, suppliers and customers — to guide decisions, trim costs and lift sales ...... the United States needs 140,000 to 190,000 more workers with “deep analytical” expertise and 1.5 million more data-literate managers, whether retrained or hired ...... The story is similar in fields as varied as science and sports, advertising and public health — a drift toward data-driven discovery and decision-making. “It’s a revolution” ...... the march of quantification, made possible by enormous new sources of data, will sweep through academia, business and government. There is no area that is going to be untouched ...... Welcome to the Age of Big Data. ...... data a new class of economic asset, like currency or gold ...... Big Data has the potential to be “humanity’s dashboard,” an intelligent tool that can help combat poverty, crime and pollution. Privacy advocates take a dim view, warning that Big Data is Big Brother, in corporate clothing. ........ a lot more data, all the time, growing at 50 percent a year, or more than doubling every two years ....... It’s not just more streams of data, but entirely new ones. ....... there are now countless digital sensors worldwide in industrial equipment, automobiles, electrical meters and shipping crates. They can measure and communicate location, movement, vibration, temperature, humidity, even chemical changes in the air. ........ the Internet of Things or the Industrial Internet. ....... Data is not only becoming more available but also more understandable to computers. Most of the Big Data surge is data in the wild — unruly stuff like words, images and video on the Web and those streams of sensor data. It is called unstructured data and is not typically grist for traditional databases. ........ 
the computer tools for gleaning knowledge and insights from the Internet era’s vast trove of unstructured data are fast gaining ground. At the forefront are the rapidly advancing techniques of artificial intelligence like natural-language processing, pattern recognition and machine learning ....... The wealth of new data, in turn, accelerates advances in computing — a virtuous circle of Big Data. Machine-learning algorithms, for example, learn on data, and the more data, the more the machines learn. Take Siri ....... The microscope, invented four centuries ago, allowed people to see and measure things as never before — at the cellular level. It was a revolution in measurement. ....... Data measurement.... is the modern equivalent of the microscope. Google searches, Facebook posts and Twitter messages, for example, make it possible to measure behavior and sentiment in fine detail and as it happens. ....... decisions will increasingly be based on data and analysis rather than on experience and intuition. “We can start being a lot more scientific” ........ the low-budget Oakland A’s massaged data and arcane baseball statistics to spot undervalued players. Heavy data analysis had become standard not only in baseball but also in other sports, including English soccer, well before last year’s movie version of “Moneyball,” starring Brad Pitt. ...... Walmart and Kohl’s, analyze sales, pricing and economic, demographic and weather data to tailor product selections at particular stores and determine the timing of price markdowns. Shipping companies, like U.P.S., mine data on truck delivery times and traffic patterns to fine-tune routing. ....... Police departments across the country, led by New York’s, use computerized mapping and analysis of variables like historical arrest patterns, paydays, sporting events, rainfall and holidays to try to predict likely crime “hot spots” and deploy officers there in advance. ....... 
data-guided management is spreading across corporate America and starting to pay off. ...... studied 179 large companies and found that those adopting “data-driven decision making” achieved productivity gains that were 5 percent to 6 percent higher than other factors could explain. ...... The predictive power of Big Data is being explored — and shows promise — in fields like public health, economic development and economic forecasting. Researchers have found a spike in Google search requests for terms like “flu symptoms” and “flu treatments” a couple of weeks before there is an increase in flu patients coming to hospital emergency rooms in a region (and emergency room reports usually lag behind visits by two weeks or so). ....... sentiment analysis of messages in social networks and text messages — using natural-language deciphering software — to help predict job losses, spending reductions or disease outbreaks in a given region. The goal is to use digital early-warning signals to guide assistance programs in advance to, for example, prevent a region from slipping back into poverty. ...... trends in increasing or decreasing volumes of housing-related search queries in Google are a more accurate predictor of house sales in the next quarter than the forecasts of real estate economists ....... social-network research involves mining huge digital data sets of collective behavior online. Among the findings: people whom you know but don’t communicate with often — “weak ties,” in sociology — are the best sources of tips about job openings. They travel in slightly different social worlds than close friends, so they see opportunities you and your best friends do not. ...... Researchers can see patterns of influence and peaks in communication on a subject — by following trending hashtags on Twitter, for example. The online fishbowl is a window into the real-time behavior of huge numbers of people. ...... Big Data has its perils, to be sure. 
With huge data sets and fine-grained measurement, statisticians and computer scientists note, there is increased risk of “false discoveries.” ...... “many bits of straw look like needles.” ...... Big Data also supplies more raw material for statistical shenanigans and biased fact-finding excursions. It offers a high-tech twist on an old trick: I know the facts, now let’s find ’em. ..... Data is tamed and understood using computer and mathematical models. These models, like metaphors in literature, are explanatory simplifications. They are useful for understanding, but they have their limits. A model might spot a correlation and draw a statistical inference that is unfair or discriminatory, based on online searches, affecting the products, bank loans and health insurance a person is offered ...... Veteran data analysts tell of friends who were long bored by discussions of their work but now are suddenly curious. .... “The culture has changed” .... “There is this idea that numbers and statistics are interesting and fun. It’s cool now.”
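
The "false discoveries" warning in the excerpt can be made concrete with a little arithmetic. The sketch below is mine, not the article's, and it assumes every test is independent and run at the same significance level `alpha`. Real data rarely behaves that neatly, so read the numbers as a rough guide.

```python
def expected_false_positives(num_tests, alpha=0.05):
    """Expected number of 'significant' results when no real effect exists."""
    return num_tests * alpha


def prob_at_least_one_false_positive(num_tests, alpha=0.05):
    """Chance that at least one of num_tests independent tests fires by luck."""
    return 1.0 - (1.0 - alpha) ** num_tests


# Hypothetical scenario: scanning 100 unrelated variables for a pattern.
needles = expected_false_positives(100)
chance = prob_at_least_one_false_positive(100)
```

Under these assumptions, looking at 100 unrelated variables yields about five spurious "needles" on average, and the chance of finding at least one is above 99 percent. The more data you stare at, the more straw looks like needles.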

Big Data Democratization By Wolfram Alpha
Big Data
Facebook And Big Data
Big Data + Smartphone = New Generation Smartphone
Big Data: Big News

Sunday, January 15, 2012

Big Data

Big Data: Big News
Facebook And Big Data

After reading this you appreciate your Facebook stream just a little more.

O'Reilly Radar: What is big data?
Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn't fit the strictures of your database architectures. ..... cost-effective approaches have emerged to tame the volume, velocity and variability of massive data. Within this data lie valuable patterns and information ...... Today's commodity hardware, cloud architectures and open source software bring big data processing into the reach of the less well-resourced. ...... analytical use, and enabling new products ...... Being able to process every item of data in reasonable time removes the troublesome need for sampling ...... by combining a large number of signals from a user's actions and those of their friends, Facebook has been able to craft a highly personalized user experience and create a new kind of advertising business. It's no coincidence that the lion's share of ideas and tools underpinning big data have emerged from Google, Yahoo, Amazon and Facebook. ....... The emergence of big data into the enterprise brings with it a necessary counterpart: agility. Successfully exploiting the value in big data requires experimentation and exploration. ........ Input data to big data systems could be chatter from social networks, web server logs, traffic flow sensors, satellite imagery, broadcast audio streams, banking transactions, MP3s of rock music, the content of web pages, scans of government documents, GPS trails, telemetry from automobiles, financial market data, the list goes on. ....... the three Vs of volume, velocity and variety are commonly used to characterize different aspects of big data. ........ Having more data beats out having better models ...... If you could run that forecast taking into account 300 factors rather than 6, could you predict demand better? ......... Many companies already have large amounts of archived data, perhaps in the form of logs, but not the capacity to process it. ...... 
data warehouses or databases such as Greenplum — and Apache Hadoop-based solutions ...... Apache Hadoop.. places no conditions on the structure of the data it can process. ...... First developed and released as open source by Yahoo, it implements the MapReduce approach pioneered by Google in compiling its search indexes. Hadoop's MapReduce involves distributing a dataset among multiple servers and operating on the data: the "map" stage. The partial results are then recombined: the "reduce" stage. ......... Hadoop is not itself a database or data warehouse solution, but can act as an analytical adjunct to one. ....... A MySQL database stores the core data. This is then reflected into Hadoop, where computations occur, such as creating recommendations for you based on your friends' interests. Facebook then transfers the results back into MySQL, for use in pages served to users. ............ the increasing rate at which data flows into an organization — has followed a similar pattern to that of volume. Problems previously restricted to segments of industry are now presenting themselves in a much broader setting. Specialized companies such as financial traders have long turned systems that cope with fast moving data to their advantage. Now it's our turn. ......... Online retailers are able to compile large histories of customers' every click and interaction: not just the final sales. Those who are able to quickly utilize that information, by recommending additional purchases, for instance, gain competitive advantage. The smartphone era increases again the rate of data inflow, as consumers carry with them a streaming source of geolocated imagery and audio data. ......... The importance lies in the speed of the feedback loop, taking data from input through to decision. ........ you wouldn't cross the road if all you had was a five-minute old snapshot of traffic location. ......... "streaming data," or "complex event processing." ...... 
when the input data are too fast to store in their entirety: in order to keep storage requirements practical some level of analysis must occur as the data streams in. ........ At the extreme end of the scale, the Large Hadron Collider at CERN generates so much data that scientists must discard the overwhelming majority of it — hoping hard they've not thrown away anything useful. The second reason to consider streaming is where the application mandates immediate response to the data. Thanks to the rise of mobile applications and online gaming this is an increasingly common situation. ........ The velocity of a system's outputs can matter too. The tighter the feedback loop, the greater the competitive advantage. ....... Rarely does data present itself in a form perfectly ordered and ready for processing. A common theme in big data systems is that the source data is diverse, and doesn't fall into neat relational structures. It could be text from social networks, image data, a raw feed directly from a sensor source. None of these things come ready for integration into an application. .......... the reality of data is messy. Different browsers send different data, users withhold information, they may be using differing software versions or vendors to communicate with you. And you can bet that if part of the process involves a human, there will be error and inconsistency. ....... Is this city London, England, or London, Texas? By the time your business logic gets to it, you don't want to be guessing. ...... a principle of big data: when you can, keep everything. There may well be useful signals in the bits you throw away. ....... documents encoded as XML are most versatile when stored in a dedicated XML store such as MarkLogic. Social network relations are graphs by nature, and graph databases such as Neo4J make operations on them simpler and more efficient. ....... a disadvantage of the relational database is the static nature of its schemas. 
In an agile, exploratory environment, the results of computations will evolve with the detection and extraction of more signals. Semi-structured NoSQL databases meet this need for flexibility: they provide enough structure to organize data, but do not require the exact schema of the data before storing it. ........ three forms: software-only, as an appliance or cloud-based. ...... IT is undergoing an inversion of priorities: it's the program that needs to move, not the data. .... Financial trading systems crowd into data centers to get the fastest connection to source data, because that millisecond difference in processing time equates to competitive advantage. ...... 80% of the effort involved in dealing with data is cleaning it up in the first place ...... data science, a discipline that combines math, programming and scientific instinct. ...... The art and practice of visualizing data is becoming ever more important in bridging the human-computer gap to mediate analytical insight in a meaningful way. ...... advice to businesses starting out with big data: first, decide what problem you want to solve.
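
The "map" and "reduce" stages described in the excerpt can be imitated on one machine to get a feel for the idea. This is a toy sketch and not the Hadoop API: the shards stand in for slices of data held on different servers, and `map_stage` and `reduce_stage` are names I made up for the example.

```python
from collections import Counter
from functools import reduce


def map_stage(shard):
    """Count words within one shard (one server's slice of the data)."""
    return Counter(shard.lower().split())


def reduce_stage(partials):
    """Recombine the per-shard partial counts into one overall result."""
    return reduce(lambda left, right: left + right, partials, Counter())


# Made-up shards; a real cluster would hold each one on a different server.
shards = [
    "big data big news",
    "big money in big data",
    "facebook and big data",
]
partials = [map_stage(shard) for shard in shards]  # could run in parallel
totals = reduce_stage(partials)
```

Each call to `map_stage` depends only on its own shard, which is why the work can be spread across many commodity machines. A real MapReduce job also shuffles intermediate results by key between the two stages, a step this sketch leaves out.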

Facebook And Big Data

Big Data: Big News

ReadWriteWeb: Why Facebook's Data Sharing Matters
Facebook has cut a deal with political website Politico that allows the independent site machine-access to Facebook users' messages, both public and private, when a Republican Presidential candidate is mentioned by name. The data is being collected and analyzed for sentiment by Facebook's data team, then delivered to Politico to serve as the basis of data-driven political analysis and journalism. ..... Facebook could be the biggest, most dynamic census of human opinion and interaction in history. ....... Back in the middle of the last century, when US Census data and housing mortgage loan data were both made available for computer analysis and cross referencing for the first time, early data scientists were able to prove a pattern of racial discrimination by banks against people of color who wanted to buy houses in certain neighborhoods. The data illuminated the problem and made it undeniable, thus leading to legislation to prohibit such discrimination...... the relationship between data and knowledge generally in the emerging data-rich world....... David Weinberger .. "It's not simply that there are too many brickfacts [datapoints] and not enough edifice-theories. Rather, the creation of data galaxies has led us to science that sometimes is too rich and complex for reduction into theories. As science has gotten too big to know, we've adopted different ideas about what it means to know at all." ...... The world's largest social network, rich with far more signal than any of us could wrap our heads around, could help illuminate emergent qualities of the human experience that are only visible on the network level.
Google machine-reads all your Gmail emails. That is how it serves ads against them. I don't think that is a breach of privacy. I can imagine Facebook similarly machine-reading your private Facebook messages and updates. As long as individuals are not identified, that collective data is fair game. It has the potential to do tremendous good.

This also tells me Google is not the only major tech company trying to get on the Big Data train. Facebook is also well-positioned.

Curiously, Yahoo's new CEO has also said he will take Yahoo into the Big Data domain. He got the vocabulary right. I hope he can deliver. Yahoo also sits on some pretty Big Data.

Wednesday, November 30, 2011

Big Data: Big News

Those who think GOOG is a one trick search pony, checkout GFS, BigTable, MapReduce, Tenzing, etc. These are the building blocks of Big Data
Nov 30 via web


I am no pioneer in making this observation, and neither is the person quoted above. But it is obvious that Big Data is waiting in the wings. Big Data will gather buzz the way social has for the past few years.