Nimsoft is the company that acquired WatchMouse, a long-time favorite of mine for performance monitoring, security, and probably more. Nimsoft is probably best characterized as a comprehensive cloud computing provider. I read a post on their company blog about the intense focus on big data:
This technology enables real time analysis of social network information… it is all about mining the trillions of trivial postings that we collectively make every second on the social networks of our choice. I know that “trivial” is harsh.
Why such interest in it then?
Mining and refining trivia can be used in innovative ways. Mostly it is about targeted advertising, but it is also about trend recognition and the analysis of collective thinking.
Sometimes the analysis of collective thinking is intriguing. Google’s Social Collider is one example. Another is Culturomics, which was associated with the development of the Ngram Viewer. That was great. But such work also has a tendency to let questionable ideas roam farther than they would otherwise, particularly when they benefit from the veneer of faux-analytics.
Why else is big data so compelling? Well, there is technological challenge too.
What we have been using so far is inadequate for this job. With classical technology, and particularly SQL based databases, retrieval performance degrades exponentially with volume. Even the concept of “collect, store and analyze” has to be rethought. Now it is more like “collect, cache, analyze, store result”. To do that in real time with a variable and unpredictable arrival rate of data requires massive parallelism and efficiency of execution. It reminds me of the early days of computing when data storage structures were designed for performance and code execution times were measured and constantly optimized.
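To make the “collect, cache, analyze, store result” idea concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the class name `WindowedTermCounter`, the fixed-size window, and the whitespace tokenizing are all hypothetical, not anything the Nimsoft post describes.

```python
from collections import Counter, deque


class WindowedTermCounter:
    """Keep only the most recent posts in memory and count their terms."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()    # the cache: recent posts only
        self.counts = Counter()  # running term counts for the cache

    def collect(self, post):
        """Collect one post, cache it, and evict the oldest if needed."""
        terms = post.lower().split()
        self.window.append(terms)
        self.counts.update(terms)
        if len(self.window) > self.window_size:
            expired = self.window.popleft()
            self.counts.subtract(expired)
            for term in set(expired):
                if self.counts[term] <= 0:
                    del self.counts[term]

    def analyze(self, top_n=3):
        """Analyze the cache: most frequent terms right now."""
        return self.counts.most_common(top_n)


# Only the analysis result is stored, never the raw posts.
stored_results = []
counter = WindowedTermCounter(window_size=2)
for post in ["big data hype", "big data tools", "small data"]:
    counter.collect(post)
    stored_results.append(counter.analyze(top_n=1))
```

The raw postings fall out of the window and are gone; what persists is the small list of results. Real systems add parallelism and time-based windows, which this sketch leaves out.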
Technology innovators race to produce the fastest, most efficient analytical tool with the most linear performance profile. Are they doing this in order to accomplish anything productive? Sentiment analysis, the zeitgeist, and the living pulse of our collective psyche, desires and dreams are cool to contemplate. Beyond that… I don’t have a clear vision.
Consider too this article about venture capital funding pouring into big data companies (ComputerWorld). Some of these companies are neither start-ups, nor particularly innovative in storage or processing of enormous data sets:
Curt Monash warned investors to beware of the hype surrounding the technology. “A great example of hype is anybody calling Birst a ‘big data’ or ‘big data analytics’ company,” he said.
A prior Computerworld article described how Birst recently received $26 million in funding from Sequoia Capital and others, and has raised a rather hefty $46 million overall. Yet Birst went into business back in 2005, as a cloud-based business intelligence service. It has only recently begun presenting its products and software as a tool set for analyzing and deriving deeper meaning from petabyte-scale data sets.
As Curt Monash says:
"If anything, Birst is a ‘little data’ analytics company that claims, as a differentiating feature, that it can handle ordinary-sized data sets as well."
I have produced a Maltego Graph hosted on GitHub with the GPS EXIF Image Forensics Local Transforms from Recx Ltd of the image from the PasteHTML page referenced in the statement of Scott Jensen:
The following were also noted during this OSINT exercise…
This was fascinating and uncommonly logical, as in “easy to follow without specialized knowledge”.
The author is trustworthy!
Do not fear to click on any links. Often, security forensics posts are opaque, long-winded, or simply boring! This post was none of those things. It used a data visual to explain relationships, accompanied by sufficient text to be meaningful.
In my typically verbose manner, I have (most likely) written more text than the post to which I am referring!
* I apologize about the visual status of my website. It seems to be unstable, looking worse every day, regardless of whether or not I further torture the template…
Google Analytics Market Share
Metric Mail uses data provided by Google Analytics. We wanted to estimate the share of websites that use it. There are some studies about the market share of analytics solutions, but they are using rather small samples.
Alexa provides a list of the Top 1 million domains…
We used their data as a starting point for our research. There may be issues about the reliability and accuracy of their data, but it is still the best source that is easily available.
Next we imported the data to Google App Engine’s data store…. importing a million rows takes a long time ;) Around two days. We needed to retrieve each site and then look for patterns in Google Analytics data. There is
For more info on the Mapper API check out their project site and the fantastic article on Nick Johnson’s blog.
- the identifier UA-12345-123 and
Here are the results (8+ days of processing):
- Number of websites successfully checked: 883194 (responded within 10 seconds)
- Number of Google Analytics profiles: 441207
- Market share in the top one million: 49.95%
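The detection step and the final percentage can be sketched in a few lines of Python. This is an illustration only, not Metric Mail's actual code: the regular expression is my assumption based on the identifier format UA-12345-123 named above, and the function names are hypothetical.

```python
import re

# Assumed pattern for a Google Analytics identifier such as UA-12345-123.
UA_PATTERN = re.compile(r"\bUA-\d+-\d+\b")


def find_ga_ids(html):
    """Return the distinct UA-style identifiers found in a page's HTML."""
    return sorted(set(UA_PATTERN.findall(html)))


def market_share(profiles, sites_checked):
    """Share of checked sites with a profile, as a percentage."""
    return 100.0 * profiles / sites_checked


sample = '<script>_gaq.push(["_setAccount", "UA-12345-123"]);</script>'
ids = find_ga_ids(sample)
share = market_share(441207, 883194)
```

Run on the counts reported above, `market_share` gives roughly 49.956%, so the published 49.95% looks truncated rather than rounded.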
Can you say “death spiral” or even “fraud”?
Grumpy Old Accountants certainly can, and with relish!
Overstock has attracted analyst criticism for years…we just couldn’t resist taking a look ourselves, and the current vitals do not look good.
Inspection of the Overstock 2011 10-K provided ample evidence.
There’s been far too much non-GAAP reporting lately (GAAP being generally accepted accounting principles, i.e. the accepted standards with checks and balances), be it pre-IPO S-1s released by certain other companies (you know who I mean…) or similarly creative accounting by Overstock. Well, I feel this is true, in my not Certified Public Accountant-qualified opinion.
The grumpy accountants continue,
…we’re traditionalists, so the first thing we did was to compute the Altman Z-score. After all, if you can’t find a pulse, there isn’t much use in testing for other signs of financial health.
There is no slavish devotion to models here. Remember, these accountants are academicians. They probably need to toss in a statistical aside now and then. The single table of model results, 8 rows x 2 columns, is backed up as follows:
Driving these results are negative earnings before interest and taxes, as well as negative retained earnings. Additionally… values are driven by the erosion of shareholders’ equity. The debt/equity ratio rests at a staggering 12.56!
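For anyone curious how an Altman Z-score is put together, here is a small Python sketch using the original 1968 formula for publicly traded manufacturers. The source does not say which variant the Grumpy Old Accountants used, and the input figures below are made up for illustration; they are not Overstock's numbers.

```python
def altman_z(working_capital, retained_earnings, ebit,
             market_equity, sales, total_assets, total_liabilities):
    """Original Altman Z-score: a weighted sum of five ratios."""
    x1 = working_capital / total_assets
    x2 = retained_earnings / total_assets
    x3 = ebit / total_assets
    x4 = market_equity / total_liabilities
    x5 = sales / total_assets
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5


def zone(z):
    """Conventional cut-offs for the original model."""
    if z > 2.99:
        return "safe"
    if z >= 1.81:
        return "grey"
    return "distress"


# Hypothetical company with negative EBIT and negative retained earnings.
z_weak = altman_z(working_capital=-10, retained_earnings=-50, ebit=-10,
                  market_equity=20, sales=100,
                  total_assets=100, total_liabilities=90)
```

Negative EBIT and negative retained earnings pull three of the five terms down at once, which is why the quoted passage singles them out.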
I’d keep my ears perked up for news from Overstock’s auditors. They should be blowing the whistle on this company, as it isn’t likely to continue as a going concern much longer.
Small changes in a logarithm (like a reading on the moment magnitude scale, MMS) mean big changes in what the logarithm is applied to (like the energy an earthquake actually releases), because logarithms count what are called orders of magnitude. A plain-English way to say this is that logarithms tell you how many zeros a number has.
Things in the hundreds have 2 zeros. Things in the millions have 6 zeros. You see how it goes. For this reason, a logarithmic scale can be used to talk about huge ranges, such as the size of the solar system compared to the size of an atom (which is about 23 orders of magnitude in difference).
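The zero-counting idea is easy to check in Python. This is a small sketch: the sizes used for the solar system and an atom are rough round numbers that I am assuming (about 10^13 m and 10^-10 m), and the helper names are mine.

```python
import math


def orders_of_magnitude(x):
    """How many zeros a round number has, via the base-10 logarithm."""
    return math.floor(math.log10(x))


def energy_ratio(magnitude_difference):
    """Moment magnitude: each whole step is about 10**1.5 times the energy."""
    return 10 ** (1.5 * magnitude_difference)


hundreds = orders_of_magnitude(100)        # 2
millions = orders_of_magnitude(1_000_000)  # 6

# Assumed rough sizes in meters: solar system ~1e13, atom ~1e-10.
span = math.log10(1e13) - math.log10(1e-10)

one_step = energy_ratio(1)   # roughly 31.6 times the energy
two_steps = energy_ratio(2)  # roughly 1000 times the energy
```

So a magnitude 7 quake releases about a thousand times the energy of a magnitude 5, even though the scale only moved by two.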
Logarithms also have properties that we humans often perceive as beauty.
“Have you ever tried dying to a pleasure voluntarily, not forcibly? Ordinarily when you die you don’t want to; death comes and takes you away; it is not a voluntary act. But have you ever tried dying voluntarily, easily, felt that sense of the abandonment of pleasure? Obviously not! Life is living, abundance, fullness, abandonment, not a sense of the ‘I’ having significance… If you experiment with dying to little things… you will see that your mind is capable of dying to many things, dying to all memories. Machines are taking over the functions of memory- but the human mind is something more…. But it cannot be that something else if it does not die to everything it knows…”
- J. Krishnamurti, The Book of Life
I do not want to go gently into that good night! I want to fight the dying of the light!
That was my initial reaction. Then I re-read the passage. A mind burdened with memories is pinned to the past, cannot make room emotionally or cognitively for the present and future. Yes, some of the past, of memory, can and must be relinquished, die without agonizing over it. Far easier to do so with events that are mundane, not too personally significant. Yet some of the good memories as well as the bad need to die, fall off the stack of memory.
I interpreted the passage differently on my first reading. I thought it was metaphor, polite euphemism.
Book of Life: Have you ever tried dying to a pleasure voluntarily, not forcibly?
Me: Yes! Not EVERYONE watches Wired Pussy on the internet and does that forced cum thing! Not day in and day out.
B of L: Ordinarily when you die you don’t want to
Me: Ordinarily when I climax, I DO want to.
B of L: But have you ever tried dying voluntarily, easily, felt that sense of the abandonment of pleasure?
Me: Yes! I HAVE tried to climax voluntarily, easily. Usually, I do! But it is the very opposite of “the abandonment of pleasure.” The pleasure is opulent and thick.
B of L: Life is living, abundance and fullness.
Now I am as confused again as when I started writing this.
Your cellphone is stalking you
IBM distinguished engineer Jeff Jonas kicked off GigaOm’s Big Data conference with Scary Data: Cellphones are generating 600 billion geolocation records a day….“The data is being de-identified, but they know where you spend your time and who you spend it with.”
Jonas highlighted what data privacy advocates have been ringing alarm bells about for years: We’re tracking ourselves, in ways that would terrify us if a government tried it. Anyone who owns a smartphone carries in their pocket a tracking device that knows — and broadcasts — where you are. And we don’t really know who is getting hold of those records.
One of his clients is experimenting with using geolocation logs to track how often and for how long people visit various retail outlets. If store traffic has declined in recent months, they can detect that pattern — before the retailer reports its quarterly earnings.
True anonymization is hard to pull off with large data sets. It took academic statistical researchers two weeks to ID individual Netflix subscribers in a supposedly anonymized set of 100 million movie reviews. See slideshow deck available here: “Re-identification is somewhat trivial.”
Geolocation logs can show where you spend your time, and who you spend it with. Jonas quipped: “I can give you a list of the 10 friends you’re around the most, and if you don’t recognize one of the names, they’re following you.”
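The “10 friends” quip is simple to act on once you have the logs. Here is a hedged Python sketch, assuming each record is a hypothetical `(device_id, time_bucket, cell_id)` tuple; nothing here reflects how Jonas or his clients actually do it.

```python
from collections import Counter, defaultdict


def top_companions(records, person, top_n=10):
    """Rank the devices most often seen at the same place and time as `person`."""
    present = defaultdict(set)
    for device, time_bucket, cell in records:
        present[(time_bucket, cell)].add(device)

    counts = Counter()
    for devices in present.values():
        if person in devices:
            for other in devices:
                if other != person:
                    counts[other] += 1
    return counts.most_common(top_n)


# Made-up records: a and b are together twice, c joins them once.
records = [
    ("a", 1, "x"), ("b", 1, "x"), ("c", 1, "y"),
    ("a", 2, "z"), ("b", 2, "z"), ("c", 2, "z"),
]
companions = top_companions(records, "a")
```

No names are needed for this to work, which is the unsettling part: “de-identified” device IDs still reveal who keeps turning up next to whom.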
Jonas says … it is playing a role in his project code-named G2 …
Soldiers can contribute to the research — and in turn, protect their fellow service members — by returning damaged armor systems, so that the Army Research Lab can analyze them.
What they learn from the field is more valuable than repeated lab simulations.
Seems like that’s just the way of the world.