## #probability

**ryanandmath**:

You might have looked at the above picture and thought, “Oh, a family of normal distributions!” And… you would be wrong. The distributions above are forms of the **Cauchy distribution**, which has a **wider peak** and **fatter tails** than the normal distribution.

**The Cauchy distribution**

Although it is a bell-shaped curve, its *mean and variance are undefined*. Because of the heavy tails of its probability density function, the integrals that would define the mean and variance do not converge, and no higher-order moments exist either. The Cauchy distribution has many uses, e.g. being part of the solution to Laplace’s equation on the upper half-plane… though it stands as a classic pathological example in probability and statistics.
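One way to see what an undefined mean looks like in practice is to watch the running sample mean. The sketch below is only an illustration, not anything from the original post: it draws standard Cauchy variates by inverse-transform sampling (the function names and the seed are my own choices) and prints the running mean at a few sample sizes alongside the sample median. The running mean tends to keep jumping around as more samples arrive, while the median settles near zero.

```python
import math
import random
import statistics


def cauchy_samples(n, seed=0):
    """Draw n standard Cauchy variates by inverse-transform sampling."""
    rng = random.Random(seed)
    # If U is uniform on (0, 1), then tan(pi * (U - 0.5)) is standard Cauchy.
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


def running_means(xs):
    """Return the sample mean after each successive observation."""
    total = 0.0
    means = []
    for i, x in enumerate(xs, start=1):
        total += x
        means.append(total / i)
    return means


if __name__ == "__main__":
    xs = cauchy_samples(100_000)
    means = running_means(xs)
    for n in (100, 1_000, 10_000, 100_000):
        print(f"mean of first {n:>7} samples: {means[n - 1]: .4f}")
    print(f"median of all samples: {statistics.median(xs): .4f}")
```

Changing the seed changes where the running mean wanders, which is rather the point: there is no value for it to converge to.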

**A classic pathological example in probability and statistics**

;o)

## Plotting the cumulative paranormal distribution

*Via* **Cross-Validated StackExchange, Favorite statistics humor** (found in the revisions), by EpiGrad

From: *A visual comparison of normal and paranormal distributions,* Matthew Freeman, J Epidemiol Community Health 2006;60:6.

Lower caption says **'Paranormal Distribution'** - no idea why the graphical artifact is occurring.

## Platypus tales and tails

@gappy3000 I understand the platypus now http://t.co/Gw2NavHm

**Kurtosis humor**

**The central idea of this book concerns our blindness with respect to randomness, particularly the large deviations …**

*Reader-submitted photograph (front jacket cover) for Amazon listing of N.N. Taleb’s* **The Black Swan**, *First Edition, April 2007.*

I noticed this today. It is the entire first chapter of N.N. Taleb’s **The Black Swan** as it appeared in The New York Times in April 2007. It remains a worthwhile read, particularly since it is free!

I have mixed opinions about Taleb’s body of work as a whole. That will follow in my next post, which I wrote over on Blog Central some time ago.

## Politics of odd numbers and odder taxes

Could this be an application of Benford’s Law for detecting political bias or worse? Maybe.

The terms weren’t obvious to me at first, so let me explain what is meant by “odd pricing”. In stores in the U.S.A. and elsewhere (the study below used municipal tax data from Denmark), goods and services sold to consumers are often priced with 9 endings, including in the decimals.

Here is a typical example: $89.99. It is an obvious but effective way of exploiting cognitive bias. People perceive the price as $80.00, or as somewhere in the $80 to $89 range. It would be more straightforward to simply price it at $90. The effect is especially strong at a transition between orders of magnitude, e.g. from three to four digits.

Doesn’t $998.99 seem more affordable than $1000?

From the concept of *odd pricing*, i.e., setting rightmost price digits below a whole number, this paper advances the political counterpart of *odd taxation* using a panel of Danish municipal taxes.

First, the distribution of tax decimals is non-uniform and resembles the distribution of price-endings data.

Second, nine-ending and other higher-end decimals are found to be over-represented, which echoes *odd pricing* research. It suggests that incumbents take voters’ biases into account and apply *odd taxes* to minimize the political costs of taxation while maximizing revenue. Attention should be given to how policy digits are arranged to exploit voters’ cognitive biases.

- Asmus Leth Olsen, “**The politics of digits: evidence of odd taxation (Abstract)**”, Public Choice, Springer, June 2011

**Preview** DOI 10.1007/s11127-011-9807-x
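For anyone who wants to poke at ending digits themselves, here is a minimal sketch of the kind of tally involved. It is not the paper's method, just an assumption about how one might start: count the final digit of each value and compute a Pearson chi-square statistic against the hypothesis that all ten digits are equally likely. The function names are mine, and the sample prices are invented for illustration; they are not data from the study.

```python
from collections import Counter


def last_digit_counts(values):
    """Count the final digit of each value, given as decimal strings."""
    counts = Counter({d: 0 for d in "0123456789"})
    for v in values:
        digit = v.strip()[-1:]
        if digit.isdigit():
            counts[digit] += 1
    return counts


def chi_square_uniform(counts):
    """Pearson chi-square statistic against equal frequency for all ten digits."""
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no digits to analyse")
    expected = n / 10
    return sum((c - expected) ** 2 / expected for c in counts.values())


if __name__ == "__main__":
    # Hypothetical price tags, made up for this example only.
    prices = ["89.99", "19.99", "4.95", "24.90", "9.99",
              "100.00", "14.49", "7.99", "59.95", "2.50"]
    counts = last_digit_counts(prices)
    print(dict(counts))
    print("chi-square:", chi_square_uniform(counts))
```

A statistic that is large relative to a chi-square distribution with 9 degrees of freedom would suggest the endings are not uniform, though a sample this small is far too little to conclude anything.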

I opted to use distribution analysis as I’d done with uid and gid values previously. The nice thing about this approach is that a little refactoring would make it work for modes as well. While I was at it, I figured what the heck, I’ll add support for doing distribution analysis of timestamps as well, though I believe that will prove less useful than uid, gid and mode analysis.
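The quoted post does not show its code, so the following is only a guess at what distribution analysis of file ownership and modes could look like: walk a directory tree, tally one `os.stat` attribute per file, and flag values that are rare. The function names and the 1% threshold are assumptions of mine, not the author's tool.

```python
import os
import stat
from collections import Counter


def attribute_distribution(root, attr):
    """Tally one os.stat attribute (e.g. 'st_uid') over all files under root."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                info = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            value = getattr(info, attr)
            if attr == "st_mode":
                value = oct(stat.S_IMODE(value))  # keep permission bits only
            counts[value] += 1
    return counts


def rare_values(counts, threshold=0.01):
    """Return values whose share of the total is below threshold."""
    total = sum(counts.values())
    return [v for v, c in counts.items() if total and c / total < threshold]


if __name__ == "__main__":
    for attr in ("st_uid", "st_gid", "st_mode"):
        dist = attribute_distribution(".", attr)
        print(attr, dist.most_common(3), "rare:", rare_values(dist))
```

Passing the attribute name as a parameter is what makes the same tally work for uid, gid and mode; timestamps would need bucketing first (by day, say), since nearly every raw value is unique.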

This site is a resource for **free** online lecture notes and books about:

- stochastic processes and applied probability,
- stochastic calculus,
- measure theory,
- probability distributions,
- Brownian motion,
- financial mathematics,
- Markov chain applications,
- Monte Carlo simulation,
- martingales,
- much more!

All the links are active; they actually work.

I found it thanks to **IBM**!!! On the sidebar of this interesting little blog, with a very cute name **xyzz yxyzzy**\*. Is that amazing or what?

Entire books: **FREE**!

University course notes and syllabi. Not student notes. Lecturer/instructor documents! 100% legal too.

So awesome.

\* Glance at that post I linked to, by the way. Nice directory name temporaryWarPath.

*Via* **End-to-End Analysis of the Spam Value Chain**

An excellent study! It was short, easy to understand and full of original content.

### Unusual features

Links to supporting research (with no pay walls in the way!), news, fun stuff too! E.g. “Anatomy of a Spam Viagra Purchase”.

*End-to-end Analysis of the Spam Value Chain* is a recent study researched and sponsored by The International Computer Science Institute in Berkeley, California.

### The International Computer Science Institute

The ICSI is one of the few non-profit independent research organizations in the U.S.A. It is also a leading center for computer science research, worldwide.

Computers are supposed to be deterministic.

This is often the case for single processor machines. However, as you scale up, guaranteeing determinism becomes increasingly expensive…

In a distributed application like sorting a petabyte of data, you should expect about 1 hard drive failure, 100 irrecoverable disk read errors, and a couple of memory errors if you use error correcting RAM (more if you use regular RAM)… You can make your computation deterministic by adding checksums at each stage and recomputing the parts that fail. However, this is expensive, and becomes disproportionately more expensive as you scale up.
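The checksum-and-recompute idea can be sketched in a few lines. This is a toy illustration under my own assumptions, not how any real petabyte sort is built: `compute` stands in for one stage, `store` stands in for an unreliable disk or network hop, and every name here is hypothetical. Note that it only catches corruption that happens after the stage has produced its output; an error inside the computation itself would need redundant computation to detect.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Hex SHA-256 digest used to detect corrupted stage output."""
    return hashlib.sha256(data).hexdigest()


def run_with_verification(compute, store, chunk: bytes, max_attempts: int = 3) -> bytes:
    """Compute a stage's output, pass it through `store`, and retry on corruption."""
    for _ in range(max_attempts):
        result = compute(chunk)
        expected = checksum(result)   # recorded before the unreliable hop
        received = store(result)      # may come back damaged
        if checksum(received) == expected:
            return received
    raise RuntimeError(f"stage output still corrupt after {max_attempts} attempts")


def make_flaky_store(failures: int):
    """Return a store that flips the first byte for the first `failures` calls."""
    remaining = [failures]

    def store(data: bytes) -> bytes:
        if remaining[0] > 0 and data:
            remaining[0] -= 1
            return bytes([data[0] ^ 0xFF]) + data[1:]
        return data

    return store


def sort_bytes(chunk: bytes) -> bytes:
    """Toy stage: sort the bytes of one chunk."""
    return bytes(sorted(chunk))


if __name__ == "__main__":
    out = run_with_verification(sort_bytes, make_flaky_store(1), b"petabyte")
    print(out)
```

Every retry repeats the whole stage, which hints at why the cost grows with scale: the more chunks and hops there are, the more often some part has to be hashed and redone.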