Data Anxiety

Tempus fugit

#identity

Is a federated Twitter even possible?

dwineman:

Toward the end of my last post, I mentioned that I’d like to see App.net move toward a federated architecture. Broadly, what that means is that… users and devices would connect [and] talk to each other in some clever way to collectively maintain the appearance of a single unified social network.

The advantages are numerous and comparable to those of the web itself: no single point of failure, no concentration of power, no risk that the entire network will be sold to Facebook.

But does this work for a service like Twitter?

Let’s find out. Since every good blog post needs a list of three things, here’s a list of three constraints we’ve come to expect of our social timelines:

  • Immediacy: if a post has been made by someone I follow, I can see it in my timeline right away (or close enough that I don’t notice the difference).
  • Chronology: posts always appear in order by time posted.
  • Monotonicity: timelines grow only from the top; older posts are never retroactively inserted.

The problem appears to be that no federated architecture can simultaneously satisfy all three of these conditions… Violating chronology is bad because it turns conversations into nonsense, but violating monotonicity means you can’t assume you’ve seen everything once you’ve read to the top of your timeline. Your client will have to maintain read/unread status for every item, and you’ll have to keep winding back in time to pick up things you missed. Which might be fine, but now we’re talking about something less like Twitter and more like email or RSS.
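The trade-off in the excerpt can be made concrete with a toy model. What follows is only a sketch under assumptions of mine, not anything from the original post: the `Post` type, the integer clock, and the helper names are all hypothetical. A post from a slow server arrives late, and the timeline either shows it on top (out of order) or slots it into place (below posts already read). The third option, holding everything back until stragglers arrive, gives up immediacy instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    author: str
    posted_at: int   # when the author posted (hypothetical integer clock)
    arrived_at: int  # when our own server first learned of the post


def timeline_by_arrival(posts):
    """Immediate and monotonic: the newest arrival goes on top, whatever its post time."""
    return sorted(posts, key=lambda p: p.arrived_at, reverse=True)


def timeline_by_post_time(posts):
    """Immediate and chronological: a late arrival is slotted in below the top."""
    return sorted(posts, key=lambda p: p.posted_at, reverse=True)


def is_chronological(timeline):
    """True if posts run newest-first by the time they were posted."""
    times = [p.posted_at for p in timeline]
    return times == sorted(times, reverse=True)


def is_monotonic(before, after):
    """True if 'after' is just 'before' with new posts added at the top."""
    return len(after) >= len(before) and after[len(after) - len(before):] == before


already_seen = [Post("alice", 1, 1), Post("bob", 3, 3)]
late = Post("carol", 2, 4)  # posted at 2, but her server only delivered it at 4

arrival_before = timeline_by_arrival(already_seen)
arrival_after = timeline_by_arrival(already_seen + [late])
posted_before = timeline_by_post_time(already_seen)
posted_after = timeline_by_post_time(already_seen + [late])

print(is_chronological(arrival_after), is_monotonic(arrival_before, arrival_after))
print(is_chronological(posted_after), is_monotonic(posted_before, posted_after))
```

With a single late post, ordering by arrival stays monotonic but breaks chronology, and ordering by post time stays chronological but breaks monotonicity.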

OK, so all of those options suck for conversations. But chronology is really only important within a conversation. So what if instead of replicating Twitter exactly, we shoot for a hierarchical, threaded model? The timeline would be a list of threads, and chronological order is preserved within each thread, but the threads themselves show up in arbitrary order. Oh, and you see a thread if you’re following the person who started it, I guess? Never mind, at least we’re getting somewhere! We’ve invented Usenet.

Oh.

The moral of the story is that the qualities that make Twitter interesting — its mix of conversation, discovery, and one-to-many communication — are direct consequences of its centralized architecture. Without the centralization you can still have something interesting, but it’s a different thing.

I’d love to be proven wrong.

This ("Venomous Porridge") is such a great website name! It is also a really excellent and fun article! I have thought similar things, which were discussed in the 69 comments that responded to the post. The preceding re-blogging is an excerpt, with selective emphasis of my own. Go read the entire thing on Porridge’s Tumblr if you are curious.

* I seem to have returned, somewhat… after doing even worse things to my CSS. I cycled through every one of Tumblr’s free themes the other night. You should try it some time! Plaid, Pink Ribbon, the one that looks like a postcard…

Manually replicating the social graph

Attention!

All social network homophily algorithms!

This is how it really works. Please don’t replicate.

I enjoy the DIY experience. It promotes genuine serendipity. FastCompany’s Facebook-praising article (17 May 2012) does NOT describe what I consider serendipity:

Back in the late ’90s, with the arrival of sites like Amazon and Google, we bemoaned the loss of serendipity. The web was now a place where you had to know what you were looking for in order to find anything. The social network is helping shift the balance back toward discovery… it’s also making discovery possible on other sites, by giving those sites tools that let their visitors filter content by Facebook friends, e.g., Yahoo, which integrated with Facebook to let you see what your friends are reading on its news sites, or design store Fab, which allows you to browse a feed of items that your friends are buying and favoriting. The result is that the web is increasingly a place for serendipity, facilitated by Facebook and your friends.

Read the Disqus comments following the article. I did. Most were in agreement with me. It seems a false distinction that FastCompany makes, claiming that

"searching gives way to discovering"

That is, searching with Google was bad, but using the internet with input from one’s social graph, as guided along by Facebook, is now “discovering”, which is good, an improvement. I think not. To me, it is more like an invasion of privacy.

So. I embarked on my little adventure, and realized to what extent my behavior was reflected by the FastCompany article. Yes, I do use Twitter. No, I don’t use Facebook, nor do I have any suggested-search options enabled in the Chrome browser when I search on Google (there is plenty of tracking and nudging in place already, I realize…). I want to do things my way, without algorithm-driven input from “friends” facilitated by Facebook tracking! I use Twitter or other tidbits of information as I choose. Or disregard them entirely. I am sure there are behavioral effects a-plenty already, from even the non-Facebook services I use.

Anyway, this is how it went.

I read a pleasant post about English grammar, “Indefinite Articles: A versus An”, on a blog with the subtitle “Cloud Security Infrastructure Architecture”, making it an especially welcome treat! I left a comment, which describes what I am referring to above as real, or natural, or even organic, serendipity. A (very slightly modified) excerpt of my comment follows.

Organic serendipity

“Your post is accurate. It was a pleasure to read. Good job! Grammar and usage deities and demigods (and perhaps demagogues) will commend you for your efforts to maintain and uphold Free and Open Standards of communication, accessible to all!

In case you are curious, or even if you aren’t, I found my way here via the blog of a Lanier, Zach [@quine] who follows you. Is he related to Jaron Lanier? Zach lives in Brooklyn, or so he says, and I know that Jaron’s father grew up on or near Bleecker Street in Manhattan. I grew up in the same town as Jaron, and his father taught me at Hebrew school (well, Sunday school), not on Bleecker Street though! Somewhere very far west and south of there.

I found @quine because:

  • I was looking at a paper posted on Twitter [PDF!!!] via @ioerror, whom I had long known only as someone who took happy photos on Flickr, and
  • the paper was by EDIT a group in France, HOWEVER, @ioerror’s subsequent Twitter update was about an open source security company called @subgraph, which seems interesting (see the slides about their security product, Using and Extending Vega), and
  • I decided to follow @attractr, the leader or owner or founder of Subgraph the company, and
  • Twitter then suggested that I follow @msuiche if I like @attractr's content, and
  • when I looked at @msuiche (who works for Microsoft Security, ironically, maybe), Twitter suggested young Zach, the proprietor of HipsterGenocide dot com, who follows your blog, thus bringing me here, to this delightfully grammatical post!”

“Over in the UK the majority of Pinterest users are male. Is the UK press going on and on about how male Pinterest is?”

Pinterest and Feminism » Cyborgology via s-m-i

AlternativeAssets tumblr said:

It is called Pint-erest in the UK, right?

Good find!

I read this comment to the post on Cyborgology (excerpt):

I’m a man, I enjoy lots of what I’ve seen on Pinterest… But it does seem undeniably, non-neutrally “womanish” to me, and it’s not about the recipes or pink images or lack of depth.

Let’s get real. It’s simply this: You get a nice post on music, architecture, how to prepare a great meal, and then one about how hunky…

to which I responded in style (or so I hope)!

If you view it, you’ll find a link to something pink, spiral-shaped and cool-looking. I promise.

chxor:

OAuth, brilliantly explained, on a napkin. via Matthew Story

I’ve left the field of combat in the OpenID (and OAuth) wars, but I can’t resist a particularly choice image like this.
Note: It is large, very large. And unexpectedly sweet, particularly in light of the dreary subject matter.

Beyond the ngram

alea:

Your writing style is a little like your fingerprint. Your word choice, spelling, punctuation, sentence structure and syntax are all dead give-aways. Stylometry, the study of linguistic style, has been used to out the authors behind some of history’s most disputed documents, from Shakespearean sonnets to the Federalist Papers. In the latter, James Madison’s penchant for the word “whilst” was a big distinguisher; Alexander Hamilton preferred plain old “while.”

Software Helps Identify Anonymous Writers or Helps Them Stay That Way
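The “whilst” versus “while” tell is easy to sketch. This is a toy illustration of my own, not the method used in the studies the article links to: it measures how often each marker word appears per 1,000 words and attributes a disputed text to whichever known writer has the nearest rates. The sample strings are invented for the example (they are not Federalist text), and `marker_rates` and `closer_author` are hypothetical names.

```python
import re
from collections import Counter


def marker_rates(text, markers=("while", "whilst")):
    """Occurrences of each marker word per 1,000 words of text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty text
    return {m: 1000 * counts[m] / total for m in markers}


def closer_author(disputed, samples):
    """Return the author whose marker rates are nearest the disputed text's rates."""
    target = marker_rates(disputed)

    def distance(author):
        rates = marker_rates(samples[author])
        return sum(abs(rates[m] - target[m]) for m in target)

    return min(samples, key=distance)


# Invented toy samples, labeled by hypothetical writers.
samples = {
    "writer_a": "whilst the people deliberate, whilst the states agree",
    "writer_b": "while the people deliberate, while the states agree",
}
disputed = "whilst the union holds, the states prosper"

print(closer_author(disputed, samples))
```

Real stylometry relies on many features at once (function words, punctuation, sentence length) and on far more text than this. Two marker words are only enough to show the idea.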

This isn’t a tired summary of what text analytics types have seen circulating through the Twitterverse and blogosphere for the past x months.

Rather, it is an original post from the NY Times Bits blog, with lots of links to pre-release studies from unexpected institutions (good old Drexel University, for example, rather than computational linguistics star Univ. of Pennsylvania), and more.

Thank you, Alea!

Via End-to-End Analysis of the Spam Value Chain

An excellent study! It was short, easy to understand and full of original content.

Unusual features

Links to supporting research (with no pay walls in the way!), news, fun stuff too! E.g. “Anatomy of a Spam Viagra Purchase”. 

End-to-end Analysis of the Spam Value Chain is a recent study researched and sponsored by The International Computer Science Institute in Berkeley, California.

The International Computer Science Institute

The ICSI is one of the few independent, non-profit research organizations in the U.S.A. It is also a leading center for computer science research worldwide.

U.S. Health and Human Services issues new proposal to HIPAA disclosures rule »

The second part provides individuals with a new right to receive a written “access report” that describes uses and disclosures of their PHI [personal health information] made through an “electronic designated record set.” This new access report would include information on a covered entity’s workforce members who have accessed information and would apply to information in an electronic designated record set, not only information in an electronic health record, as required by HITECH.

Your cellphone is stalking you

cnnmoneytech:

IBM distinguished engineer Jeff Jonas kicked off GigaOm’s Big Data conference with Scary Data: Cellphones are generating 600 billion geolocation records a day… “The data is being de-identified, but they know where you spend your time and who you spend it with.”

Jonas highlighted what data privacy advocates have been ringing alarm bells about for years: We’re tracking ourselves, in ways that would terrify us if a government tried it. Anyone who owns a smartphone carries in their pocket a tracking device that knows — and broadcasts — where you are. And we don’t really know who is getting hold of those records.

One of his clients is experimenting with using geolocation logs to track how often and for how long people visit various retail outlets. If store traffic has declined in recent months, they can detect that pattern — before the retailer reports its quarterly earnings.

True anonymization is hard to pull off with large data sets. It took academic statistical researchers two weeks to ID individual Netflix subscribers in a supposedly anonymized set of 100 million movie reviews. See slideshow deck available here: “Re-identification is somewhat trivial.”

Geolocation logs can show where you spend your time, and who you spend it with. Jonas quipped: “I can give you a list of the 10 friends you’re around the most, and if you don’t recognize one of the names, they’re following you.”
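Jonas’s quip boils down to counting co-occurrences. Here is a minimal sketch, assuming each log entry reduces to a (device, place, hour) triple. The record format, the function name, and the sample data are all made up for illustration and say nothing about how G2 or any real system works. Note that the device IDs can be meaningless tokens and the pattern still falls out.

```python
from collections import Counter, defaultdict


def frequent_companions(records, person, top_n=10):
    """Rank the devices most often seen at the same place and hour as `person`.

    records: iterable of (device_id, place_cell, hour) tuples (assumed format).
    """
    present = defaultdict(set)
    for device, cell, hour in records:
        present[(cell, hour)].add(device)

    together = Counter()
    for devices in present.values():
        if person in devices:
            for other in devices - {person}:
                together[other] += 1
    return together.most_common(top_n)


# Made-up sample log: IDs are arbitrary tokens, not real people.
records = [
    ("you", "cafe", 9), ("ann", "cafe", 9), ("bob", "cafe", 9),
    ("you", "office", 10), ("ann", "office", 10), ("stranger", "office", 10),
    ("you", "gym", 18), ("ann", "gym", 18), ("stranger", "gym", 18),
]

print(frequent_companions(records, "you"))
```

In this toy log, "stranger" shows up beside "you" twice without ever being introduced, which is the unrecognized name on the list that Jonas is joking about.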

Jonas says … it is playing a role in his project code-named G2 …