Big Data in Consumer Internet – Why I Do Not Believe In the Creep Factor

On Tuesday evening, I attended a presentation on “Big Data” by Scott Friend, a managing director at Bain Ventures. Scott founded and sold a retail analytics company called ProfitLogic. In addition to talking about his entrepreneurial experience, he focused on Big Data in consumer Internet. In this context, Big Data refers to all information consumers leave behind on the Internet when they post Tweets, upload files on Dropbox, or check in on Foursquare. [1] Naturally, the discussion led to a comment from the audience on the “creep factor” involved in Big Data.

In my view, this is the most important question in data analytics today. Consumers are sharing increasingly more information online and do not have the means to identify how this information is used by advertisers and companies trying to capture consumers’ money and attention. As is usually the case in situations with information asymmetries, trust is a necessary factor driving consumers’ online behavior. To continue to build consumers’ trust and encourage them to opt in to share personal information, companies should focus on the three strategies below.

Educate the Consumer

I have been an advocate of Big Data in predictive analytics because of the way it has transformed the Internet into a discovery tool for me. I find it valuable to see personal and relevant online advertisements rather than random, untargeted ones. I continue to discover new artists through Pandora. Knowing what my friends pin to their Pinterest boards enables me to buy better birthday gifts. I eagerly embraced the “Quantified Self” movement and even track and record my sleep pattern online through an app called Sleep101. [2] So far, I have opted in to connect to websites through Facebook every time the option exists, and I liberally use the Like button to contribute to the 2.7 billion likes Facebook collects daily. [3] Predictive analytics has enabled me to discover things I did not know I wanted and to make better purchasing decisions. This is the main reason why I continue to opt in. The value Big Data contributes to my daily life prevails over the creep factor.

Protect Personal Information

I know that along with my Likes, I am making my willingness to pay heard by companies and advertisers. As a result, I may have become a victim of price discrimination. What I perceive to be a discovery tool may have limited my freedom to “browse” the Internet. My social security number could have been accessed and my credit card number could have been stolen.

Until I have concrete evidence that any of the above scenarios actually occurred, I will continue to hope that the companies I interact with online on a regular basis, such as my bank, Dropbox, and Amazon, are responsible with the personal information they store on my behalf. I believe that these companies have my interest in mind and invest in the necessary infrastructure to protect my online identity. Only by prioritizing online safety can Internet companies increase the trust factor as well as the extent and depth of information they can collect from customers.

Be Transparent

Scott Friend gave listeners a business idea at the end of his talk. He suggested that we start working on a tool to help consumers uncover all the ways information they opt in to provide has been used for or against them. Until such a solution can be developed, consumers have to rely on Internet companies to be transparent. By asking consumers to opt in, making the “Do Not Track” feature more prominent, and enabling consumers to skip ads, online companies can build trust. [4]

The consequences of not meeting the above standards are too risky for companies to bear. Encouraging the creep factor and inviting restrictive government regulation to protect consumers are the first two that come to mind.







Sharing paranoia

I consider myself an early adopter. I’m a gadget geek, I frequent Lifehacker, I love new shiny things, I own a freaking Virtual Boy. But when it comes to sharing things about myself online, I’ve been quite slow. It took me four years of using Facebook before sharing my first (non-profile) photo. The main reason for my delay has been concern about my private information. I think it has become too easy to share intimate details about your life, even when you don’t intend to, so I’ll talk a little about it here.

The days of anonymous internet surfing are numbered (if not already gone). On your mobile device, likely the dominant browsing platform of the future, a variety of companies already have a disturbing amount of information about you.

There’s an app for that!

The apps you have on your phone know exactly (if not now, they will soon) what food you like, your exercise habits, what you spend your money on, where you like to travel, and exactly what color/size “BUY NOW” button will get you to buy that level 8 magic unicorn dust to supercharge your belligerent zombie bird town-ville-farm-with-friends (disclosure: I worked in mobile gaming this summer). If you have an app for that, someone’s analyzing you doing exactly that. Granted, people are becoming more aware of this, and in response, the OS providers are changing the rules to help protect user privacy.

But really? The OS providers??

Apple and Google have a ridiculous amount of information, beyond just your location. Just think about what data Google has on you from just your phone: real-time GPS location + Gmail messages/chats + Google contact list + Google Voice (or just regular phone) call log + search history (oft-overlooked, but incredibly important in my book) + whatever else I’m sure I’m missing. Sorry Apple fanboys, but you’re in a similar (albeit oohhh shinier!) boat, especially with the recent iCloud push. Sidenote: I’d argue Apple fanboys are even more at risk given the reduced fragmentation of services in the iOS/OS X ecosystems, which is ironic. But I digress.

Don’t forget about the evil carriers

Finally, we must not forget about what are arguably the most powerful companies: the infrastructure companies. Not only do they have the ability to see exactly where you are (mandated by law), but they can literally see everything you’re doing online. Not much to say here. They control the pipes, and they know what’s going through them whether you like it or not.

Ok, before you start flaming me, I acknowledge that all sorts of regulations/laws/noble mission statements are in place to prevent these companies from doing evil things, but my point is that the tracking capabilities, whether they’re used or not, are already there and LOTS of people have these capabilities (and c’mon, really? since when do rules actually stop evil intentions?). For these reasons alone, you should be very wary of what you share.

More interesting than my paranoid security concerns, however, is the issue of relevancy. I also believe that what you do and who you are IRL (in real life) will have less and less relevance to your online life as more and more of our lives move online and as the tracking capabilities outlined above continue to improve (and consolidate with a few gatekeepers). But that is a topic for a follow-up post, and I’m out of time for now. Until next time.


The Evolution of Online Data – Adding Value or Violating Privacy?

Everything you do online – pay bills, buy clothes, book travel, make reservations at a restaurant or an appointment with a doctor – contributes to building your online identity and adding data points to the massive amount of information on the internet. Companies have been evolving in the way that they capitalize on this data.

Roger Ehrenberg of IA Ventures published an interesting blog post, in which he comments on the “… difference between the mere presence of data (say, an inert corpus of data accumulated from customer transactions) and its activation (putting that same data in a form that can be analyzed in real-time to provide intelligence about trends, pricing, feature attributes, etc. and classified and stored in a way that subjects itself well for future analysis).” Looking at the most successful companies of the past decade, it is clear that they have chosen this “activation” route.

Adding Value

Many online retailers have attempted to use customer data to improve the shopping experience.  Phrases such as “You might like…” create the impression of personalization, a friendly suggestion from someone who knows you well.  It’s an attempt to provide a curated user experience where shoppers trust the site to perform a search of all products and suggest the most relevant one based on various criteria (which can be opaque or transparent depending on the site).

Amazon was one of the first to direct users to specific products, and did so based on both your behavior and the behavior of previous purchasers. It offered a way to bypass the exhaustive search process and rely on the efforts of your predecessors. It’s rumored that Gilt Groupe presents a variety of homepages, highlighting different sales for different users. People at the fashion start-up I worked with this summer went to great lengths to prove or disprove this and found enough variation to believe that there is some customization. However, neither the extent to which this is done nor the criteria used to determine which homepage a user views is clear.

Customer information has also been used for more direct monetization through “social advertising,” which capitalizes on demographic data and each internet user’s network to target a specific audience, as opposed to the more traditional AdWords, which relies on keyword searches by an anonymous group of individuals who may or may not be in your target demographics. Yipit, an online aggregator of daily deal sites, analyzes purchasing behavior and trends from across hundreds of deal sites. It packages the data into a monthly report and sells it back to deal site operators as well as to investors. Finding value to extract from aggregated information is a different and interesting approach to the monetization of data.

Violating Privacy?

For every action there is a reaction.  In the actions relating to big data, Personal is the reaction.  Personal is a startup that gives consumers control over their data, allowing them to regain their privacy.  Additionally, this service allows consumers to benefit financially – organizations can only access your personal information with your consent, and they have to pay for it.   The company has raised $7.6M in Series A and went into beta in March 2011.

As online behavior is disseminated to more and more companies, how will individuals respond?  Will people start to take action over the availability of their personal information online?  It will be interesting to watch how the Personal user base grows or remains stagnant coming out of beta.

Personally, I believe that the availability of data has brought value to my online experiences. I’m happy to provide my shopping history if it means a better shopping experience next time. While there have been minor annoyances, such as unwanted advertising and a crowded inbox, I have not experienced anything that’s inspired me to “protect” my data through a formal service like Personal. However, it would only take one impactful abuse of my online identity for me to look into using Personal.


In May of this year, Representatives Edward Markey (D-MA) and Joe Barton (R-TX) introduced bipartisan legislation that would amend the long-standing Children’s Online Privacy Protection Act of 1998 (COPPA). This was followed shortly by Sen. John D. Rockefeller’s (D-WV) presentation of the Do Not Track Online Act of 2011. These “Do Not Track” pieces of legislation mark a ramp-up in Congress’s efforts to strengthen privacy protection on the web, particularly as it affects young children. The impact on a myriad of social networks (Facebook) and search platforms (Google) could be significant, particularly as these companies seek to monetize the information they gather from users.

Consumer Reports announced in May that close to 7.5 million children aged 12 or younger are on Facebook. This is in direct conflict with COPPA, which effectively prevents social networks from signing up kids, and with Facebook’s own policy that members be at least 13 years old to open an account. The potential benefits to Facebook of reaching young members are huge: they may be more inclined to stick with the platform over time given early adoption; they may be more open to sharing information, as they have been conditioned to do so at an early age; and ads can be further personalized as they share and “like” more sites and products via the site. Much of this translates into more revenue for Facebook and further dominance as the premier social network.

Many parents who let their kids set up Facebook accounts or other online profiles actively monitor their use. I have several teen and pre-teen cousins on Facebook, and their parents are often their “friends” on the site and frequently check in and limit their usage. While this is a good check, it still cannot fully protect against virus infections, identity theft and online bullying. Facebook has also taken important steps to protect young users of its site. It has actively engaged in measures to report bullying and is a partner in the AMBER Alert system. Still, these measures do not address the root of the privacy issues that concern many parents and now Congress.

The Do Not Track Kids bill extends COPPA in several interesting ways. Most significantly, advertisers would be prohibited from targeting kids online. Data collection would also be limited and an “eraser button” would be created that would allow young users to eliminate damaging information online. Companies would have to explain up front the type of information they are collecting from young users and parental consent would have to be gathered in advance of this data collection.

As the 2012 election year approaches, it will be worth watching the progress of these bills through Congress, as well as the efforts of Facebook and other companies to influence the path of the legislation. Millions of advertising dollars are at stake, as well as the safety and privacy of countless young children.


Managing an online identity has become an increasingly important part of life. With so many options to connect to the world via cyberspace, we are forced to pick and choose which social platforms to engage with and what exactly to share. Being part of a particular social network establishes a permanent record of your interactions with that network. No hiding your past. That means you had better be careful about what you post.

By now, many of us have heard the urban legend where an employee tweets about his boss and is subsequently fired or where a candidate is not offered a job because of a controversial Facebook photo. While these stories may or may not be true, employers are definitely Googling candidates and making inferences about their personalities from the results that appear. Thus, we are forced to segment our online identities and tailor our profiles accordingly.

Being aware of the importance of your online identity highlights the complexity of managing it. Google yourself and see what comes up. You may be surprised to find that a short article you wrote back in college turns up in the top five search results. You may also be surprised that a photo on a friend’s Flickr account is publicly available. Most of us would like to maintain a sort of professional persona for employers and colleagues while simultaneously sharing personal pictures from travels with friends and family. How can this “split” identity be maintained?

Privacy and security issues are a hot topic these days, giving rise to added privacy features on social networks that allow you to control exactly who you share with.  This added flexibility is beneficial but requires a certain amount of thought each time you make a post or upload a photo as you decide what segment of your friend list you entrust with the material.  It can quickly become overwhelming to remember what you posted, where, and to whom. 

However, having an online identity is not all evil. The internet is often a source of first impressions and thus can be used to shape how we want others to perceive us. Considering the ease with which web pages can be developed, you can create your own website that broadcasts your interests or highlight only certain work experiences on your LinkedIn profile. Take, for example, a graduate student with a healthcare background who is now interested in pursuing a full-time position in technology. By showcasing her thoughts about the latest tech trends in a few blog posts and a personal AboutMe page while “following” tech-related bloggers on Twitter, she is proving her industry commitment to potential employers.

So, just as you pay your bills on time to build a good permanent credit history, use websites and social networks carefully to build an online identity that you are comfortable sharing with the world, permanently.

