Big Data for Readers

The Hidden Disruptor
The number of Amazon Kindle e-readers and tablets is estimated at 20 million devices (Forbes, 2014). The same source gauges Amazon’s yearly revenue from e-book sales at between $265 million and $530 million. Amazon’s disruptive impact on the overall publishing industry has drawn plenty of attention, with the current dispute with $10B publishing giant Hachette on center stage, but there has been much less discussion about e-book Big Data.

Amazon’s Kindle: 20 million and counting. (Credit: Wikipedia)

Beyond Goodreads
In 2013, the book lover social network Goodreads was acquired by Amazon. Goodreads has attracted the interest of as many as 25 million users, and it’s now been integrated into some Kindle functions. For example, in the Kindle Fire HDX, a reader can rate and comment on a book after the last page has been read.

Readers who prefer print books because they can use a highlighter or write in the margins might want to reconsider. E-books probably represent a small fraction of the library owned by collectors over a certain age, and may even cost more than the print version of a book, but they are increasingly important for book lovers. This importance has to do with the flexibility of digital media. Amazon and others have moved to leverage this flexibility.

There’s more. Kindle readers can add or remove bookmarks, highlights and notes in e-books (or in personal documents, such as a company or product sheet). What Amazon calls Public Notes are available on Kindles with version 3.1 or later. As shown in the author’s screenshot, a Kindle user can elect to share highlights and notes with others – as well as to follow other commenters for a particular book, or everything a reviewer comments upon.

Amazon e-book and document notes can be made public at

Readers can also review their own highlights and comments online, as well as on each Kindle device. In the screenshot below, a highlight created within an e-book is viewed in a web browser on

For reviewers – professional or otherwise – being able to switch between device and browser-rendered notes and highlights is indispensable. This is not because electronic highlights and notes are faster to create than paper ones; after all, it does not take much time to stick a paper bookmark in a print book or to highlight a sentence. In fact, the e-book user interface for creating these annotations is still evolving: it is almost trivial for regular users of the feature, but takes some getting used to for casual users. Once created, though, notes in digital form can be cut and pasted into documents, apps and web forms, which is truly liberating. In an ideal world, this might help reviewers strike a more helpful balance between opinion and quotation.

Amazon provides a Flashcard-like display of e-book highlights and notes.

Big Data for Readers
Some of the features already provided through e-book readers connect to, or provide content for, Big Data repositories. These include:

  • Receive recommendations from third parties: Goodreads, book critics, Amazon-followed reviewers
  • Receive and fine-tune optional recommendations based on your own purchase or reading history from Amazon, Google or others
  • Create annotations, tags and categories (Amazon calls them “cloud collections”)
  • Perform inline research, cross-referencing, fact-checking and lookup. The Kindle Fire HDX allows users to launch Wikipedia to look up highlighted text.
  • Chapters, paragraphs, sentences, expressions, outlines, indexes and bibliographies within e-books provide possible links to external resources – even to the semantic web.
  • Publishers like Elsevier and research communities like ResearchGate are looking at ways to perform more “deep linking” into content, to include even raw data, illustrations and pre-publication drafts.
  • Cloud-hosted text analytics are possible for Kindles that have integrated web browsers or specialized mobile apps.
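
As a rough illustration of how shared highlights could feed such a repository, the sketch below counts how many distinct readers highlighted each passage in a book. The record layout (reader, book, passage), the sample data and the function name `popular_passages` are all hypothetical, chosen for illustration; they do not reflect Amazon's actual data format or any Kindle API.

```python
from collections import Counter, defaultdict

# Hypothetical highlight records: (reader, book, passage).
# This layout is an assumption for illustration, not a real Kindle export format.
HIGHLIGHTS = [
    ("ana", "Book A", "Data is the new oil."),
    ("ben", "Book A", "Data is the new oil."),
    ("ana", "Book A", "Measure twice, cut once."),
    ("cy", "Book B", "Readers leave traces."),
    ("ben", "Book A", "Data is the new oil."),  # same reader again; counted once
]


def popular_passages(records, top_n=1):
    """Return, per book, the passages highlighted by the most distinct readers."""
    # Collect the set of readers for each (book, passage) pair so that a
    # reader who highlights the same passage twice is only counted once.
    readers = defaultdict(set)
    for reader, book, passage in records:
        readers[(book, passage)].add(reader)

    # Turn the reader sets into per-book counts.
    per_book = defaultdict(Counter)
    for (book, passage), who in readers.items():
        per_book[book][passage] = len(who)

    return {book: counts.most_common(top_n) for book, counts in per_book.items()}


if __name__ == "__main__":
    for book, passages in popular_passages(HIGHLIGHTS).items():
        print(book, passages)
```

Counting distinct readers rather than raw highlight events is one possible design choice; it keeps a single enthusiastic reader from dominating the ranking.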

There are even more pervasive implications for prospective authors. Amanda Luedeke (The Extroverted Writer: An Author’s Guide to Marketing and Building a Platform) argues that every prospective author must build a substantial social media following before a first-time author can expect a book deal. Leveraging reader Big Data may be essential for building specialized audiences and identifying fruitful social media promotions.

Reading to Learning: Enterprise Learning, Training Takeaways
The picture for the e-book ecosystem is not completely rosy. Authors and publishers, with a few notable exceptions, have not fully embraced the readers’ Big Data world. Books may not only look unattractive in their e-book incarnation but, worse, often fail to leverage within-text cross-references, indexes, bibliographies and other links.

Within the Kindle world, for example, it is possible to sync documents from a desktop or tablet so that they can be read on a Kindle device. However, the annotation and other Big Data capabilities discussed here may not be available for, to take one example, a PDF document.

Amazon’s Kindle may not replace enterprise SharePoint, Office 365 or Google Docs sharing capabilities across tablets and smartphones anytime soon. But given the Big Data infrastructure already available to Amazon product designers, it would not be much of a stretch. The vast number of Kindle devices at an affordable price point suggests that Big Data for readers will come to include learning and training content distribution, sharing and measurement.

What you’re reading and highlighting today may help you understand what you, co-workers, or future employers and customers want to learn.

Today that learning might include Syncsort’s partnership with Waterline.
