Interview with Matthew Bishop from GetElastic.com about Big Data and Ecommerce

Photo: Matthew Bishop, Sr. Product Architect at Elastic Path

Can you explain the category “experience management software” for those not working in the Elastic Path space?

The concept of experience management is to provide a unified, personalized experience across the various touchpoints one would use to interact with the business, or application, or other projection of capability. A good example would be a company like Netflix or Amazon, offering various touchpoints (websites, mobile software, embedded software in TVs and such) and providing a consistent, personalized experience across all of them. If I start watching a show on my TV, I expect to pick up where I left off on my tablet. I will also see recommendations of similar shows and products to keep me engaged.

The more a customer engages with your capabilities and offerings, the more value they derive from them, and thus the more willing they are to transact with you. Experience management is about building and serving that relationship in ways that resonate with the customer. When done well, the customer embeds the offering into their lifestyle and pushes out competitive services and offers.

ETL is an active development area at Syncsort (Meet Syncsort DMX). What ETL tools are you seeing used to move clickstream and other high volume data into big data repositories?

The biggest question around responding to user events is: what kind of reaction are you hoping to gain? The more you want to shape the current, live experience, the closer the connection to the user's interactions needs to be. One big selling point of responsive projections is that the user's experience can be improved as they use the site. If a site learns that a customer performs a search and then usually sorts by brand, then skips ahead to, say, Sony, a responsive service might presort the results by brand so that Sony is first in the list.
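
To make this concrete, here is a minimal sketch of that kind of presorting logic. Everything in it is an assumption for illustration: the Product and UserProfile types and the learned preferredBrand field are invented, not Elastic Path APIs.

```typescript
// Minimal sketch: re-rank search results using an observed preference.
// All names and types here are invented for illustration.

interface Product {
  sku: string;
  name: string;
  brand: string;
}

interface UserProfile {
  // A preference learned from clickstream data, e.g. "this user usually
  // sorts by brand and then jumps to Sony".
  preferredBrand?: string;
}

// Stable partition: preferred-brand items first, original order preserved.
function presortResults(results: Product[], profile: UserProfile): Product[] {
  if (!profile.preferredBrand) return results; // no signal, leave order alone
  const preferred = results.filter(p => p.brand === profile.preferredBrand);
  const rest = results.filter(p => p.brand !== profile.preferredBrand);
  return [...preferred, ...rest];
}

// Usage: a visitor whose history shows a Sony preference sees Sony first.
const searchResults: Product[] = [
  { sku: "TV-1", name: "55-inch TV", brand: "Samsung" },
  { sku: "TV-2", name: "55-inch TV", brand: "Sony" },
];
console.log(presortResults(searchResults, { preferredBrand: "Sony" }));
```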

The whole topic of responsive projections is fraught with risks – performance may suffer, the user can experience unexpected behavior, or the particular theory of engagement may be wrong. Why people spend more time on a site or in an app is a very hard problem to solve, and it is not solvable across the entire customer base. It is not even solvable for the same customer over any period of time!

Responsive projections try to make intelligent assumptions about what the user is really trying to achieve, but it’s hard to do that correctly without a way to test the assumptions. Responsive projections actually need A/B testing capabilities to try out theories on subsets of the visitor base. You need a way to try out an idea without changing the experience for everyone.

An example of this is commerce checkout experiences. Consider this question: do people want a single-page checkout or a wizard-style checkout? Is there a set of conditions that would lead to one or the other? Is it based on what's in the cart? The time of year? The number of times the customer has visited and looked at the same item? No one is going to come up with the right theory here without testing. You need a way to try multiple theories to see what actually works.
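
One common way to run such a test is to bucket visitors deterministically, so each visitor keeps seeing the same variant across visits. The sketch below assumes a hypothetical visitor ID and a simple non-cryptographic hash; it illustrates the idea, not any particular product's testing feature.

```typescript
// Sketch: deterministically assign visitors to checkout variants so a
// theory can be tested on a subset without changing everyone's experience.

type CheckoutVariant = "single-page" | "wizard";

// Tiny non-cryptographic hash; fine for bucketing, not for security.
function hashVisitorId(visitorId: string): number {
  let h = 0;
  for (const ch of visitorId) {
    h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  }
  return h;
}

// Put 10% of visitors in the "wizard" test cell; everyone else is control.
function assignCheckoutVariant(visitorId: string): CheckoutVariant {
  return hashVisitorId(visitorId) % 100 < 10 ? "wizard" : "single-page";
}

console.log(assignCheckoutVariant("visitor-42")); // stable across visits
```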

The big data community is crackling with conversation about the move toward real-time eCommerce, such as immediate recommendation updates, dynamic web page tailoring, and customer-specific sales opportunity dashboards. From where you sit, how much of this is real in eCommerce, and how much is premature hype?

It depends on the product. Airlines have been using this quite effectively to get people to buy the seat. Look at Kayak and their “price prediction” chart, or EasyJet’s “6 people are currently looking at this trip” messaging.

Time in a commerce experience is mostly spent window-shopping, or imagining ownership of a particular good or service. Almost no one goes directly to a site and buys anything other than staples or replacement items. Usually they surf around, read reviews, talk to their partners, do the math with their bank account, etc. The real opportunity for commerce is to provide the best “imagination” experience and combine it with the best offer at the right time. Someone can come back to the same TV, or type of TV, for months. If you can identify this behavior, you can try various offers to get them over the line to actually buying the TV from you rather than Amazon.

In this video featuring you on your web site, you speak of “a unified API.” Is this the same notion that Cloudera proposed for its big data hub, or a reworking of SOA and middleware concepts from earlier?

The unified API is an absolute necessity for solving the experience management goal. All those touchpoints, being touched all the time, mean you have to project the same experience across the board. You don't want inconsistency in this, especially in commerce. If I get one price on my computer at work and another on my phone, that's not good. If a coupon I scanned on my phone goes missing when I check out at home, that customer is gone forever.

Without a unified API, the organization has to manage the experience across touchpoints by hand. Touchpoints are usually managed by different teams with different, even conflicting, goals. Even a single website can be managed by multiple teams; some large eCommerce sites have a “header” team that is separate from the “search” team. Now try to maintain that same separation of concerns across mobile, TVs, and whatever comes next in the Internet of Things.

The unified API provides consistent governance and a unified offering no matter what team is consuming it. The header team doesn’t have special access to a different API than the mobile team. They all cook from the same kitchen.

Your hypermedia API engine works by exposing “a set of highly scalable REST interfaces that conform rigorously to the HATEOAS constraint and Level 3 of the Richardson Maturity Model (RMM).” Martin Fowler elsewhere gave RMM an endorsement of sorts, writing that “while I don’t think we have enough examples yet to be really sure that the restful approach is the right way to integrate systems, I do think it’s a very attractive approach and the one that I would recommend in most situations.” Can you explain to readers what RMM is and why it’s important for integrating applications?

The theory behind HATEOAS is that clients should know as little as possible in order to provide an application of a projection. This is what “application” means. Many applications can consume the same projection and offer different experiences. These applications, however, must not bend, break or otherwise misconstrue what the projection offers. To achieve this with something like SOA (Service Oriented Architectures), the application developer must know a lot. They must know:

  a. The semantics of the services they are consuming
  b. The endpoints and how to activate them
  c. When they can be activated
  d. What to do if activation fails

This involves a lot of reading documentation (often incorrect), using complicated, brittle SDKs, and most of all, running the app and experiencing trial and error. It is slow, frustrating and usually a failure. I call this exception-driven programming because you are just trying not to fail.

HATEOAS addresses the problem by eliminating most of (b) and (d), and all of (c). It does this by providing resources, not services, that have links to other resources. These links are pathways for discovering the other endpoints. The existence of a link means it can be followed; no link, no endpoint at this time.

For example, imagine a product API and a cart API. Traditional SOA would have the cart API provide an “add to cart” endpoint with the product SKU and quantity. However, the add to cart may fail if the item is out of stock, or is not available yet, or not purchasable with other items in the cart, or it’s Sunday and we don’t sell wine to people in your state on Sunday, or any other business rule. The cart API, hopefully, knows these rules and sends back a failure message when the application tries to add the product to the cart. The client then has to interpret that message for the user.

The HATEOAS way of doing the same thing is to have a product resource and a cart resource. When the application reads the product, the cart resource can add an “add to cart” link if that action is possible. If not, the link is not added. All the client has to know is the semantics of the situation (an “add to cart” link means I can add this product to the cart). They do not have to know how to construct the call to the cart; they just activate the link. They rarely, if ever, encounter error messages.
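
As a rough illustration of that flow, the sketch below shows a product representation with and without the conditional link, and a client that keys off the link's presence alone. The wire format, link relation name, and URIs are invented for illustration; real Level 3 APIs each have their own hypermedia representation.

```typescript
// Sketch of the product/cart example above, with an invented wire format.

interface Link {
  rel: string;   // the semantic the client keys off, e.g. "addtocart"
  href: string;  // opaque URI; the client never constructs this itself
}

interface ProductResource {
  sku: string;
  name: string;
  links: Link[];
}

// Server-side: the link is only emitted when the action is actually possible.
const purchasable: ProductResource = {
  sku: "TV-123",
  name: "55-inch TV",
  links: [{ rel: "addtocart", href: "/carts/default/lineitems" }],
};

const outOfStock: ProductResource = {
  sku: "TV-456",
  name: "65-inch TV",
  links: [], // no link, no endpoint at this time
};

// Client-side: no business rules, just "is the link there?"
function renderAddToCartButton(product: ProductResource): void {
  const addToCart = product.links.find(l => l.rel === "addtocart");
  if (addToCart) {
    console.log(`Show button; POST to ${addToCart.href} when clicked`);
  } // otherwise simply don't render the button; no error handling needed
}

renderAddToCartButton(purchasable); // shows the button
renderAddToCartButton(outOfStock);  // shows nothing
```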

HATEOAS (now called Hypermedia APIs) puts all the business rules for the projection in one place: the server. The client knows nothing about the business rules; it just knows how to render what it sees. It knows the relationships between the resources, and that is basically it for the semantic model. This is much simpler for the app developers, and the business is in charge of the projection across all the touchpoints.

Big data can add to the risk of de-anonymization of personally identifiable information, or unintentionally enable employee access to sensitive customer or company information. When helping retail customers to design new systems that integrate additional touchpoints, do you find that additional security and privacy measures must be added?

This problem is very real and one of the key factors that led Elastic Path to build our own Level 3 REST engine, now called Cortex. In general, API engines assume all access is anonymous and public. They offer some tooling for restricting access based on identity and the like, but these are add-ons, and pretty bad ones at that. However, a projection that is shared by all touchpoints has to offer identity and security in the most robust manner possible.

We wanted to make identity and roles an integral part of the offering. Every call through Cortex is identified with a role. Even “anonymous” visitors are identified and provided a stateful experience. We use the RBAC-A permissions model to control access to every endpoint. It is not an add-on, and the resource implementations have no choice but to specify their allowable permissions.
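
The sketch below illustrates the general pattern of making permissions a mandatory part of every resource definition. The roles, paths, and dispatch function are invented for illustration; this is not Cortex code.

```typescript
// Sketch: a resource cannot be exposed without declaring its allowable
// roles. Roles and paths here are hypothetical.

type Role = "anonymous" | "registered" | "csr"; // csr = customer service rep

interface ResourceDefinition {
  path: string;
  allowedRoles: Role[]; // required field: no resource without permissions
  handler: (caller: Role) => string;
}

function dispatch(resource: ResourceDefinition, caller: Role): string {
  if (!resource.allowedRoles.includes(caller)) {
    throw new Error(`403: role "${caller}" may not access ${resource.path}`);
  }
  return resource.handler(caller);
}

const orderHistory: ResourceDefinition = {
  path: "/orders/history",
  allowedRoles: ["registered", "csr"], // anonymous visitors never see this
  handler: () => "order history payload",
};

dispatch(orderHistory, "registered"); // ok
// dispatch(orderHistory, "anonymous"); // would throw 403
```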

Furthermore, Cortex allows communication between resources, but only through a channel that flows through the same permissions model. This makes it very hard to communicate with other resources through side channels. We even have security walls between resources to make it hard to see their code.

All this was put in place to address the privacy and security concerns that an API must address in order to gain the trust of its developers and users. We put it in place in the same spirit as HATEOAS, meaning the application itself has very little semantic knowledge of what’s going on; it has to know how to gain an access token (standard protocols apply) and how to send a login event, but that’s it. The identity of the user is not known to the application. The roles required to exercise the projection are not known by the application. And thanks to HATEOAS links, we can simply omit any links the current user’s role is not allowed to access.

Your open source stack also includes Drools. Can you discuss where Drools is used now in your offers? Could it enable customers to perform domain-specific reasoning – perhaps connected to the semantic web, or to customer-managed ontologies?

We provide rule-based pricing and promotions in our OOTB offering, but it is available to our customers for any other rule-based use cases they might want to employ. The hard part of rules is expressing them in a way that the average business user can understand and not “mess up”. This leads some to think a DSL would be helpful, but we chose to provide a UI for the DSL instead of a textual version. The reason is that the user of these tools is usually a business person, and DSLs have not worked well with this audience. What has worked well for business people is spreadsheets, and we have experience using spreadsheets as decision tables.
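
As an illustration of the decision-table idea, here is a sketch in which each rule reads like a spreadsheet row a business user could own. The columns and rules are invented, and the actual Elastic Path tooling is built on Drools rather than hand-rolled code like this.

```typescript
// Sketch: a spreadsheet-style decision table for promotions.
// Columns and rules are hypothetical.

interface PromotionRow {
  minCartTotal: number;    // condition: cart total at least this much
  customerSegment: string; // condition: which shoppers the row applies to
  discountPercent: number; // action: discount to apply
}

const promotionTable: PromotionRow[] = [
  { minCartTotal: 200, customerSegment: "loyalty", discountPercent: 15 },
  { minCartTotal: 100, customerSegment: "any",     discountPercent: 10 },
  { minCartTotal: 0,   customerSegment: "any",     discountPercent: 0  },
];

// First matching row wins, like top-to-bottom decision-table evaluation.
function discountFor(cartTotal: number, segment: string): number {
  const row = promotionTable.find(
    r => cartTotal >= r.minCartTotal &&
         (r.customerSegment === "any" || r.customerSegment === segment),
  );
  return row ? row.discountPercent : 0;
}

console.log(discountFor(250, "loyalty")); // 15
```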

As far as the Semantic Web, we don’t see a lot of call for this. I’d like to see more commerce companies use RDF for better SEO and search result presentations. Schema.org and GoodRelations (http://www.heppnetz.de/projects/goodrelations/) provide ontologies that create nice presentations for products in search engines.
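
For readers who have not seen it, a schema.org Product description embedded in a page as JSON-LD looks roughly like the following; the product values here are invented.

```typescript
// Sketch: a schema.org Product with an Offer, serialized as JSON-LD so
// search engines can build rich results. Values are invented.

const productJsonLd = {
  "@context": "https://schema.org",
  "@type": "Product",
  name: "55-inch TV",
  description: "Ultra-slim 4K television",
  offers: {
    "@type": "Offer",
    price: "699.00",
    priceCurrency: "USD",
    availability: "https://schema.org/InStock",
  },
};

// Embedded in the product page as a JSON-LD script tag:
const markup =
  `<script type="application/ld+json">${JSON.stringify(productJsonLd)}</script>`;
```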

Other than that, commerce is pretty reluctant to become too searchable. Remember, they want to provide an experience, not data. It’s hard to romance a customer with SPARQL.

While the notion of “transaction” is somewhat deprecated in big data, since it connotes structured OLTP models to some, it does seem that organizations like Elastic Path are deepening the concept of a transaction. A transaction could be a composite of marketing, pricing and customer-specific events that are tactically coordinated using products such as yours. Can you talk about how the semantics of “events” and “transactions,” such as those that might appear in Complex Event Processing, fit into the Elastic Path world view?

Transactions grew out of the call-response architectural style that Web browsers use to communicate with a server. You send a POST, wait for the response, and if it succeeded you get a SUCCESS message. If it failed, you get a FAILURE. This is simplistic and fragile. Remember websites that said things like “Only Click Checkout Once!”? This led to a more reliable model called POST/REDIRECT/GET, where the POST response is quickly redirected to a blocking GET request associated with the transaction underway. That GET blocks until the transaction succeeds or fails.

No one really likes this model because it forces the user to wait. And wait. When they could be shopping. A better model is to take their POST, send them the location of the status resource (the GET in POST/REDIRECT/GET) and then let the client “listen” for the status of that resource to change from IN-PROCESS to SUCCESS or FAILURE.

This is far superior because that status can be long-lived. Imagine an order that has a long lifecycle: ordered, paid, fulfilled, shipped, delivered. Many, if not all, real-world workflows have intermediate state. Communicating this state clearly, with a push model, is far superior to short-lived transaction boundaries that are somewhat arbitrarily defined. I love the blog post “Starbucks Does Not Use Two-Phase Commit” because it clearly models the reality of a commerce system.

The call/response model breaks down further with multiple touchpoints and the increasing demands of a unified experience. As a result, client applications will need to know of state changes that occur from sources other than themselves. This communication is basically events, pushed from the server, to keep all the applications’ states current.
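
Here is a sketch of that pattern, with invented URLs and payloads: the client POSTs the order, receives the location of a status resource, and then listens for pushed state changes (here via the browser's standard EventSource API) rather than blocking on the response.

```typescript
// Sketch of the status-resource pattern described above. URLs and payload
// shapes are hypothetical; EventSource is the standard browser SSE API.

async function submitOrder(cartId: string): Promise<void> {
  const response = await fetch(`/carts/${cartId}/order`, { method: "POST" });
  // 202 Accepted: work is underway; Location points at the status resource.
  const statusUrl = response.headers.get("Location");
  if (!statusUrl) throw new Error("expected a status resource location");

  // Listen for server-pushed status changes: IN-PROCESS -> SUCCESS/FAILURE,
  // and later lifecycle states like shipped or delivered.
  const events = new EventSource(statusUrl);
  events.onmessage = (event) => {
    const { status } = JSON.parse(event.data);
    console.log(`order status is now: ${status}`);
    if (status === "SUCCESS" || status === "FAILURE") events.close();
  };
}
```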

OLTP will give way to CQRS (Command Query Responsibility Segregation). That’s my prediction.

According to a 2013 report, 19% of the world’s websites run on PHP-based WordPress, and many of these are eCommerce sites. What’s your experience been with WordPress as an eCommerce platform, especially considering the recent $160M cash infusion Automattic received?

WordPress is a great way to market and sell a product or two with very little friction. This is why PayPal did so well (and still does very well) with their “Buy Now” button. Beyond that, WordPress is not an eCommerce platform.

Instagram, maybe more so. Check out Ikea’s catalog: http://instagram.com/ikea_ps_2014.

Will big data eCommerce – fueled by machine learning, new unstructured datasets and analytics portals – allow for more extensive mining of historical, keep-everything eCommerce data than has been done in the past? What does this mean for agencies? Is their tooling and talent ready?

Will all this allow more extensive mining? Of course. Will it be useful? Hard to say. See my earlier responses. It will allow for theory-testing, and in the hands of an agency long on marketing talent and short on statistical talent, this will lead to a lot of bad conclusions.

Data mining is hard. It is rarely done well, or with a lot of value. It is an activity for which you want to engage an expert. Big data doesn’t change the problem; it just makes more data available, possibly in an easier format.

What standards – formal or de facto – are you watching most closely?

None seriously. The problem with standards is that the space a standard defines needs a strong need for sharing and interoperability, and that’s not something eCommerce cares much about, except for SEO and backend integration. I would love to see strong demand for things like OpenID and eWallets, but it’s just not there.

Learn how Ironcluster™ can load eCommerce transactions into Big Data systems like Amazon Elastic MapReduce.
