The Difference Between Real-Time, Near Real-Time, and Batch Processing in Big Data
When it comes to data processing, there are more ways to do it than ever. How you do it and the tools you choose depend largely on why you are processing the data in the first place. In many cases, you’re processing historical and archived data, and time isn’t critical. You can wait a few hours for your answer, and if necessary, a few days. Other processing tasks, by contrast, are time-critical, and the answers need to be delivered within seconds to be of value. Here are the differences among real-time, near real-time, and batch processing, and when each is your best option.
What is Real-Time Processing and When Do You Need It?
When a shopper hits the ATM, the bank balance and transactional data have to be processed instantly, or so close to instantly that the customer doesn’t even notice the delay. This is just one example of real-time processing.
Real-time processing requires continual input, constant processing, and steady output of data. Examples include data streaming, radar systems, customer service systems, and bank ATMs, where immediate processing is crucial for the system to work properly. Apache Spark is a popular tool for real-time processing.
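To make the "continual input, constant processing, steady output" idea concrete, here is a minimal Python sketch of an ATM-style stream handler. It assumes a hypothetical Transaction record with amounts in integer cents and uses an in-memory dictionary as a stand-in for the account store; a production system would sit on a streaming platform and a durable database instead. The point is that each event is handled, and a result emitted, the moment it arrives rather than being saved up for later.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass
class Transaction:
    """Hypothetical ATM event; amount is in cents (negative = withdrawal)."""
    account: str
    amount: int


def process_stream(events: Iterable[Transaction], balances: Dict[str, int]) -> Iterator[str]:
    """Handle each transaction as it arrives and emit a result immediately."""
    for tx in events:  # continual input: one event at a time
        current = balances.get(tx.account, 0)
        new_balance = current + tx.amount
        if new_balance < 0:
            # Decline right away instead of queueing the request for later
            yield f"{tx.account}: declined (balance {current / 100:.2f})"
            continue
        balances[tx.account] = new_balance  # constant processing: state updated per event
        yield f"{tx.account}: approved (balance {new_balance / 100:.2f})"  # steady output


if __name__ == "__main__":
    accounts = {"acct-1": 10000}  # 100.00 starting balance
    stream = [Transaction("acct-1", -4000), Transaction("acct-1", -8000)]
    for result in process_stream(stream, accounts):
        print(result)
```

Run against a 100.00 balance, the first 40.00 withdrawal is approved (leaving 60.00) and the second 80.00 withdrawal is declined immediately, without waiting for any later processing window.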
What is Near Real-Time Processing and When Do You Need It?
Near real-time processing is for cases where speed is important but processing times of minutes, rather than seconds, are acceptable. One example is the production of operational intelligence, which combines data processing with complex event processing (CEP). CEP combines data from multiple sources in order to detect patterns. It is useful for identifying opportunities in the data, such as sales leads, as well as threats, such as an intruder on the network.
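A single CEP-style rule can be sketched in a few lines. The Python example below is a simplified illustration, not a full CEP engine: it assumes each source yields hypothetical (timestamp, source, ip, event_type) tuples already sorted by time, merges the sources into one stream with heapq.merge, and flags an IP address that produces a threshold number of failed logins within a sliding time window, the kind of intruder pattern mentioned above. The threshold and window values are illustrative defaults, not recommendations.

```python
import heapq
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, List, Tuple

# Hypothetical event shape: (timestamp_seconds, source_name, ip_address, event_type)
Event = Tuple[float, str, str, str]


def detect_intrusions(
    sources: List[Iterable[Event]], threshold: int = 3, window: float = 60.0
) -> List[Tuple[float, str]]:
    """Merge time-ordered streams; flag IPs with `threshold` failures within `window` seconds."""
    recent: Dict[str, Deque[float]] = defaultdict(deque)
    alerts: List[Tuple[float, str]] = []
    # heapq.merge lazily combines already-sorted inputs into one time-ordered stream
    for ts, _source, ip, kind in heapq.merge(*sources):
        if kind != "login_failed":
            continue
        times = recent[ip]
        times.append(ts)
        # Drop failures that have fallen out of the sliding window
        while times and ts - times[0] > window:
            times.popleft()
        if len(times) >= threshold:
            alerts.append((ts, ip))
            times.clear()  # reset so one burst produces one alert
    return alerts


if __name__ == "__main__":
    vpn_log = [(0.0, "vpn", "10.0.0.5", "login_failed"), (50.0, "vpn", "10.0.0.5", "login_failed")]
    web_log = [(20.0, "web", "10.0.0.5", "login_failed"), (30.0, "web", "10.0.0.9", "login_ok")]
    print(detect_intrusions([vpn_log, web_log]))
```

Neither log alone shows three failures, but combining them reveals the pattern: the call returns [(50.0, '10.0.0.5')]. In a near real-time setup, a rule like this might run every few minutes over the latest batch of merged events.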
Operational intelligence (OI) should not be confused with operational business intelligence (OBI), which involves analyzing historical and archived data for strategic and planning purposes. OBI workloads do not need to be processed in real time or near real-time.
What is Batch Processing and When Do You Need It?
When you’re processing archived or historical data to determine consumer trends over the past few years, the work isn’t urgent, which makes it ideal for batch processing with MapReduce.
Batch processing is even less time-sensitive than near real-time processing. In fact, batch jobs can take hours or even days. Batch processing involves three separate steps. First, data is collected, usually over a period of time. Second, the data is processed by a separate program. Third, the results are output. Input data for analysis can include operational data, historical and archived data, social media data, service data, and more. MapReduce is a useful tool for batch processing and for analytics that don’t need to be real-time or near real-time, because it can process very large data sets in parallel across a cluster. Common uses for batch processing include payroll and billing, which usually run on monthly cycles, and deep analytics that aren’t needed for immediate decision making.
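The collect, process, and output steps map naturally onto the MapReduce model. The following Python sketch imitates the map, shuffle, and reduce phases in memory on a single machine; it is not Hadoop’s MapReduce API, and the comma-separated "year,category,amount" record format is a hypothetical example of consumer sales data collected over time. Totaling sales per year and category is the kind of trend analysis that can wait for a scheduled batch run.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple

Key = Tuple[str, str]  # (year, category)


def map_phase(line: str) -> Iterator[Tuple[Key, int]]:
    """Emit ((year, category), amount) for one raw record in the hypothetical format."""
    year, category, amount = line.strip().split(",")
    yield (year, category), int(amount)


def shuffle(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, List[int]]:
    """Group mapped values by key, as a MapReduce framework does between map and reduce."""
    groups: Dict[Key, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: Key, values: List[int]) -> Tuple[Key, int]:
    """Combine all values for one key into a single total."""
    return key, sum(values)


def run_batch(collected: List[str]) -> Dict[Key, int]:
    # Step 1 (collect) happened upstream: `collected` is the accumulated input.
    # Step 2 (process): map every record, group by key, then reduce each group.
    mapped = chain.from_iterable(map_phase(line) for line in collected)
    grouped = shuffle(mapped)
    # Step 3 (output): return the totals per (year, category).
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    records = ["2022,books,20", "2022,books,5", "2023,games,60", "2022,games,15"]
    print(run_batch(records))
```

Because each map call looks at only one record and each reduce call sees only one key’s values, a real framework can spread both phases across many machines; this sketch simply runs them sequentially.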
Syncsort can help keep your data fresh. Connect CDC continually keeps Hadoop data in sync with changes made on the mainframe, so the most current Big Data information is available in the data lake for analytics.
To learn more about real-time processing, read our white paper: Leveraging the Potential and Power of Real-Time Data