Growing data volumes and the higher transaction velocities associated with modern applications are driving change in organizations across all industries. This has occurred for many reasons. Customer and end-user expectations for interacting with computerized systems have changed, and technology changes to accommodate these requirements. Furthermore, larger and larger amounts of data are generated and made available both internally and externally to businesses. Therefore, the desire and capability to store large amounts of data continue to expand.
One clear goal of most organizations is to harness this data—regardless of source or size—and to glean actionable insight. This is known as analytics. Advanced analytical capabilities can be used to drive a wide range of applications, from operational applications such as fraud detection to strategic analysis such as predicting patient outcomes. Regardless of the applications, advanced analytics provide intelligence in the form of predictions, descriptions, scores, and profiles that help organizations better understand behaviors and trends.
Moreover, the desire to shorten the time-to-value for analytics projects will result in a move to more real-time event processing. Many use cases can benefit from early detection and response, meaning that identification needs to be as close to real time as possible. By analyzing reams of data and uncovering patterns, intelligent algorithms can make reasonably solid predictions about what will occur in the future. This requires being adept enough to uncover the patterns before changes occur, although the analysis itself does not always have to happen in real time.
Issues in Deploying Advanced Analytics
When implementing an analytics project, it is not uncommon to encounter problems. One of the first issues that needs addressing when adopting analytics in the cognitive era is having organization leaders who will embrace making decisions based on data instead of gut feelings based on the illusion of having data.
Things change so fast these days. It is impossible for humans to keep up with them. Cognitive computing applications that rely on analytics can ingest and understand vast amounts of data and keep up with the myriad of changes occurring daily—if not hourly. Armed with advice that is based on a thorough analysis of up-to-date data, executives can make informed decisions rather than the guesses they are making today.
However, most managers make decisions based on their experience and intuition without necessarily having all the facts. When analytics-based decision making is deployed, management can feel less involved and might balk. Without buy-in at the executive level, analytics projects can be costly without delivering an ROI because the output, which would deliver the ROI, is ignored.
Another potential difficulty involves managing and utilizing large volumes of data. Businesses today are gathering and storing more data than ever before. New data is created during customer transactions and to support product development, marketing, and inventory. Many times, additional data is purchased to augment existing business data. This explosion in the amount of data being stored is one of the driving forces behind analytics. The more data processed and analyzed, the better advanced analysis will be at finding useful patterns and predicting future behavior.
Even so, as data complexity and volumes grow, so does the cost of building analytic models. Before real modeling can happen, organizations with large data volumes face the major challenge of getting their data into a form from which they can extract real business information. One of the most time-consuming steps of analytic development is preparing the data. In many cases, data is extracted from several sources, and subsets of that data are joined together, merged, aggregated, and transformed to create the analytic data set. In general, more data is better for advanced analytics.
There are two aspects to “more data”:
- Data can increase in depth with more customers, transactions, etc.
- Data can grow in width when subject areas are added to enhance the analytic model.
At any rate, as the amount of data expands, the analytical modeling process can take longer. Clearly, performance can be an issue.
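To make the preparation step and the two kinds of growth concrete, here is a minimal Python sketch. It assumes two small extracts, one of customers and one of transactions; every table name, field, and value is invented for illustration and does not come from any real system. The sketch joins the two extracts and aggregates them into one row per customer, which is roughly the shape of an analytic data set that a model would consume.

```python
from collections import defaultdict

# Hypothetical extracts; every name and value here is illustrative.
customers = [
    {"cust_id": 1, "region": "East"},
    {"cust_id": 2, "region": "West"},
]
transactions = [
    {"cust_id": 1, "amount": 120.0},
    {"cust_id": 1, "amount": 80.0},
    {"cust_id": 2, "amount": 50.0},
]


def build_analytic_set(customers, transactions):
    """Join transactions to customers and aggregate to one row per customer."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    # Aggregate: one pass over the transaction extract.
    for txn in transactions:
        totals[txn["cust_id"]] += txn["amount"]
        counts[txn["cust_id"]] += 1

    rows = []
    # Join and transform: attach derived columns to each customer row.
    for cust in customers:
        cid = cust["cust_id"]
        n = counts[cid]
        rows.append({
            **cust,
            "txn_count": n,
            "total_spend": totals[cid],
            "avg_spend": totals[cid] / n if n else 0.0,
        })
    return rows


analytic_set = build_analytic_set(customers, transactions)
```

In this sketch, added depth means more rows in either extract, so both loops run longer. Added width means joining in a further subject area, such as support calls, which adds columns to every output row and another pass over the new data.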
Real-time analytics is another interesting issue to consider. The adjective “real-time” refers to a level of responsiveness that is immediate or nearly immediate. Market forces, customer requirements, technology changes, and governmental regulations collectively conspire to ensure that out-of-date data is unacceptable. As a result, today’s leading organizations are constantly working to improve operations with access to and analysis of real-time data.
Nimble organizations need to assess and respond to events in real time based on up-to-date and accurate information, rules, and analyses. Real-time analytics is the use of, or the capacity to use, all available enterprise data and resources when they are needed. When information is sensed and acted upon by an analytical process at the moment it is created in operational systems, real-time analytics has transpired.
As good as real-time analytics sounds, it is not without its challenges to implement. One such challenge is reducing the latency between data creation and when it is recognized by analytics processes.
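One way to reason about that latency is to record when each event was created and when the analytics process recognized it, then compare the gap to a target. The following Python sketch assumes each event carries both timestamps; the field names, the sample events, and the one-second budget are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def latency_seconds(created_at, recognized_at):
    """Seconds between an event's creation and its recognition by analytics."""
    return (recognized_at - created_at).total_seconds()


def late_events(events, budget_seconds):
    """Return the ids of events whose latency exceeds the budget."""
    return [
        e["id"]
        for e in events
        if latency_seconds(e["created_at"], e["recognized_at"]) > budget_seconds
    ]


# Illustrative events only; the timestamps are made up.
t0 = datetime(2024, 1, 1, 12, 0, 0)
events = [
    {"id": "a", "created_at": t0, "recognized_at": t0 + timedelta(milliseconds=200)},
    {"id": "b", "created_at": t0, "recognized_at": t0 + timedelta(seconds=5)},
]
slow = late_events(events, budget_seconds=1.0)
```

What counts as an acceptable budget depends on the use case; fraud detection during a transaction may tolerate far less delay than a daily operational report.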
Time-to-market issues can be another potential pitfall of an advanced analytics project. A large part of any analytical process is the work involved with gathering, cleansing, and manipulating data required as input to the final model or analysis. As much as 60 percent to 80 percent of personnel effort during a project goes toward these steps. This up-front work is essential to the overall success of any advanced analytics project.
From a technology perspective, managing the boatload of data and the performance of operations against that data can be an issue. Larger organizations typically rely on a mainframe computing environment to process their workload. But even in these cases the mainframe is not the only computing platform in use. And the desire to offload analytics to other platforms is often strong. However, for most mainframe users most of the data resides on the mainframe. If analytics is performed on another platform, moving large amounts of data to and from the mainframe can become a bottleneck. Good practices and good software are needed to ensure that efficient and effective data movement is in place.
But before investing in a lot of data movement off the mainframe, compare the cost of keeping the data where it is and moving the analytics processes to it against the cost of moving the data to the processes. Usually, the former will be more cost effective.
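A rough, back-of-the-envelope way to start that comparison is to estimate how long the data movement alone would take. The Python sketch below is only an assumption-laden estimate: the function names are hypothetical, the volumes and throughput figures are invented, and it ignores contention, transformation, encryption, and licensing costs that a real evaluation would need to include.

```python
def transfer_hours(data_gb, throughput_mb_per_s):
    """Hours needed to move data_gb gigabytes at a sustained throughput.

    Uses 1 GB = 1000 MB; real-world throughput is rarely sustained.
    """
    seconds = (data_gb * 1000) / throughput_mb_per_s
    return seconds / 3600


def daily_movement_hours(data_gb, throughput_mb_per_s, refreshes_per_day):
    """Total hours per day spent moving data if it is refreshed repeatedly."""
    return transfer_hours(data_gb, throughput_mb_per_s) * refreshes_per_day


# Hypothetical scenario: 360 GB moved at 100 MB/s, refreshed four times a day.
one_copy = transfer_hours(360, 100)
per_day = daily_movement_hours(360, 100, refreshes_per_day=4)
```

If the movement time approaches or exceeds the window in which the results are needed, that is a strong hint that running the analytics where the data already resides deserves a closer look.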
Taking advantage of more in-memory processes can also be an effective approach for managing analytical tasks. Technologies like Spark, which make greater use of memory to store and process data, are gaining in popularity. Of course, there are other in-memory technologies worth pursuing as well.
Another technology that is becoming more popular for analytics is streaming data software. Streaming involves the ingestion of data—structured or unstructured—from arbitrary sources and the processing of it without necessarily persisting it. This is contrary to our common methodology of storing all data on disk.
A stream computing application can get quite complex. Continuous applications are composed of individual operators that are interconnected and operate on multiple data streams. For example, in healthcare, multiple streams exist, such as blood pressure, heart rate, and temperature, from multiple patients with multiple diagnoses.
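The idea of interconnected operators working on data as it flows past, without persisting it, can be sketched with plain Python generators. This is a toy illustration, not how any particular streaming product works: the operator names, the reading format, the window size, and the alert threshold are all hypothetical, and the threshold is not clinical guidance.

```python
from collections import deque


def readings(source):
    """Source operator: yield readings one at a time without storing them."""
    for reading in source:
        yield reading


def only(stream, signal):
    """Filter operator: pass through readings for one signal type."""
    return (r for r in stream if r["signal"] == signal)


def rolling_mean(stream, window):
    """Aggregate operator: mean of the last `window` values, kept per patient."""
    recent = {}
    for r in stream:
        buf = recent.setdefault(r["patient"], deque(maxlen=window))
        buf.append(r["value"])
        yield {"patient": r["patient"], "mean": sum(buf) / len(buf)}


def alerts(stream, threshold):
    """Alert operator: keep only results whose rolling mean exceeds threshold."""
    return (r for r in stream if r["mean"] > threshold)


# Illustrative feed mixing signals and patients; all values are made up.
feed = [
    {"patient": "p1", "signal": "heart_rate", "value": 90},
    {"patient": "p1", "signal": "temperature", "value": 37.0},
    {"patient": "p2", "signal": "heart_rate", "value": 70},
    {"patient": "p1", "signal": "heart_rate", "value": 130},
    {"patient": "p1", "signal": "heart_rate", "value": 140},
]

# Operators are chained; each reading flows through the whole pipeline in turn.
pipeline = alerts(
    rolling_mean(only(readings(feed), "heart_rate"), window=2),
    threshold=120,
)
flagged = list(pipeline)
```

Only the small per-patient window is held in memory; the readings themselves are processed and discarded, which is the contrast with storing everything on disk first.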
The Bottom Line
Many new and intriguing possibilities exist for analytics that require an investment in learning and new technology. Even so, the return on investment is potentially quite large through insight into the business, which results in better service to customers. After all, that is the raison d’être for the business.
Craig Mullins is President & Principal Consultant of Mullins Consulting, Inc., and the publisher/editor of The Database Site. Craig also writes for many popular IT and database journals and web sites, and is a frequent speaker on database issues at IT conferences. He has been named by IBM as a Gold Consultant and an Information Champion. He was recently named one of the Top 200 Thought Leaders in Big Data & Analytics by AnalyticsWeek magazine.