Thursday, June 30, 2016

Big Data Analytics with the WSO2 Analytics Platform

Organizations have more data than ever at their disposal. Actually deriving meaningful insights from that data, and converting that knowledge into action, is easier said than done, because no single technology covers everything big data analytics involves. Instead, several types of technology work together to help organizations get the most value from their information.

Big data analytics is the process of examining large data sets containing a variety of data types (i.e., big data) to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information. The analytical findings can lead to more effective marketing, new revenue opportunities, better customer service, improved operational efficiency, competitive advantages over rival organizations and other business benefits. Such data could include web server logs and Internet clickstream data, social media content and social network activity reports, text from customer emails and survey responses, mobile-phone call detail records, and machine data captured by sensors connected to the Internet of Things.

What does WSO2 offer?


The WSO2 Analytics Platform, of course. It is a single platform that addresses all four analytics styles:
  • Batch Analytics: Analysis of data at rest, typically running every hour or every day, focused on historical dashboards and reports.
  • Real-time Analytics: Analysis of event streams as the events arrive, detecting patterns and conditions in real time.
  • Predictive Analytics: Using machine learning to build a mathematical model that can predict future behavior.
  • Interactive Analytics: Executing queries on the fly on top of data at rest.
In a nutshell, you collect data and feed it to the analytics platform. The platform then analyzes the collected data using one or more of the above analytics styles, and finally communicates the results as alerts, dashboards, notifications and so on.
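To make the difference between two of these styles concrete, here is a minimal Python sketch. It is not WSO2 code: the same hypothetical temperature events are processed once as data at rest (batch, an average per sensor per hour) and once as a stream (real-time, an immediate alert for each event above a threshold). The `Reading` type, the function names and the threshold are all illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Reading:
    # Hypothetical event: a sensor reading with an epoch timestamp in seconds.
    sensor_id: str
    timestamp: int
    temperature: float


def batch_hourly_average(stored: Iterable[Reading]) -> dict[tuple[str, int], float]:
    """Batch style: scan data at rest and aggregate per sensor per hour."""
    sums: dict[tuple[str, int], float] = defaultdict(float)
    counts: dict[tuple[str, int], int] = defaultdict(int)
    for r in stored:
        key = (r.sensor_id, r.timestamp // 3600)  # (sensor, hour bucket)
        sums[key] += r.temperature
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def realtime_alerts(stream: Iterable[Reading], threshold: float) -> Iterator[str]:
    """Real-time style: inspect each event as it arrives and alert immediately."""
    for r in stream:
        if r.temperature > threshold:
            yield f"ALERT {r.sensor_id} at {r.timestamp}: {r.temperature}"
```

For example, with readings of 20.0 and 30.0 in hour 0 and 50.0 in hour 1, the batch function returns an average of 25.0 for hour 0 and 50.0 for hour 1, but only after it has seen the whole stored data set. The real-time generator, by contrast, yields its single alert (for the 50.0 reading with a threshold of 40.0) as soon as that event is seen.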


Some Terminology


The WSO2 Analytics Platform processes data as events and also interacts with external systems using events. Let’s get the terminology right.  

Event = a unit of data comprising a set of attributes.
Event Stream = a sequence of events of a particular type.
Event Stream Definition = the type, or schema, of the events in a stream.
Event Receiver = a component that receives events from external sources over a particular transport protocol.
Event Publisher = a component that publishes events (the results of data analysis, or even unprocessed events) to external systems over a particular transport protocol.
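The first three terms can be sketched as a small data model. This is a hypothetical, simplified Python model for illustration only, not the WSO2 API: a `StreamDefinition` carries a name, a version and typed attributes, an `Event` carries one value per attribute, and `conforms` checks whether an event belongs to a given stream. The `name:version` stream ID and the type names in `_TYPES` are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical mapping from attribute type names to Python types (an assumption).
_TYPES = {"string": str, "int": int, "long": int, "float": float, "double": float, "bool": bool}


@dataclass(frozen=True)
class StreamDefinition:
    """The type/schema of a stream: a name, a version and typed attributes."""
    name: str
    version: str
    attributes: tuple[tuple[str, str], ...]  # (attribute name, type name)

    @property
    def stream_id(self) -> str:
        return f"{self.name}:{self.version}"


@dataclass(frozen=True)
class Event:
    """A unit of data: one value per attribute of its stream definition."""
    stream_id: str
    values: tuple[Any, ...]


def conforms(event: Event, definition: StreamDefinition) -> bool:
    """True if the event targets this stream and each value has the declared type."""
    if event.stream_id != definition.stream_id:
        return False
    if len(event.values) != len(definition.attributes):
        return False
    return all(
        # bool is a subclass of int in Python, so reject it for non-bool attributes.
        isinstance(value, _TYPES[type_name])
        and not (type_name != "bool" and isinstance(value, bool))
        for value, (_, type_name) in zip(event.values, definition.attributes)
    )
```

An event stream is then simply a sequence of `Event` objects that all conform to the same `StreamDefinition`.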

Data Collection


The WSO2 Analytics Platform offers a single API for data collection. Through event receivers, it can accept data from almost any event source: the built-in agents in all WSO2 products, Java agents (Thrift, Kafka, JMS), JavaScript clients (WebSockets, REST), IoT devices (MQTT) and more than 100 WSO2 ESB connectors. You can even write a custom agent that collects data from your own system and pushes it to the analytics platform. In short, events can arrive over multiple transports in JSON, XML, Map, Text and WSO2Event formats, and the platform converts them into streams of canonical WSO2Events for the server to process.
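Conceptually, a receiver's job is to turn differently formatted payloads into one canonical event. The sketch below, again hypothetical Python rather than WSO2's implementation, parses JSON, XML, Text and Map payloads carrying the same two fields into the same ordered tuple. The flat payload layouts, the `key:value,key:value` text format and the `SCHEMA` list are assumptions chosen for illustration and are not meant to match WSO2's own default formats.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any, Callable

# Hypothetical canonical schema: ordered (attribute name, type converter) pairs.
SCHEMA: list[tuple[str, Callable[[Any], Any]]] = [("sensor_id", str), ("temperature", float)]


def _to_canonical(fields: dict[str, Any]) -> tuple[Any, ...]:
    """Order the fields by the schema and coerce each value to its declared type."""
    return tuple(convert(fields[name]) for name, convert in SCHEMA)


def from_json(payload: str) -> tuple[Any, ...]:
    return _to_canonical(json.loads(payload))


def from_xml(payload: str) -> tuple[Any, ...]:
    # Assumed layout: one child element per attribute, e.g. <event><sensor_id>...</sensor_id></event>.
    root = ET.fromstring(payload)
    return _to_canonical({child.tag: child.text for child in root})


def from_text(payload: str) -> tuple[Any, ...]:
    # Assumed "key:value,key:value" layout, chosen for illustration only.
    pairs = (item.split(":", 1) for item in payload.split(","))
    return _to_canonical({key.strip(): value.strip() for key, value in pairs})


def from_map(payload: dict[str, Any]) -> tuple[Any, ...]:
    return _to_canonical(payload)


RECEIVERS = {"json": from_json, "xml": from_xml, "text": from_text, "map": from_map}


def receive(fmt: str, payload: Any) -> tuple[Any, ...]:
    """Dispatch the payload to the receiver registered for its format."""
    return RECEIVERS[fmt](payload)
```

Whatever the input format, downstream analysis only ever sees the canonical tuple, which is the point of converting everything into one event representation at the edge.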


Data Analysis

Once the data is collected, it must be analyzed using one or more of the following techniques:

  • Batch Analytics
  • Real-time Analytics
  • Interactive Analytics
  • Predictive Analytics

The WSO2 Analytics Platform comprises three individual products:

  • WSO2 Data Analytics Server: performs batch, real-time and interactive analytics
  • WSO2 Complex Event Processor: used only for real-time analytics
  • WSO2 Machine Learner: used only for predictive analytics



I won't cover the technical details of how each of these techniques processes the collected data in this blog post. Please check links [1] and [2] for more details on the four techniques the WSO2 Analytics Platform uses to analyze data.

Data Publishing



Events that result from data analysis (or even events that were not analyzed at all) are published through event publishers over various transport protocols, including but not limited to SMS, email, HTTP, JMS, Kafka, MQTT, RDBMS, Logger and WebSockets. You can write extensions to support other transports as well. Results can be pushed to dashboards through WebSockets or REST, or used to invoke services and APIs. They can also be pushed to the ESB, which in turn pushes them to legacy systems or even cloud applications. Moreover, the data can be stored in another database or published to other systems that implement custom WSO2 data receivers.
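The publishing side can be pictured as a fan-out: each result event is handed to every registered publisher, and each publisher formats the event for its own transport. In this hypothetical Python sketch the transports are plain callables (lists standing in for a log and a WebSocket connection), so no real network I/O happens; `Publisher`, `PublisherRegistry`, `to_text` and `to_json` are illustrative names, not part of the WSO2 API.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable


def to_text(event: dict[str, Any]) -> str:
    """Text formatter: 'key=value' pairs in the event's own field order."""
    return ", ".join(f"{key}={value}" for key, value in event.items())


def to_json(event: dict[str, Any]) -> str:
    """JSON formatter with sorted keys, so the output is deterministic."""
    return json.dumps(event, sort_keys=True)


@dataclass
class Publisher:
    """Hypothetical publisher: formats an event and hands it to a transport."""
    name: str
    formatter: Callable[[dict[str, Any]], str]
    transport: Callable[[str], None]

    def publish(self, event: dict[str, Any]) -> None:
        self.transport(self.formatter(event))


@dataclass
class PublisherRegistry:
    publishers: list[Publisher] = field(default_factory=list)

    def register(self, publisher: Publisher) -> None:
        self.publishers.append(publisher)

    def publish(self, event: dict[str, Any]) -> None:
        """Fan one result event out to every registered publisher."""
        for publisher in self.publishers:
            publisher.publish(event)
```

Supporting a new transport in this model means registering one more `Publisher` with its own formatter and transport callable; the code that produces result events does not change.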

This blog post was merely an overview of what the WSO2 Analytics Platform offers and how it operates. It is based on a webinar [2] I did a few weeks back. Please check out the full webinar for detailed information about the WSO2 Analytics Platform and the applications of the different analytics styles.