Oracle Stream Analytics provides large-scale, real-time analytics on streaming data, enriches that data, and surfaces actionable insights as events arrive. It is an advanced, scalable, and flexible platform. This blog will help you install and configure Oracle Stream Analytics and walk through one real-time stream analytics use case.
To install Oracle Stream Analytics, a few prerequisites need to be in place; the installation itself is easy and straightforward. Download Oracle Stream Analytics here and follow the installation guide. Ensure that Java 8 or later is installed. You need an Oracle Database or MySQL database to act as the repository database, plus a Hadoop cluster and a Kafka cluster. A locally installed Spark 2.2.1 built for Hadoop 2.7 is also required. Be sure to check the information available in the certification matrix, too.
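As a quick sanity check on the Java prerequisite, you can parse the first line of `java -version` output. A minimal sketch (the helper name is mine, not part of OSA):

```python
import re

def java_major_version(banner_line):
    """Parse the major version from a `java -version` banner line.
    Handles both the legacy '1.8.0_202' and the modern '11.0.2' formats."""
    match = re.search(r'"(\d+)\.(\d+)', banner_line)
    if not match:
        return None
    first, second = int(match.group(1)), int(match.group(2))
    return second if first == 1 else first  # '1.8.x' -> 8, '11.0.x' -> 11

print(java_major_version('java version "1.8.0_202"'))  # -> 8
print(java_major_version('openjdk version "11.0.2"'))  # -> 11
```

Run `java -version` on the install host and feed the first line of its output (it goes to stderr) to the helper; anything below 8 means the prerequisite is not met.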
Next, configure the metadata store in the datasource file at OSA-&lt;version&gt;/osa-base/etc/jetty-osa-datasource.xml, providing the database JDBC URL, username, and password. osaadmin is the default admin user in Oracle Stream Analytics and has administrative privileges. You can verify the schema by connecting to the metadata store and viewing the tables created by OSA. Then start Oracle Stream Analytics by executing start-osa.sh under the OSA-&lt;version&gt;/osa-base/bin directory. Oracle Stream Analytics runs on port 9080, so you need to create an access rule to make the port reachable from outside. Now access OSA in a browser such as Google Chrome at http://&lt;hostname-or-ip&gt;:9080/osa. The home page looks like the image below.
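For reference, the datasource entry in jetty-osa-datasource.xml follows the standard Jetty JNDI resource format. The fragment below is an illustrative sketch for a MySQL repository; the host, schema name, and credentials are placeholders, not values from a real install:

```xml
<New id="osads" class="org.eclipse.jetty.plus.jndi.Resource">
  <Arg><Ref refid="Server"/></Arg>
  <Arg>jdbc/OsaDataSource</Arg>
  <Arg>
    <New class="com.mysql.cj.jdbc.MysqlDataSource">
      <!-- JDBC URL, user, and password of the repository database -->
      <Set name="URL">jdbc:mysql://db-host:3306/OSADB</Set>
      <Set name="User">osa_user</Set>
      <Set name="Password">osa_password</Set>
    </New>
  </Arg>
</New>
```

For an Oracle Database repository, the nested datasource class and JDBC URL change accordingly; consult the installation guide for the exact entry shipped with your version.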
Edit the system settings and point OSA to your Kafka cluster by providing the details it needs to connect to Zookeeper. The runtime server can be either YARN or Spark, and storage is an HDFS location. Additional users can be created under User Management.
Other important sections are Catalog and Patterns. The Catalog shows Pipelines, Streams, Predictive Models, Geo Fences, Visualizations, Dashboards, Cubes, and so on. To create a dashboard on streaming data from Kafka, we first have to establish a connection to Kafka under Connections: provide a name for the connection, choose Kafka as the connection type, and supply the Zookeeper URL. Verify it using Test Connection. This creates a connection to the Kafka environment.
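Before relying on Test Connection, it can help to confirm that the Zookeeper and Kafka ports are reachable from the OSA host at all. A minimal sketch using only the Python standard library (the host names are assumptions; substitute your own endpoints):

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Zookeeper usually listens on 2181 and Kafka brokers on 9092 by default.
for host, port in [("zk-host", 2181), ("kafka-host", 9092)]:
    print(host, port, "reachable" if can_connect(host, port) else "unreachable")
```

If a port shows as unreachable, fix firewalls or access rules before debugging the connection inside OSA.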
Now we have to create a Stream that reads data from topics over the Kafka connection. Create a new Stream, specify Kafka as the stream type, and use the existing Kafka connection and topic name. In the Source Details, choose the appropriate data format; JSON, CSV, and AVRO are supported, and we will use JSON. We have to define the schema for the incoming stream so that the data can be captured correctly, and we can filter out unnecessary data from the Stream. Either infer the shape from the existing stream or create one manually; a shape is simply a schema. Infer the shape, modify it as needed, and save it.
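Infer Shape essentially derives a field-to-type mapping from a sample message. The idea can be sketched in a few lines of Python (the event fields below are invented for illustration, not taken from the article):

```python
import json

# A hypothetical JSON event as it might arrive on the Kafka topic.
sample = '{"meter_id": "M-1001", "reading_kwh": 42.7, "city": "Chicago"}'

def infer_shape(json_text):
    """Map each top-level field of a JSON message to its Python type name,
    roughly what OSA's Infer Shape does when it builds a schema."""
    return {field: type(value).__name__
            for field, value in json.loads(json_text).items()}

print(infer_shape(sample))  # {'meter_id': 'str', 'reading_kwh': 'float', 'city': 'str'}
```

In OSA you would then rename fields, adjust types, or drop columns before saving the shape.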
Verify the Stream details by clicking on the newly created Stream.
After that, we have to create a new pipeline to stream the data, apply queries, and build visualizations on the streaming data. Add stages to the Stream: you can apply filters on the real-time stream and create various visualizations from it. Once the visualizations are done, publish the pipeline to the server by clicking the Publish button. If data arrives faster than the pipeline can process it, events may be suspended; if publishing fails, retry until it succeeds, and check the OSA logs for any errors or warnings. Configure the pipeline settings while publishing the pipeline.
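Publishing is a button click in the UI, but the retry-until-success advice generalizes. If you ever script deployments, a backoff loop like the following sketch is the usual shape (the `publish` callable here is hypothetical, not an OSA API):

```python
import time

def publish_with_retry(publish, attempts=5, base_delay=2.0):
    """Call `publish` until it succeeds, backing off exponentially between tries."""
    for attempt in range(attempts):
        try:
            return publish()
        except RuntimeError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure
            time.sleep(base_delay * 2 ** attempt)

# Example with a stub that fails twice and then succeeds.
calls = []
def flaky_publish():
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("pipeline not ready")
    return "published"

print(publish_with_retry(flaky_publish, base_delay=0.01))  # -> published
```

The exponential delay gives a busy cluster time to free resources between attempts instead of hammering it.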
The next step is to create a Dashboard from the visualizations built on the streaming data. This is a real-time dashboard intended for real-time, actionable business insights. You can specify the refresh interval, set active dashboard filters, apply CSS, and share the dashboard link with users outside the network.
Oracle Stream Analytics is an easily configured, flexible, and fast analytics solution. It helps in analyzing streaming data, identifying patterns in real-time data, filtering out unwanted data, interrogating live streams, and performing intuitive in-memory real-time business analytics.
Manikanth Koora is a Senior Big Data Developer at HEXstream. He has many years of experience in various programming languages, big data, and real-time data analytics. He completed his Master of Science in Information Technology at Southern New Hampshire University and holds many certifications in big data and cloud technologies. Manikanth enjoys learning new technologies and watching TV shows.