Configure Multiple Data Sources to Denodo’s Data Virtualization Platform

In our last article, we offered an overview of Data Virtualization and shared many of the benefits it yields for organizations seeking to optimize their data operations.  In this follow up, we will demonstrate how Denodo connects to a variety of data sources and what it requires in order to do so.  We will then show how multiple data sources may be integrated using Denodo and then made visual for analytics through the use of a data visualization tool like Tableau.

Though Denodo is capable of connecting to many more data sources, and certainly at the same time, today we will focus on the following to demonstrate Denodo’s versatility.

  • Spark-SQL and Hive
  • Oracle
  • NoSQL Database (Mongo DB and HBase)
  • XML Files
  • Azure Blob Storage

Spark-SQL and Hive as a Data Source

Below are the prerequisites to connect Spark-SQL and Hive using Denodo Platform.

  • Denodo can retrieve data from Spark-SQL (a Spark Module running on Apache Hadoop YARN on Hortonworks ecosystem)
  • Spark-SQL supports reading and writing data stored in Apache Hive (if Hive dependencies can be found on the class path, Spark will load them automatically)
  • Note that the metastore.warehouse.dirproperty in hive-site.xml is deprecated since Spark 2.0.0. Instead, use spark.sql.warehouse.dir to specify the default location of database in warehouse.

Below are the steps to configure Hive and Spark-SQL as a data source to enable Data Virtualization with Denodo

Navigate to Files-> Data Source -> JDBC

Once JDBC data source is selected, we need to configure the data source to connect to the hive server (either on-premise or on the server its hosted).

In the Configuration tab, the following parameters need to be filled:

  • Name -> Hive Server (you can choose any relevant name)
  • Data Adapter -> Spark SQL 1.6 (Select from the drop box)
  • Driver Class path -> ‘spark-1.6’
  • Driver class -> org.apache.hive.jdbc.HiveDriver
  • Database URL -> jdbc:hive2://ecx-xx-xx-xxx-xxx.us-xxxxx-2.compute.amazonaws.com (public IP)
  • Transaction Isolation: Default database
  • Authentication-> fill in the login password of the Hive server
  • Once these steps are performed, click on test the connection.
  • If the test connection is successful, we can access the Hive and Spark-SQL database.

configuration parameters for Spark-SQL and Hive

The above screenshot shows the configuration parameters for Spark-SQL and Hive.

available database under Hive server which can be accessed

The above screenshot shows the available database under Hive server which can be accessed.

Oracle as a Data Source

Denodo can retrieve data from RDBMS sources such as Oracle My-SQL.  Below are the steps to configure Oracle as a data source for Denodo Data Virtualization.

Below are the steps to configure Oracle as a data source to enable Data Virtualization with Denodo

Navigate to Files-> Data Source -> JDBC

Once the JDBC data source is selected, we need to configure the data source to connect to the Oracle Database server (either on-premise or on the server where it is hosted).

In the Configuration tab, the following parameters need to be filled:

  • Name -> Oracle Server (you can choose any relevant name)
  • Data Adapter -> Oracle 11g (Select from the drop box)
  • Driver Class path -> ‘oracle-11g’
  • Driver class -> oracle.jdbc.OracleDriver
  • Database URL -> jdbc:oracle:thin:@xx.xxx.xxx.xx:1521/XE (public IP)
  • Transaction Isolation: Default database
  • Authentication-> fill in the login password for Oracle server and validate it
  • Once these steps are performed, click to test the connection.
  • If the test connection is successful, we can access the tables in the database.

configuration parameters for Oracle Databases as a data source

The above screenshot shows the configuration parameters for Oracle Databases as a data source.

NoSQL Database as a Data Source

To connect Denodo to HBase or Mongo DB, we need the the corresponding HBase Custom Wrapper or Mongo DB Custom Wrapper.

Below are the steps to configure a NoSQL database as a data source to enable Data Virtualization with Denodo

Navigate to Files-> Data Source -> Custom

Once Custom data source is selected, we need to configure the data source to connect to the Mongo DB server (either on-premise or on the server where it is hosted).

In the Configuration tab, the following parameters need to be filled:

  • Name -> MongoDB Server (you can choose any relevant name)
  • Class Name -> com.denodo.connect.mongodb.wrapper.MongoDBWrapper
  • Select Jars -> Denodo-mongodb-customwrapper-7.0 (Mongo DB) and Denodo-hdfs-customwrapper-6.0 (for HBase).
  • Authentication-> fill in the login password for Oracle server and validate it.
  • Once these steps are performed, click to test the connection.
  • If the test connection is successful, we can access the tables in the database.

how to select the data source for Mondo DB on Denodo Tool

The above screenshot shows how to select the data source for Mondo DB on Denodo Tool.

how to configure Mondo DB for Denodo with the specific wrapper

The above screenshot shows how to configure Mondo DB for Denodo with the specific wrapper.

XML File as a Data Source

To create an XML data source to access an XML document, right-click on the Server.

Below are the steps to configure an XML file as a data source to enable Data Virtualization with Denodo

In the Virtual Data Port Administration Tool navigate to: New > Data source > XML.

The following data are requested in this dialog:

  • Name: Name of the new data source.
  • Data Route: Path to the XML file that contains the required data.
  • Ignore route errors: If selected, the Server will ignore the errors occurred when accessing the file(s) to which the data source points. This option is not meant to be used when the data source reads a single file. Its main purpose is when the data source points to a collection of files and you know some of them may be missing.
  • Validation type: If selected, the structure of the input XML file will be obtained from a Schema or a DTD. If “None” is selected, Virtual Data Port will analyze the XML document to infer its schema.
  • Validate route: Path to the Schema (XSD) or DTD of the input XML document. If present, instead of obtaining the XML document from the “Data Route” to calculate the schema of the new base view, the Server will use the contents of the XSD or the DTD.
  • Validate data: If selected, Virtual Data Port will validate the input XML file every time the data source is accessed.

how to configure an Xml file as data source

The above screenshot shows how to configure an Xml file as data source.

tabular form extracted from XML file as data source

The above screenshot displays the output as tabular form extracted from XML file as data source.

Azure Blob Storage as a Data Source

To access to the Microsoft Azure Storage, we first need an Azure subscription where we can create a blob service container. We can upload the block blob under the blob storage container.

Below are the steps to configure Azure Blob Storage as a data source on the Denodo Data Virtualization Platform.

  • In the Virtual Data Port Administration Tool, select the type of data source needed depending on the type of file which you want to recover from Azure Blob storage by Navigating to New > Data Source in the contextual menu
  • In this example, a “Delimited File” data source will be used.
  • Select the “HTTP Client” option as the “Data Route” parameter.
  • Configure the “HTTP Client” data route to access the Azure Blob storage by clicking the “Configure” button.
  • HTTP Method: GET.
  • URL: Specify the URL to retrieve the files from the Azure Blob storage.
  • Configure the “HTTP Headers”.
  • In the “Authentication” tab, choose the authentication as “OAuth 2.0” in the drop-down list.
  • Specify the Client Identifier and Client secret that corresponds with Blob storage.
  • Launch the OAuth Credentials Wizard by clicking the link.

how to create an Azure Blob container

The above screenshot shows how to create an Azure Blob container.

how to configure Denodo to Azure Blob Storage

how to configure Denodo to Azure Blob Storage

The above screenshots show how to configure Denodo to Azure Blob Storage.  Make sure the data source is Delimited.

Integrating Data Sources Using Denodo Virtualization Tool

Denodo leverages the functionality of transformations (join, union, etc.) on Tables extracted from RDMBS data source.

Below are the steps to perform Join

  • In the Virtual Data Port Administration Tool, Navigate to Files-> New -> join
  • Select the source tables and drag it on the console. We can perform joins based on where, group by etc. conditions, we can filter the data, change the data type, and more.

joining of tables from different data sources

joining of tables from different data sources

The above screenshots show the joining of tables from different data sources (i.e. Oracle and Blob Storage).

Connecting the virtualization tool to the visualization tool to perform analytics

Below are the steps to make the connection between Denodo and Tableau and set up the data source.

  • Start Tableau. Under Connect, select Denodo. For a complete list of data connections, select More under To a Server. Then do the following:
    • Enter the name of the server that hosts the database.
    • Enter the name of the database.
    • Select how you want to sign in to the server. Specify whether to use Integrated Authentication or Username and Password. If the server is password protected, and you are not in a Kerberos environment, you must enter the user name and password.

Note: If you’re using a Mac, and it is not attached to the domain correctly, the Mac won’t know that Kerberos is being used in the domain, and the Authentication drop-down list won’t be available.

  • Select the Require SSL check box when connecting to an SSL server.
  • Select Sign In.
  • If Tableau can’t make the connection, verify that your credentials are correct. If you still can’t connect, your computer is having trouble locating the server. Contact your network administrator or database administrator.

The above screenshot shows the connection of Tableau with Denodo to perform analytics.

Conclusion

As organizations examine the ways in which they should expand and improve upon their data operations, data virtualization is a valuable tool to enable ease of access and integration across systems.  When coupled with data visualization using a tool like Tableau, the value of data virtualization is extended by allowing business users to perform analytics to ultimately drive stronger decision making across their enterprise.

About the Author

Sanjay Kumar

Sanjay Kumar

Sanjay Kumar is a Senior Big Data Developer at HEXstream.  He studied at the University of Massachusetts in Amherst, where he completed his Master’s Degree in Electrical and Computer Engineering.  Sanjay is well-versed in many programming languages and has completed multiple trainings in the big data domain.  In his spare time, Sanjay enjoys online gaming, hiking, and playing cricket.

Back to Blog

Related Articles

Why Analytic Goals Should Drive Data Architecture Selection

When my kids were younger, the sheer volume of Legos scattered hazardously around the house...

An Introduction to Denodo for Data Virtualization

We are living in an era in which enterprise data exists in more forms than many IT departments know...

Info Fields: Improving Data Visualization for Utility Analytics and Beyond

In most, if not all, out-of-the-box databases, the data are typically designed for application...