Wednesday, 4 June 2014

Introduction to MarkLogic



MarkLogic is a new-generation database for Big Data. It provides a trusted platform for Big Data applications, helps you search for crucial information in large volumes of disconnected, non-relational data, and quickly turns that data into useful information that can generate revenue. MarkLogic is a NoSQL database.

MarkLogic is built around XML for storing content, so it helps to have some idea of XML and related technologies, for example XPath, XQuery and XSLT. MarkLogic is a very powerful tool for searching large volumes of content and processing it quickly to get meaningful results for analysis, decision making and so on.

The best way to understand MarkLogic is to download its developer version and start playing with it. Before that, though, you need to know some basic theory about the MarkLogic database.


So today we are going to discuss the very basic theory behind the following MarkLogic terms:
      1.       Hosts
      2.       Database
      3.       Forest
      4.       App Servers
      5.       Modules
 

Hosts:-

A host is an instance of MarkLogic Server running on a single machine. The machine on which MarkLogic is installed is sometimes also called a host. A host always belongs to a group, which means it can't be created and configured on its own. By default, a host is added to the default group.
Now you may be wondering what a group is. For now, just remember that every MarkLogic instance has a default group named “Default”. I will cover groups and clusters as advanced MarkLogic topics in later blogs, but initially we can work with the default group and cluster that are created along with the MarkLogic instance.

Database:-

In MarkLogic, a database is a layer that doesn't store content directly. It serves as a layer of abstraction between forests and App Servers (HTTP, XDBC, WebDAV), which use it to access the content saved in MarkLogic forests. A database consists of one or more forests configured on hosts, and it is the forests that actually hold the data saved in the database. A MarkLogic database provides a single point of access and presents the content of all its forests as one contiguous set to connect to, query or operate on.
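To make the abstraction concrete, here is a toy Python model (not MarkLogic code) in which a database stores nothing itself and only routes documents to its forests, while reads see one combined view. The class names and the CRC-based placement rule are hypothetical; MarkLogic's real forest assignment policy is configurable and works differently.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Forest:
    """Toy forest: the place where documents actually live (URI -> content)."""
    name: str
    docs: dict = field(default_factory=dict)


@dataclass
class Database:
    """Toy database: holds no documents itself, only a list of forests."""
    name: str
    forests: list

    def insert(self, uri: str, content: str) -> None:
        # Hypothetical placement rule: a stable hash of the URI picks the forest,
        # so re-inserting the same URI always lands in the same forest.
        index = zlib.crc32(uri.encode("utf-8")) % len(self.forests)
        self.forests[index].docs[uri] = content

    def get(self, uri: str):
        # Callers never name a forest; the database looks in all of them.
        for forest in self.forests:
            if uri in forest.docs:
                return forest.docs[uri]
        return None

    def uris(self) -> list:
        # One contiguous, sorted view over the content of every forest.
        return sorted(uri for forest in self.forests for uri in forest.docs)


db = Database("Documents", [Forest("forest-1"), Forest("forest-2")])
db.insert("/a.xml", "<a/>")
db.insert("/b.xml", "<b/>")
print(db.uris())  # ['/a.xml', '/b.xml']
```

The point of the sketch is that the caller only ever talks to `db`; which forest holds a given document is an internal detail.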

MarkLogic is installed with the following supporting databases by default.

      a)      Documents – A general-purpose database created at installation for storing your own documents (content).
      
      b)      Last Login – This database stores and tracks users' last-login information for App Servers that are configured to use it.
      
      c)      Schemas – This database stores schemas (such as XML Schema documents) used by other databases. Each database is connected to the Schemas database by default. Schemas could also be stored in the same database as the content, but keeping them in the Schemas database is recommended.
     
      d)      Security – This database stores security configuration such as users, roles, privileges and document permissions. Every database is connected to the Security database, and it is recommended to keep security information in this default Security database.
      
      e)      Modules – This database stores executable XQuery code. It is created by default during MarkLogic installation and we can use it to keep our executable XQuery, but XQuery can also be stored in another database, provided that database is set as the modules database in the HTTP or XDBC server configuration.
If we use a modules database to hold XQuery files, each file's URI must be prefixed with the root URL configured on the HTTP or XDBC server; only then can the file be accessed from that server.
For example, if you use a modules database and set the root of an HTTP or XDBC server to http://marklogic.com/, the following documents are executable from that server:
      http://marklogic.com/default.xqy
      http://marklogic.com/myXQueryFiles/search_db.xqy
     
      f)       Triggers – The Triggers database stores triggers. A trigger runs an XQuery module automatically in response to an event, such as a document being created, modified or deleted. A default Triggers database is created during installation, but, just as with the Modules database, a separate database can also be used as the triggers database.
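The root-prefix rule for the Modules database (item e above) comes down to simple string handling. A small Python sketch, where `module_uri` and `executable_from` are hypothetical helper names used only for illustration, not MarkLogic functions:

```python
def module_uri(root: str, request_path: str) -> str:
    """URI a module must have in the modules database to be served at request_path."""
    # Assumption: the root is treated as a directory URI, so it is forced to end in "/".
    if not root.endswith("/"):
        root += "/"
    return root + request_path.lstrip("/")


def executable_from(root: str, uri: str) -> bool:
    """A stored module is reachable from the server only if its URI starts with the root."""
    if not root.endswith("/"):
        root += "/"
    return uri.startswith(root)


root = "http://marklogic.com/"
print(module_uri(root, "/default.xqy"))                       # http://marklogic.com/default.xqy
print(module_uri(root, "/myXQueryFiles/search_db.xqy"))       # http://marklogic.com/myXQueryFiles/search_db.xqy
print(executable_from(root, "http://example.com/other.xqy"))  # False
```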

Forests:-

Forests are where content is actually stored. A forest contains data in the form of XML, text or binary documents. Forests are created on hosts and attached to a database. One database can have multiple forests attached, but a forest is attached to only one database at a time. Multiple forests attached to a database appear to it as one contiguous set of content for query purposes. A forest on its own (not attached to any database) is of no use: no data can be loaded into or saved in a forest that is not attached to a database.
A forest contains in-memory and on-disk structures called stands.
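The attachment rules above (one database per forest, no loading into an unattached forest) can be expressed as a few checks. This is a toy Python model with hypothetical class and function names, not MarkLogic's API; in MarkLogic itself forests are attached and detached through the Admin Interface or the Admin API.

```python
class Forest:
    def __init__(self, name: str):
        self.name = name
        self.database = None  # None means the forest is not attached to any database
        self.docs = {}


class Database:
    def __init__(self, name: str):
        self.name = name
        self.forests = []

    def attach(self, forest: Forest) -> None:
        # A forest may belong to only one database at a time.
        if forest.database is not None and forest.database is not self:
            raise ValueError(f"{forest.name} is already attached to {forest.database.name}")
        if forest.database is None:
            forest.database = self
            self.forests.append(forest)

    def detach(self, forest: Forest) -> None:
        if forest.database is self:
            self.forests.remove(forest)
            forest.database = None


def load(forest: Forest, uri: str, content: str) -> None:
    # Hypothetical loader: an unattached forest refuses content.
    if forest.database is None:
        raise RuntimeError(f"{forest.name} is not attached to a database")
    forest.docs[uri] = content


f1 = Forest("forest-1")
documents, other = Database("Documents"), Database("Other")
documents.attach(f1)
load(f1, "/a.xml", "<a/>")
try:
    other.attach(f1)
except ValueError as err:
    print(err)  # forest-1 is already attached to Documents
```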

App Servers:-

App Servers are created and managed at the group level in MarkLogic Server. Each App Server is associated with one database and listens on a single port. Applications use App Servers to access the data saved in the forests of a MarkLogic database.
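Because App Servers are defined per group rather than per host, each App Server is reachable on its port on every host of that group. A toy Python model of that relationship follows; the names are hypothetical, and the rule that two App Servers in one group cannot share a port is stated here as an assumption that follows from every host having to listen on it.

```python
from dataclasses import dataclass, field


@dataclass
class AppServer:
    name: str
    kind: str      # "HTTP", "XDBC", "WebDAV" or "ODBC"
    port: int      # the single port this App Server listens on
    database: str  # the one database it is associated with


@dataclass
class Group:
    name: str
    hosts: list = field(default_factory=list)
    app_servers: list = field(default_factory=list)

    def add_app_server(self, server: AppServer) -> None:
        # Assumption: ports must be unique within a group, since every host listens on them.
        if any(s.port == server.port for s in self.app_servers):
            raise ValueError(f"port {server.port} is already used in group {self.name}")
        self.app_servers.append(server)

    def endpoints(self) -> list:
        # Defined once at group level, each App Server is reachable on every host.
        return [(host, s.port, s.name) for host in self.hosts for s in self.app_servers]


g = Group("Default", hosts=["host1", "host2"])
g.add_app_server(AppServer("my-http", "HTTP", 8010, "Documents"))
print(g.endpoints())  # [('host1', 8010, 'my-http'), ('host2', 8010, 'my-http')]
```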

Applications communicate with App Servers to fetch, insert, search or delete documents. MarkLogic provides the following types of App Server, each with its own purpose and limitations:
       
       a)      HTTP Server
       b)      XDBC Server
       c)      WebDAV Server
       d)      ODBC Server

HTTP Server: -

HTTP Servers in MarkLogic let you create XQuery-based web applications. Through an HTTP server, a web application can execute XQuery against a database to fetch and process the data in its documents. An HTTP server can return XHTML or XML content to a browser or any other HTTP client application.
HTTP Servers are defined at the group level and are accessible on all hosts in the group. An HTTP server provides access to a set of XQuery programs stored in a specific directory structure. Each HTTP server listens on a specific port, is connected to a database, and executes its XQuery programs against that database.
An HTTP server can execute XQuery code either from a specified location on the file system or from a modules database.
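From the client side, running an XQuery module on an HTTP App Server is just an HTTP request to the module's path on the server's port. A small Python sketch that only builds such a URL; the host, port 8010, `/search.xqy` and the `q`/`page` parameters are hypothetical. Inside the module, request parameters can be read with `xdmp:get-request-field`.

```python
from urllib.parse import urlencode


def module_url(host: str, port: int, module_path: str, **params) -> str:
    """URL that asks the HTTP App Server on host:port to run the module at module_path."""
    query = urlencode(params)  # spaces become "+", values are stringified
    url = f"http://{host}:{port}{module_path}"
    return f"{url}?{query}" if query else url


print(module_url("localhost", 8010, "/search.xqy", q="big data", page=2))
# http://localhost:8010/search.xqy?q=big+data&page=2
```

Sending the request (for example with `urllib.request.urlopen`) would also need credentials for a user allowed on that App Server.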

XDBC Server: -

XDBC App Servers let XML Contentbase Connector (XCC) applications communicate with MarkLogic Server. XCC is an API that Java and .NET applications use to communicate with MarkLogic Server. XDBC servers are defined at the group level and are accessible on all hosts in the group. An XDBC server provides access to a specific database and to a root under which a set of XQuery programs resides in a specific directory structure.
XDBC servers are used to insert, fetch and delete documents in MarkLogic from .NET or Java applications. They can also execute the XQuery programs and libraries stored under their root.
An XDBC server is associated with a specific database, but through the XCC connector we can communicate with any database in the cluster.


WebDAV Server: -

WebDAV servers let WebDAV clients access database documents and programs as if they were files in a file system. Subject to the configured security settings, they allow documents to be read, written and deleted directly in the database. A WebDAV server is needed when we want to store and access our XQuery programs in a database under a specific directory of that database.
WebDAV servers in MarkLogic are similar to HTTP servers, but have the following important differences:
i)                    WebDAV servers cannot execute XQuery code.
ii)                   WebDAV servers support the WebDAV protocol to allow WebDAV clients to have read and write access (depending on the security configuration) to a database.
iii)                 A WebDAV server only accesses documents and directories in a database; it does not access the file system directly.

WebDAV (Web-based Distributed Authoring and Versioning) is a protocol that extends the HTTP protocol to provide the ability to write documents through these HTTP extensions. You need a WebDAV client to write documents, but you can still read them through HTTP (through a web browser, for example).
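On the wire, the difference between writing through WebDAV and reading through plain HTTP is mostly the HTTP method. The Python sketch below only builds the requests with the standard library; the base URL and port 8020, the document paths, and the choice of digest authentication (which we assume is the App Server's setting) are all hypothetical.

```python
import urllib.request

# Hypothetical WebDAV App Server; host, port and paths are assumptions.
BASE = "http://localhost:8020"


def put_request(path: str, xml: str) -> urllib.request.Request:
    """WebDAV write: PUT stores the body as a document at this URI in the database."""
    return urllib.request.Request(BASE + path, data=xml.encode("utf-8"), method="PUT",
                                  headers={"Content-Type": "application/xml"})


def get_request(path: str) -> urllib.request.Request:
    """Plain HTTP read: any client, even a browser, can GET the document."""
    return urllib.request.Request(BASE + path, method="GET")


def propfind_request(directory: str) -> urllib.request.Request:
    """WebDAV listing: PROPFIND with Depth 1 asks for a directory's immediate members."""
    return urllib.request.Request(BASE + directory, method="PROPFIND", headers={"Depth": "1"})


def opener(user: str, password: str) -> urllib.request.OpenerDirector:
    # Assumption: the App Server uses digest authentication; change the handler if not.
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    manager.add_password(None, BASE, user, password)
    return urllib.request.build_opener(urllib.request.HTTPDigestAuthHandler(manager))


# Building requests does not contact a server; opener(user, pw).open(req) would send one.
req = put_request("/myXQueryFiles/hello.xml", "<hello/>")
print(req.get_method(), req.full_url)  # PUT http://localhost:8020/myXQueryFiles/hello.xml
```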

ODBC Server: -

An ODBC server lets SQL clients communicate with MarkLogic Server for database operations using SQL statements. The ODBC server is one of several MarkLogic components that support SQL queries. Its basic purpose is to return relational-style data stored in MarkLogic in response to SQL queries. The ODBC server returns data in tuple form and manages server state to support a subset of SQL and ODBC statements. ODBC servers are created and managed at the group level, and each ODBC server is associated with a specific database.

Modules:-

XQuery programs, saved with the .xqy extension, are called modules. Modules are sets of XQuery statements that fetch or process data stored in MarkLogic. A module runs against the database associated with its App Server, while the App Server's Modules setting controls where the module code itself is read from. If the Modules setting is the file system, the XQuery programs are stored under the directory specified as the App Server's root. If the Modules setting points to a database (for example, the Modules database created by default for this purpose), the programs must be stored in that database. In that case we can create a WebDAV server for that database (i.e. Modules), which lets us browse its directory structure and store each program in a specific directory. A program is then accessed by a URI prefixed with the App Server's root, so the root URL should be the top-level directory URL under which the programs are stored.
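The two Modules settings can be summarised as a lookup rule. Here is a hypothetical Python sketch (the config class, its field names and the example paths are illustrative, not MarkLogic's configuration format) showing where an App Server would look for a requested .xqy file in each case:

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional


@dataclass
class AppServerConfig:
    root: str                               # directory path (file system) or URI prefix (database)
    modules_database: Optional[str] = None  # None means modules are read from the file system


def locate_module(cfg: AppServerConfig, request_path: str) -> str:
    # Note: this sketch does not guard against ".." segments in request_path.
    relative = request_path.lstrip("/")
    if cfg.modules_database is None:
        # File-system modules: the program is a file under the root directory.
        return str(PurePosixPath(cfg.root) / relative)
    # Database modules: the program is a document whose URI starts with the root.
    root = cfg.root if cfg.root.endswith("/") else cfg.root + "/"
    return root + relative


print(locate_module(AppServerConfig("/opt/apps/myapp"), "/search_db.xqy"))
# /opt/apps/myapp/search_db.xqy
print(locate_module(AppServerConfig("http://marklogic.com/", "Modules"), "/myXQueryFiles/search_db.xqy"))
# http://marklogic.com/myXQueryFiles/search_db.xqy
```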

So friends, I think we have covered enough theory to start playing with MarkLogic Server. In the next blog we will look at practical implementations of these concepts. Each one might need a separate blog, but you can also explore them on your own using the MarkLogic Administrator's Guide.
