Monday, 30 November 2015

Information Studio Flows


Hey Friends,

Here I am back to discuss another piece of MarkLogic: Information Studio Flows.


Information Studio

Information Studio is an XQuery API with a browser-based interface, and is part of the MarkLogic Application Services suite. Thanks to the browser-based interface it is easy to understand and use. The API enables you to create databases and load them with content; in effect, it provides the tools to perform such loading operations in MarkLogic.

Flow

A Flow is one of its tools. It is easy to understand and handle, and it helps you load data into MarkLogic while applying some processing/transformation to the documents and data.

A Flow is a content-load configuration that describes the documents to be loaded into a database and specifies how to load them.

In my words, a Flow is a mechanism that creates a door through which you can transfer your content/XML directly into MarkLogic, with the required transformations applied, and the good part is that you don't need to be a MarkLogic programmer to use it.

Suppose you are very new to MarkLogic but want to start by creating an application that displays a meaningful and good amount of data, and you are not ready to use MLCP or other content-loading mechanisms. In that case a Flow could be of great help to you.

You just need to create a database and configure a Flow to upload your content to the database directly through it.

Let’s discuss how you can create a flow and what the parts of its configuration are.

Flows can be accessed on port 8000 of MarkLogic at the URL below:

http://localhost:8000/appservices/ 

(please replace “localhost” with the IP address of the server/machine on which MarkLogic Server is installed)

Here you can see existing flows and create a new one. Clicking the “New Flow” button navigates you to the screen for a newly created flow named “Untitled-[Number]”, along with an option to edit that name.

Flows consist of three configuration parts.


Collector

The collector configuration specifies where to get the content to load into the database and how to ingest it. This configuration appears on screen under the name “Collect”. The default collector is a filesystem directory, which can be changed to other options, such as a drop box for uploading content through the browser. The Collect section configures where and how to collect content, and provides the following options:

1. Configure: this sets the location from which content is collected for loading into the database. Here we can specify the path of a directory on the server, which acts as the door through which content data is sent to MarkLogic.

2. Ingestion: the ingestion settings decide which kinds of documents should be loaded, how many at a time, and so on. They also provide an option to filter documents via a regular expression to avoid loading useless content, an option to repair XML while loading it into MarkLogic, and a setting for a default namespace for the documents. You can leave this section untouched if you don’t need to make such modifications to your content.
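As a small illustration of the regular-expression filter (the pattern and filenames below are just invented examples, not anything the flow requires), a pattern such as “\.xml$” would admit only files whose names end in .xml. In plain XQuery the same match looks like this:

xquery version "1.0-ml";

(: sample filenames; only the ones matching the pattern are kept :)
let $pattern := "\.xml$"
for $name in ("orders.xml", "readme.txt", "invoice.xml")
where fn:matches($name, $pattern)
return $name
(: returns "orders.xml" and "invoice.xml" :)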

Transform

This section provides options to create transformation steps for the content being loaded into MarkLogic. You can use the following types of transformation on your documents.

1. Delete: removes unwanted elements/attributes/information from content documents.

2. JSON Transform: converts your XML documents to JSON format.

3. Rename: renames elements/attributes in content documents.

4. XQuery Transform: a custom transformation where you can write XQuery-based logic, using CPF, to apply your own rules for deciding which documents to transform and what the transformation should be.
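To give an idea of what such a custom transform can look like: these transforms run as CPF actions, so the module follows the usual CPF action shape. The sketch below is only an assumption-laden example — the <loaded-at> element is an invented transformation, and you should check the cpf:* function signatures against your MarkLogic version’s documentation before relying on it.

xquery version "1.0-ml";

import module namespace cpf = "http://marklogic.com/cpf"
  at "/MarkLogic/cpf/cpf.xqy";

declare variable $cpf:document-uri as xs:string external;
declare variable $cpf:transition as node() external;

if (cpf:check-transition($cpf:document-uri, $cpf:transition)) then
  try {
    (: example transformation: stamp the root element with a load timestamp :)
    xdmp:node-insert-child(
      fn:doc($cpf:document-uri)/element(),
      <loaded-at>{fn:current-dateTime()}</loaded-at>),
    cpf:success($cpf:document-uri, $cpf:transition, ())
  } catch ($e) {
    cpf:failure($cpf:document-uri, $cpf:transition, $e, ())
  }
else ()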

5. Filter Document: extracts metadata from binary documents and saves that information in the document’s properties for easy use and filtering.

6. Normalize Dates: normalizes the date formats in content documents, avoiding problems caused by different date formats in different documents.

7. Schema Validation: validates content documents against a specific predefined XML schema.

8. XSLT: applies a custom XSLT stylesheet to the content of XML documents.

Load

This section is used to configure the database into which content is loaded and to define document properties in that database.

The database can be selected in the “Destination database” dropdown, and the document settings define the URI structure of the loaded documents in that database. Permissions can also be defined for different users on the destination, and a collection can be assigned to all documents loaded through the flow so that they can be identified separately.

Finally, there is a Start Loading button. It starts looking in the configured directory for documents to load; any documents found are processed through the configuration and, after transformation, moved into the database.

When you start loading through this button, the status is displayed in that section for each load, with the loaded documents and the processing status. There is also an option to unload the last uploaded documents from the database through the “Unload” button.

So, we have discussed how to create a flow to load content, but as you can see, triggering the load is a manual process. If we need this process automated, then we need one small additional thing.

Just create an XQuery module file with the following code in it, and schedule it as a scheduled task in MarkLogic Server to run daily/every minute (etc., as per your choice).

xquery version "1.0-ml";

import module namespace info = "http://marklogic.com/appservices/infostudio"
  at "/MarkLogic/appservices/infostudio/info.xqy";

let $flow-id := info:flow-id("[NAME OF YOUR FLOW]")
return info:flow-start($flow-id)

The code above triggers your flow to look in the configured directory and load content into the database according to the configuration defined in your flow.
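If you want the scheduled task to be a little more defensive, you can wrap the same two calls (info:flow-id and info:flow-start, both shown above) in a try/catch and log the outcome. This is just a sketch; the flow name "my-flow" is a placeholder for your own flow’s name.

xquery version "1.0-ml";

import module namespace info = "http://marklogic.com/appservices/infostudio"
  at "/MarkLogic/appservices/infostudio/info.xqy";

try {
  let $flow-id := info:flow-id("my-flow")
  return (
    info:flow-start($flow-id),
    xdmp:log(fn:concat("Started Information Studio flow ", $flow-id))
  )
} catch ($e) {
  (: the flow name was wrong, or the flow could not be started :)
  xdmp:log($e, "error")
}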

I believe this is quite enough for you guys to get a good start on Information Studio Flows for loading content. Please keep me posted with your suggestions/queries.
