Monday, 28 November 2016

Transactions in MarkLogic

Hello Friends, I am back with another topic in MarkLogic. This time we will cover conceptual/theoretical details, because transactions are very important and deserve a discussion before jumping into any practical implementation.

Transactions are a very important feature of any database: they ensure data integrity and are responsible for maintaining the accuracy and consistency of data.

MarkLogic also supports transactions like other database systems, but the terms used may differ slightly, since each database system has its own terminology. Below are the terms MarkLogic uses for transactions.

1. Statements: A statement is a piece of XQuery that works with data saved in the MarkLogic database. Statements can be read/query statements or modify/update statements.
2. Query Statements: Query statements only fetch data from the MarkLogic database and include no update calls, so after a query statement executes there is no change in the state of the data. Query statements have a read-only view of the database, and therefore no locks (or only lightweight ones) are applied during their execution, which improves performance.
3. Update Statements: Update statements perform, or may perform, modifications to the data. A statement is categorised as an update statement by static analysis, whether or not it actually performs an update at run time. Update statements acquire readers/writers locks as and when needed during execution.
4. Transaction: A set of one or more statements which either all succeed or all fail. A transaction may be an update or query transaction, based on the statements it contains and/or the transaction mode used. Independently of that, a transaction may be a single-statement or multi-statement transaction, based on the number of statements involved.
5. Query Transactions: Transactions which make no modifications to the data and apply no locks. A query transaction may be single-statement or multi-statement, depending on the transaction mode applied.
6. Update Transactions: Transactions which include update statements and apply readers/writers locks on the data. Like query transactions, update transactions may contain a single statement or multiple statements.
7. Single-Statement Transactions: Transactions which are automatically committed after the successful execution of their one statement, or rolled back on error. A transaction created with the default transaction mode (i.e. auto) is always a single-statement transaction.
8. Multi-Statement Transactions: A transaction is multi-statement if it consists of one or more statements which commit or roll back together. Multi-statement transactions can be created with the "query" or "update" transaction mode, and must be committed explicitly using xdmp:commit. Within a multi-statement transaction, changes made by earlier statements are visible to later statements of the same transaction, but not to anything outside it until xdmp:commit is called.
9. Transaction Mode: The transaction mode specifies whether a transaction should be treated as a query or an update transaction, and also determines the commit strategy. MarkLogic supports three values: auto, query, and update. Transactions created in "auto" mode are single-statement transactions (as explained above) and automatically commit on success or roll back on error.
10. Commit: A commit ends the transaction and makes all the changes made by its statements visible to the rest of the database. xdmp:commit is used to commit a transaction explicitly.
11. Rollback: A rollback terminates the transaction and discards all the changes made by its statements. On error, transactions are rolled back automatically. A transaction is also rolled back if it times out before reaching xdmp:commit.
12. Readers/Writers Locks: A set of locks applied to documents for reading or writing while a transaction accesses them. An update transaction always sees the latest version of a document, and locks the document against other updates during access. Once a document is locked, any other update statement must wait for the lock to be released before updating it.

Those are the various terms and definitions used around the concept of transactions. The next important point is how to control transactions according to your requirements, which I will try to explain quickly.

As we discussed, when we do not specify a transaction mode explicitly, the mode of a statement is detected automatically through static analysis. In that case it often happens that a document is locked for no good reason: a transaction that is actually only reading gets classified as an update transaction, and the resulting lock is held until the transaction completes. Now assume another transaction tries to update that document; it must wait, unnecessarily, until the first transaction (the one holding the lock) completes, which can cause a performance problem. In such cases the better option is to read the document in a query-mode transaction, which applies no locks, so the document remains available for updates by other transactions.
Now let's consider a real-world scenario. Suppose there is a document D1 that holds frequently used information for a long-running process. D1 also contains a progress statistic (percentage complete) that needs to be updated every time a piece of the task completes. Transaction T1 reads the required information from D1 while, at the same time, transaction T2 updates the progress statistic in D1. If T1 is initiated with the wrong transaction mode, i.e. "update", then D1 is locked with a readers lock. If an update statement then occurs in T1, the readers lock on D1 is promoted to a readers/writers lock, and that lock is held until T1 completes or commits explicitly. Meanwhile T2, which is trying to update the statistic in D1, must wait until T1 completes. This slows down the process and blocks T2 from proceeding with further work.

In most cases the default transaction mode (i.e. auto) is used by the application, and all transactions are single-statement transactions committed automatically as each statement completes. If you need a multi-statement transaction, you must set the transaction mode explicitly to either query or update. Below are the methods to set the transaction mode explicitly.

1. Declare the xdmp:transaction-mode option in the prolog (at the top) of your program.
2. Call xdmp:set-transaction-mode prior to creating the transaction that should run in that mode.
3. Set the transaction-mode option in the options node passed to xdmp:eval, xdmp:invoke, or xdmp:spawn.
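As a minimal sketch of option 1 (the document URI and content here are hypothetical), a multi-statement update transaction looks like this; the insert is visible outside the transaction only once xdmp:commit runs:

```xquery
xquery version "1.0-ml";
(: run this whole module as one multi-statement update transaction :)
declare option xdmp:transaction-mode "update";

(: hypothetical document; visible to later statements in this
   transaction, but not outside it until the commit below :)
xdmp:document-insert("/samples/tx-demo.xml", <root/>);

(: explicit commit ends the multi-statement transaction :)
xdmp:commit()
```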

Changing the transaction mode in the middle of a transaction does not affect the current transaction.
Below are some options for executing statements with a transaction mode different from the currently selected one.
1. xdmp:eval (refer to https://docs.marklogic.com/xdmp:eval)
2. xdmp:invoke

The options above allow you to create a transaction in a different session, and therefore to run statements with a transaction mode different from the default or from the current statement's transaction. Scripts or statements executed by eval/invoke can be made to run either in the same session and transaction mode as the calling statement, or in a separate one. This is controlled by the isolation option of xdmp:eval/invoke. Below are the allowed isolation options.

1. same-statement: This option runs the code executed by xdmp:eval/invoke in the same session, with the same transaction mode as the calling statement. Updates made under this isolation are not visible to the remainder of the calling statement itself; in a multi-statement transaction, however, they are visible to the subsequent statements of the calling transaction.
2. different-transaction: This isolation creates a separate session, and therefore a separate transaction, for the statements executed by xdmp:eval/invoke, allowing them to run with a transaction mode different from the calling statement.
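As a hedged sketch (the document URI and element are hypothetical), an update can be run in its own transaction regardless of the caller's mode like this:

```xquery
xquery version "1.0-ml";

(: run the eval'd update in its own session and transaction;
   it commits independently of the calling statement's mode :)
xdmp:eval(
  'xdmp:document-insert("/samples/eval-demo.xml", <status>done</status>)',
  (),
  <options xmlns="xdmp:eval">
    <isolation>different-transaction</isolation>
  </options>
)
```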

Okay friends, now keep exploring, proceed with the technical implementation of transactions, and share your views and findings.

References
https://docs.marklogic.com/guide/app-dev/transactions

Wednesday, 31 August 2016

REST APIs

Hello Friends,

I am back with another piece of MarkLogic, known as its REST APIs. The term REST API is used for RESTful web services.

RESTful web services are web services based on the REST architecture, where REST stands for Representational State Transfer, running over the HTTP protocol. RESTful web services are lightweight, highly scalable, maintainable, and commonly used to create APIs/web services for web applications.

MarkLogic also supports REST APIs to expose communication with the MarkLogic database to web applications. MarkLogic provides many built-in REST APIs that cover common functionality such as create, update, delete, and search. We can use these MarkLogic-provided REST APIs by creating a new instance and then customizing it further as per our requirements; the basic features are ready to use for a specific database.

In MarkLogic terms, REST APIs are specially configured HTTP servers that provide access to database contents for document creation, update, deletion, searching, and so on.
Already installed/configured REST API instances can be viewed at the URL below.

http://localhost:8002/LATEST/rest-apis

The most frequently used service is /documents, which is used for document manipulation: creating, updating, and deleting documents and their metadata.

Now let's come to some practical examples. We need the curl command-line tool to call the MarkLogic REST APIs for various setup/configuration tasks.

Below is the curl command to create a REST API instance, which includes creation of the content database, the modules database, and the configuration of the REST API to use.

curl --anyauth -u admin:admin -X POST --header "Content-Type:application/json" -d "{\"rest-api\": { \"name\":\"testApi\", \"port\":\"8011\", \"database\":\"testApi-DB\", \"modules-database\":\"testApi-Modules\" } }" http://localhost:8002/LATEST/rest-apis

The above command should create the following, which you can check in the MarkLogic Admin UI:
  1. A new content database named "testApi-DB"
  2. A modules database named "testApi-Modules"
  3. An HTTP app server named "testApi" on port 8011

You can also verify your REST API instance using the URL below.

http://localhost:8002/LATEST/rest-apis

Now that the REST API instance is created, it is ready to be used to create, update, and delete documents in the database using the instance's /documents service.

Below are examples of various uses of the /documents service.

Get document using URI
We can get a document's contents using its URI.
For example, if a document exists in testApi-DB with the URI "/content/dbreg/demographics/1002970.xml", then the URL below should give you the contents of the document.
http://localhost:8011/LATEST/documents?uri=/content/dbreg/demographics/1002970.xml
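The same GET can also be issued with curl (using the admin credentials from the instance setup above):

```shell
curl --anyauth --user admin:admin -X GET \
  "http://localhost:8011/LATEST/documents?uri=/content/dbreg/demographics/1002970.xml"
```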

Create a new document
Below is the curl command to create a new document using the REST API instance's /documents service.
curl --anyauth --user admin:admin -X PUT -H "Content-Type: application/xml" -d "<root><name>John</name><age>31</age></root>" "http://localhost:8011/LATEST/documents?uri=/samples/sample1.xml"

Update an existing document
Below is the curl command to update an existing document (a PUT to an existing URI overwrites it).
curl --anyauth --user admin:admin -X PUT -H "Content-Type: application/xml" -d "<root><name>John Smith</name><age>31</age></root>" "http://localhost:8011/LATEST/documents?uri=/samples/sample1.xml"

Delete an existing document
Below is the curl command to delete an existing document.
curl --anyauth --user admin:admin -X DELETE "http://localhost:8011/LATEST/documents?uri=/samples/sample2.xml"

So, this is enough for today. Go ahead and explore further uses of the MarkLogic REST API services, such as /search.

Till then, keep exploring and keep sharing.

References
(Refer to the URL above to find the REST APIs provided by MarkLogic.)

Monday, 30 May 2016

Reading Content From Text Files

Hi Friends,

Recently I got a task to fetch some data from MarkLogic, as I usually do, but this time it was to fetch records from text files (.txt) saved in MarkLogic.

While trying to read the contents of a text file, I came to know that dealing with text/flat files stored in MarkLogic is not as simple as with XML documents.

When saving content into a text file, if we do not explicitly indicate that the content is text, it is saved in binary format, and in that case reading the contents of that file is not as simple as with an XML document.

If we are saving text content into a text file from a query, it should be wrapped in the text node constructor, i.e. text { <content> }, to specify that the content is plain text.

In my case the data was already saved in MarkLogic without being marked as text, but I had to access that content anyway. I explored and found some ways to read the contents of such text files, although they take some time. In any case, they helped me read the contents.

Following are the ways I found to read content from text files stored in MarkLogic.

  1. Using fn:doc (when the content was saved as text, indicated explicitly through text {})
  2. Using xdmp:binary-decode
  3. Using xdmp:document-filter


1. fn:doc
This is the simplest way of reading the content of a text file saved in MarkLogic, but its limitation is that it can only read files that were saved with an explicit indication of text content, which is usually done by wrapping the content in a text {} block. Without the text node constructor, the content is saved as a binary file, which is not directly readable through fn:doc.

For example. 

xdmp:document-insert("/data/sample.txt", "Hello World")

The line above creates sample.txt as a binary file with "Hello World" written in it. But you cannot read that content directly with the code below.

fn:doc("/data/sample.txt")

But if we save the file with the text node constructor, as below, its content becomes readable through the fn:doc call above.

xdmp:document-insert("/data/sample.txt", text { "Hello World" })

2. xdmp:binary-decode
xdmp:binary-decode is another way of reading content from text files. It helps read content from binary text files, meaning text files that were not written using the text node constructor (text {}). Such files can be read with xdmp:binary-decode as below.

xdmp:binary-decode(fn:doc("/data/sample.txt")/node(),"sjis") 

Here "sjis" is the character encoding used to decode the binary content; use the encoding the content was saved with.

This method can read content from binary text files, but it takes time.

3. xdmp:document-filter
xdmp:document-filter is also a way to read content from binary text files. It extracts content from binary files and represents it in XHTML format with additional metadata. Below is a code snippet that can be used to read content from a binary text file using xdmp:document-filter.

xdmp:document-filter(doc("/data/sample.txt"))

But this method also takes some time to decode the binary text file and represent it as XHTML.

Thus, as per the discussion above, the optimum way is the fn:doc method, but it can only be used if the text was saved using the text node constructor.

So, if saving the text content into MarkLogic is in our control, it is better to save it with the text node constructor, because this also enables other features for text files, such as content search. Content search cannot be applied to text files saved as binary files, that is, without the text node constructor. That's the tip for newcomers to the MarkLogic development field.
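Before picking one of the three methods above, a quick, hedged way to check how a given file was stored (the URI is the same sample one used earlier):

```xquery
(: returns true() when the document was stored via text {},
   false() when it was stored as a binary node :)
fn:doc("/data/sample.txt")/node() instance of text()
```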

Alright guys, enough for today. Keep exploring and sharing your thoughts and knowledge.

See you soon :)

Tuesday, 15 March 2016

Configuration Manager

Hello Friends,

Recently I explored a little about Configuration Manager, and initially wondered why we need a configuration manager when we already have the Admin Console.

But in a recent project we specifically needed Configuration Manager, and I saw that yes, it is helpful and has its own use cases.

In one of my projects the development team is big, with separate roles for database configuration, creation, and so on. As we have already discussed, when working with MarkLogic there are countless situations where you need to look at the database configuration, but not every user can be granted access to the Admin Console, in order to avoid accidental or intentional damage.

This is where Configuration Manager comes in: it lets the database administrator grant developers access to view the database configuration in read-only mode, or with specific restrictions.

MarkLogic Configuration Manager can be accessed through port 8002 by default, at the URL below.

http://localhost:8002/nav/?type=databases

Configuration Manager provides read-only access to databases, servers, hosts, forests, and groups; it also provides a link to the corresponding object's management page in the Admin Console (if allowed).

It also allows you to view the details of the entire configuration, and how the database objects link together, in view-only mode.

The most important and useful facility provided by Configuration Manager is the import/export option, which allows you to export the entire configuration so that it can be imported into another MarkLogic server instance as required.

This allows you to recreate the same environment on your local machine, or to promote configuration from dev to stage, and so on.

There are surely other benefits of Configuration Manager, but the above are the ones I have come across so far.

Please share your findings about Configuration Manager as well, so that all the relevant details are available in one place.

Till the time... Keep exploring and keep sharing :)

Tuesday, 8 March 2016

Forest Level Query Execution

Hello Friends,

Today I am back, not to discuss a specific topic of MarkLogic, but a scenario that I recently faced, and the only option we found to resolve it.

Recently I faced a problem of duplicate URIs in different forests of the same database, which breaks the fn:doc() operation for any URI that is duplicated. I am sharing my observations here in the hope that they might help someone.

Problem: The warning below appeared on the database status page for one of our databases.
XDMP-FORESTERR: Error in rebalance of forest [forest-1]: XDMP-DBDUPURI: [URI] found in forests [forest-1] and [forest-2]

Reason: After a lot of tracing and experimenting, we concluded that this problem can occur when a forest of a database already contains a specific URI and we later attach another forest that contains the same URI as well, possibly with different data.


Resolution: We saw only one way to fix this problem: process all the URIs, check whether each URI is problematic (duplicated), and if so keep a backup of the data and then delete the duplicate URI's data from one of the forests, with some logic, obviously, for deciding which forest's data should be kept.
 

In this case we needed to execute fn:doc() for the problematic URI to get the data and save it to a backup location, but executing it against the database for a duplicate URI raised the duplicate URI exception.

So here, forest-level query execution did the magic. It provided us the facility to run the query against a specific forest to get the data for the backup location. Similarly, the delete operation also needed to be executed at forest level.
 

xdmp:eval helped to achieve this forest-level execution of the query through its available options.

Below is the snippet that allows execution of a query against a specific forest.


xquery version "1.0-ml";

let $uri := "/data/test/abc.xml"
let $forest-name := "forest-1"
let $query := 'declare variable $URI as xs:string external;
               fn:doc($URI)'
let $options :=
  <options xmlns="xdmp:eval">
    <database>{xdmp:forest($forest-name)}</database>
  </options>
return xdmp:eval($query, (xs:QName("URI"), $uri), $options)

As in the code snippet above, we execute the fn:doc($URI) operation against a specific forest (i.e. forest-1) using xdmp:eval, by specifying the forest in the database option of the eval options. This allows us to run a script (e.g. the one in $query above) against a specific forest.

So, using this option we were able to keep a backup of the duplicate URI's data from a specific forest, and also to delete the duplicate URI from a specific forest to resolve the problem.
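For completeness, a hedged sketch of the corresponding forest-level delete, using the same URI and forest name as the snippet above (take a backup first):

```xquery
xquery version "1.0-ml";

let $uri := "/data/test/abc.xml"
let $forest-name := "forest-1"
let $query := 'declare variable $URI as xs:string external;
               xdmp:document-delete($URI)'
let $options :=
  <options xmlns="xdmp:eval">
    <database>{xdmp:forest($forest-name)}</database>
  </options>
return xdmp:eval($query, (xs:QName("URI"), $uri), $options)
```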

If any of you have gone through a similar problem and found an effective solution, please do share it with me. It might help someone who is facing this problem.

So, see you soon, till the time keep exploring keep sharing :)

Monday, 30 November 2015

Information Studio Flows


Hey Friends,

Here I am back to discuss another piece of MarkLogic: Information Studio Flows.


Information Studio

Information Studio is an XQuery API with a browser-based interface, and a part of the MarkLogic Application Services suite. Thanks to the browser-based interface, it is easy to understand and use. This API enables you to create databases and load them with content; it actually provides the tools to perform such content-loading operations in MarkLogic.

Flow

A Flow is one of its tools; it is very easy to understand and handle, and it helps load data into MarkLogic with some processing/transformation of the documents and data.

A Flow is a content-load configuration which describes the documents to be loaded into a database and specifies how to load them.

In my words, a Flow is a mechanism that creates a door for you to transfer your content/XML directly into MarkLogic, with the required transformations applied; the good part is that you don't need to be a MarkLogic programmer to use it.

Suppose you are very new to MarkLogic but want to start by creating an application to display a meaningful and good amount of data, and you are not ready to use MLCP or other content-loading mechanisms. In that case a Flow could be of great help to you.

You just need to create a database and configure a Flow to upload your content directly into the database through it.

Let's discuss how you can create a flow and what the parts of its configuration are.

Flows can be accessed through port 8000 of MarkLogic at the URL below.

http://localhost:8000/appservices/ 

(Please replace "localhost" with the IP address of the server/machine on which MarkLogic Server is installed.)

Here you can see existing flows and create a new flow. Clicking the "New Flow" button navigates you to a new flow screen, created with the name "Untitled-[Number]" along with an option to edit that name.

Flows consist of three parts of configuration.


Collector

The collector configuration specifies where to get the content to load into the database and how to ingest it. On screen this configuration is labelled "Collect". The default collector is a file-system directory, which can be changed to other options too, such as a drop box for uploading content through the browser. The Collect section helps configure where and how to collect content, and provides the following options.

1. Configure: This is responsible for configuring the location from which content is collected for loading into the database. Here we can mention the path of a directory on the server, which will act as the door through which content data is sent to MarkLogic.

2. Ingestion: The ingestion settings decide which kinds of documents should be loaded, how many at a time, and so on. They also provide an option to filter documents via a regular expression to avoid loading useless content, an option to repair XML while loading, and an option to set a default namespace for the documents. You can leave this section unchanged if you don't need such modifications.

Transform

This section provides the option to create transformation steps for the content being loaded into MarkLogic. You can use the following types of transformations to transform your documents.

1. Delete: Removes unwanted elements/attributes/information from the content documents.

2. JSON Transform: Converts your XML documents into JSON format.

3. Rename: Renames elements/attributes in the content documents.

4. XQuery Transform: A custom transformation where you can write XQuery-based logic, using CPF, to apply your own rules for deciding which documents to transform and how.

5. Filter Document: Extracts metadata from binary documents and can save that information in the documents' properties for easy use and filtering.

6. Normalize Dates: Normalizes the date formats in the content documents to avoid problems caused by different date formats in different documents.

7. Schema Validation: Validates the content documents against a specific predefined XML schema.

8. XSLT: Applies a custom XSLT stylesheet to the content of the XML documents.

Load

This section is used to choose the database into which the content will be loaded and to define document properties in that database.

The database can be selected in the "Destination database" dropdown, and the document settings define the URI structure of the loaded documents in that database. Permissions can also be defined for different users on the destination, and a collection can be assigned to all documents loaded through the flow so that they can be identified separately.

Finally, there is a Start Loading button, which starts looking in the configured directory for documents to load; when documents are found, they are processed through the configuration, transformed, and moved into the database.

When you start loading through this button, the status is displayed in this section for each load, with the loaded documents and their process status. There is also an option to unload the last load from the database, via the "Unload" button.

So, we have discussed how to create a flow to load content, but as you can see, the process of triggering the load is manual. If we need this process automated, we need one small additional thing.

Just create an XQuery module file with the following code in it and schedule it as a scheduled task in MarkLogic Server to run daily, every minute, or as per your choice.

xquery version "1.0-ml";
import module namespace info = "http://marklogic.com/appservices/infostudio"
  at "/MarkLogic/appservices/infostudio/info.xqy";

let $flow-id := info:flow-id("[NAME OF YOUR FLOW]")
return info:flow-start($flow-id)

The code above triggers your flow to look in the configured directory and load content into the database as per the configuration defined in your flow.

I believe this is quite enough for you guys to get a good start with Information Studio Flows for loading content. Please keep me posted with your suggestions/queries.

Tuesday, 8 September 2015

Create and manage CRON jobs

Hello Friends,

There are lots of things in the technical world that are done with more complexity than needed, while simple things attract me. I always look for the simplest way to implement anything, unless it really needs to be complex for specific and justifiable reasons.

Well, here is my new article about a simple thing that was nevertheless not easy to find in the form I needed. So I am sharing it with you in case it is of any help to someone.

Introduction
“The software utility Cron is a time-based job scheduler in Unix-like computer operating systems. People who set up and maintain software environments use cron to schedule jobs (commands or shell scripts) to run periodically at fixed times, dates, or intervals.”

So, in simple words, cron is a utility in Unix-based operating systems, just like Task Scheduler on Windows. The purpose of a cron job is to run automated tasks at specific time intervals. Cron jobs are widely used to run shell scripts on a periodic interval, which in turn trigger other applications or processes to accomplish a specific automated flow.

So that is what cron jobs are. The next question is how to access and manage these jobs, and the answer is crontab. The crontab is essentially a file containing the configuration of all scheduled cron jobs.

The crontab is a list of commands that you want to run on a regular schedule, and also the name of the command used to manage that list. "crontab" stands for "cron table", because it is the table of jobs for the cron scheduler.

Let's come to the practical part. As we know, Unix is largely command-driven: most work in Unix is done through commands, and likewise there are commands to view and manage cron jobs.

To connect to the server to access and manage cron jobs, we need an application that lets us execute commands remotely. The PuTTY console is one option, and the one I am using.

PuTTY is an SSH and telnet client, developed originally by Simon Tatham for the Windows platform. PuTTY is open source software that is available with source code and is developed and supported by a group of volunteers.

You can download PuTTY from its official site.

Below are example details needed to connect with the PuTTY console.

Server IP - 10.xx.xx.xx
Username - admin
Password - admin

Once connected to the server through the PuTTY console, we can access the crontab to view and add/edit cron jobs through commands.

Below are the commands used to view and manage cron jobs in the crontab.

crontab -l: This command lists all cron jobs that have been created and scheduled.

crontab -e: This command opens the crontab for editing, to add new or edit existing cron job configuration.

crontab -r: This command removes the crontab, deleting the configuration of all scheduled cron jobs.

There are few more commands available but above three are sufficient for now.

The next step is to know how to schedule a cron job to run at a specific interval.

When you run the "crontab -e" command, the crontab opens in edit mode (in vi), but you need to press the "Insert" key before making any modification.

Below are the key points to know while scheduling cron job.

1. Cron jobs are scheduled in the following fixed pattern:
[MOD] [HOD] [DOM] [MON] [DOW] [COMMAND]
MOD - minute of the hour - possible values: 0-59
HOD - hour of the day - possible values: 0-23
DOM - day of the month - possible values: 1-31
MON - month - possible values: 1-12
DOW - day of the week - possible values: 0-6 (Sunday to Saturday)
COMMAND - path of the shell script (.sh file) to be triggered on the specified schedule
For example:

0 16 * * 0 /home/task.sh

The cron entry above is scheduled to run the shell script (task.sh) at minute 0 of hour 16 (i.e. 16:00) every Sunday; the * fields match every day of the month and every month.

Please note that:
0 - indicates that exact value (the first minute/hour/day)
* - indicates every possible value

2. Press the "Esc" key to exit the crontab's edit mode.
3. Then type ":wq" and hit the Enter key to save your changes to the crontab.

This will display a message that the new crontab is being installed. You can verify your changes later with the crontab -l command.
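As a small illustration of the five-field pattern described above, this shell sketch splits the example entry into its schedule fields (set -f keeps the * fields from being expanded into file names):

```shell
#!/bin/sh
# Split the example crontab entry into its five schedule fields plus command.
entry="0 16 * * 0 /home/task.sh"

set -f                 # disable globbing so the * fields stay literal
set -- $entry          # word-split the entry into positional parameters
minute=$1; hour=$2; dom=$3; month=$4; dow=$5; cmd=$6

echo "minute=$minute hour=$hour dom=$dom month=$month dow=$dow cmd=$cmd"
```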

So that's all, in short, about crontab and creating and managing cron jobs, but you can explore the topic further for in-depth details.