Tuesday, 3 March 2015

MarkLogic Content Pump (MLCP)

Hi Friends, 

Often we need to export data from MarkLogic, or import data directly into some directory of a MarkLogic database. The mlcp utility is very useful in such scenarios, especially when the number of documents to be imported or exported is large. So today we are going to discuss MLCP and look at the process for importing and exporting with it. Complete documentation is available at docs.marklogic.com, but I am going to keep it simple and to the point to give you a quick start.

MarkLogic Content Pump is an open-source, Java-based command-line tool (mlcp). mlcp provides the fastest way to import, export, and copy data to or from MarkLogic databases. It is designed for integration and automation in existing workflows and scripts.

The MLCP tool has two modes of operation:
Local: MLCP drives all its work on the host where it is invoked. Resources such as import data and export destination must be reachable from that host.
Distributed: MLCP distributes its workloads across the nodes in a Hadoop cluster. Resources such as import data and export destination must be reachable from the cluster, which usually means via HDFS.
For today, we are going to discuss local mode only.

The first thing we need to know before using MLCP is its prerequisites. As mentioned above, MLCP is a Java-based command-line tool, so a Java runtime environment is mandatory on the machine where we plan to run it.

1) The Java Runtime Environment is free and can be downloaded from http://filehippo.com/download_jre_64/57157/

2) Now download the MLCP binaries from http://developer.marklogic.com/products/mlcp and unzip them.

That's it; we don't need to install it separately. It can be executed directly from the extracted directory. We are going to look at the import and export functionality of MLCP, but before that, another important point to keep in mind is that you may need the following privileges when importing or exporting data to/from MarkLogic.

hadoop-user-write - This privilege is needed when you import data into MarkLogic. It should be granted to a role assigned to the user that we are going to pass to the mlcp command during import.
hadoop-user-read - This privilege is needed when you read and export data from MarkLogic. It should be granted to a role assigned to the user that we are going to pass to the mlcp command during export.

Now let's come directly to the commands used to import and export data with MLCP.

Before executing an import or export command, open your command prompt and navigate to the bin directory extracted from the downloaded MLCP binaries package.

Suppose your MLCP bin directory is at "C:\mlcp\bin". Navigate to that directory in the command prompt.

Now look at the following commands for export and import.



Export 


C:\mlcp\bin>mlcp.bat export -host 127.0.0.1 -port 9101 -username user -password **** -output_file_path "c:\mlcp\data" -mode local

Description - This command exports all the data of the database attached to the XDBC server that listens on the port given in the "-port" argument.

-host :- the IP address of the host on which MarkLogic is installed and from which data is exported.
-port :- the port number of the XDBC server attached to the database from which data needs to be exported.
-username :- a MarkLogic username that has a role with the "hadoop-user-read" privilege.
-password :- the password for that username, used to log in to the MarkLogic server.
-output_file_path :- the directory where the exported files should go. The path should be enclosed in double quotes.
-mode :- the mode in which mlcp runs (i.e. local or distributed).
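
If you want to run the export from a script instead of typing it, the same command can be assembled as a list of arguments. Below is a minimal Python sketch; the function name build_export_args is my own invention and not part of mlcp, and the host, port and path are just the sample values used in this post.

```python
def build_export_args(host, port, username, password, output_path, mode="local"):
    """Return the mlcp export command as a list of arguments.

    The option names are the ones described above; this helper is only an
    illustration and is not shipped with mlcp.
    """
    return [
        "mlcp.bat", "export",
        "-host", host,
        "-port", str(port),
        "-username", username,
        "-password", password,
        "-output_file_path", output_path,
        "-mode", mode,
    ]


# Sample values from this post; "secret" is a placeholder password.
args = build_export_args("127.0.0.1", 9101, "user", "secret", r"c:\mlcp\data")
print(" ".join(args))
```

Passing such a list to subprocess.run (from the bin directory) avoids most quoting problems with Windows paths, because each argument is handed over separately.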

Apart from this command, which simply exports all the data of the database, we may need to export only specific data. For such cases there are a few more arguments that can help.
-directory_filter :- exports all the data of a specific directory or directories of the database
Ex -

C:\mlcp\bin>mlcp.bat export -host 127.0.0.1 -port 9101 -username user -password **** -output_file_path "c:\mlcp\data" -mode local -directory_filter /datasources/entities/person/

This command will export only the data under the "/datasources/entities/person/" directory of the database.

There are other options as well, which export the data of a specific collection or the data selected by a specific XPath.
-collection_filter :- exports only the data of the specified collection
-document_selector :- exports only the data selected by the specified XPath expression
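
To show how these filters fit onto an existing export command, here is a small Python sketch. The helper add_export_filters and the KNOWN_FILTERS tuple are hypothetical names of mine; only the three option names come from the text above.

```python
# The three export filter options described above.
KNOWN_FILTERS = ("directory_filter", "collection_filter", "document_selector")


def add_export_filters(base_args, **filters):
    """Return a copy of base_args with the given export filters appended."""
    args = list(base_args)
    for name, value in filters.items():
        if name not in KNOWN_FILTERS:
            raise ValueError("unknown export filter: " + name)
        args += ["-" + name, value]
    return args


# Shortened base command; the directory is the sample one from this post.
base = ["mlcp.bat", "export", "-host", "127.0.0.1", "-port", "9101"]
filtered = add_export_filters(base, directory_filter="/datasources/entities/person/")
print(" ".join(filtered))
```

Rejecting unknown names early is just a guard against typos in a script; mlcp itself has more options than the three listed here.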

You may go to docs.marklogic.com for more details on the other available options.
 

Import


C:\mlcp\bin>mlcp.bat import -host 127.0.0.1 -port 9101 -username admin -password **** -input_file_path "C:\mlcp\data_to_import"

Description - This command imports all the data of the specified directory into the database attached to the XDBC server that listens on the port given in the "-port" argument.

-input_file_path :- the directory path to import data from. The path should be enclosed in double quotes.

Note:- Data is imported with the same directory structure as specified in "-input_file_path", because the URI of each document is generated from the path of its input file. For example, with the command above, all imported data would go under the "c:\mlcp\data_to_import\" directory in the MarkLogic database.

There may be a requirement to import data under a specific URI, in which case we need to modify the URI that is generated automatically during import. "-output_uri_replace" can help here.
-output_uri_replace :- replaces part of the URI that is generated automatically during import. The whole value should be enclosed in double quotes and the replacement string inside it in single quotes.

-output_uri_prefix :- adds a prefix to the URI generated during import.

Ex - let's add these arguments to our previous import command as below and run it

C:\mlcp\bin>mlcp.bat import -host 127.0.0.1 -port 9101 -username admin -password **** -input_file_path "C:\mlcp\data_to_import" -output_uri_replace "C:/mlcp/data_to_import,''" -output_uri_prefix /datasources

This command will import data from "C:\mlcp\data_to_import" to "/datasources/{data}": -output_uri_replace replaces "C:/mlcp/data_to_import" in the generated URI with an empty string, and then "/datasources" is added as a prefix, so the final location is "/datasources/{data}".
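
The replace-then-prefix order can be tried out without touching MarkLogic. The Python sketch below only imitates the behaviour described above and is not mlcp's own code. The function name rewrite_uri and the sample file person/1.xml are hypothetical, and the starting URI is assumed to look the way this post describes it; the URI that mlcp really generates on your machine may differ, for example by a leading slash.

```python
import re


def rewrite_uri(default_uri, replace_pairs, prefix=""):
    """Apply (pattern, replacement) pairs in order, then prepend the prefix."""
    uri = default_uri
    for pattern, replacement in replace_pairs:
        # A function is used as the replacement so it is inserted literally.
        uri = re.sub(pattern, lambda match: replacement, uri)
    return prefix + uri


# Assumed default URI for a hypothetical file under the import directory.
default_uri = "C:/mlcp/data_to_import/person/1.xml"
final_uri = rewrite_uri(default_uri, [("C:/mlcp/data_to_import", "")], "/datasources")
print(final_uri)  # /datasources/person/1.xml
```

Checking a few sample paths like this before a big import is a cheap way to catch a wrong pattern.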

I think this is enough for today, and it leaves you some room to explore more of MLCP.

Before we end the discussion, I would like to summarize the minimum requirements for executing import/export commands:
1) A JRE must be installed on the machine that executes mlcp commands
2) An XDBC app server should be created in MarkLogic, pointing to the database to be used for import/export
3) "hadoop-user-write" - for the import command, this privilege must be given to a role of the user passed in the "-username" argument
4) "hadoop-user-read" - for the export command, this privilege must be given to a role of the user passed in the "-username" argument
5) The -mode argument must be local, and the directories should be accessible from the local system
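
Part of this checklist can be verified by a script before mlcp is launched. The Python sketch below is only an illustration: preflight and REQUIRED_PRIVILEGE are hypothetical names, and it checks just the local items (Java on the PATH, mlcp.bat in the bin directory). The XDBC server and the role privileges live inside MarkLogic and still have to be checked there.

```python
import os
import shutil

# Privilege needed per command, as listed in this post.
REQUIRED_PRIVILEGE = {
    "import": "hadoop-user-write",
    "export": "hadoop-user-read",
}


def preflight(mlcp_bin_dir, command, which=shutil.which):
    """Return (problems, privilege) for running the given mlcp command locally."""
    problems = []
    if which("java") is None:
        problems.append("java not found on PATH (a JRE is required)")
    if not os.path.isfile(os.path.join(mlcp_bin_dir, "mlcp.bat")):
        problems.append("mlcp.bat not found in " + mlcp_bin_dir)
    privilege = REQUIRED_PRIVILEGE.get(command)
    if privilege is None:
        problems.append("this post lists no privilege for command: " + command)
    return problems, privilege


# Sample bin directory from this post.
problems, privilege = preflight(r"C:\mlcp\bin", "import")
print(problems, privilege)
```

The which parameter is there only so the Java lookup can be replaced in a test.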

So keep exploring, and enjoy.

References
https://docs.marklogic.com/7.0/guide/ingestion/content-pump#id_49096

XQuerrail Framework Based Application With MarkLogic



Hi Friends,

Recently I moved to another MarkLogic project, but this one is a UI application. The entire UI and business logic is developed on a framework known as XQuerrail, which stands for "XQuery on Rails". Yes friends, XQuery is the coding language in this framework. And since it is termed XQuery on Rails, I believe the framework is based on Rails, but I am not very familiar with Rails.

So the initial step for me was to configure a development environment to run an application based on the XQuerrail framework. Initially I thought there would be some installer, as most Windows users would, but when I started exploring I found that it is a directory-based framework that supports the MVC pattern, so it has very specific naming conventions, folder structure, etc.

I ran into some problems initially, but later tried to simplify the configuration and removed some extra, optional steps that were confusing, so that we can at least run an application on the XQuerrail framework.

So today I am not going to discuss how to develop an application in the XQuerrail framework; I am going to share the requirements and the steps needed to configure the framework and, specifically, an application that uses it.

XQuerrail framework (pronounced 'Squirrel') - an XQuery-based rails framework for rapid application development

The XQuerrail framework is encapsulated in a single folder that provides all the features. Below is a quick intro to the directories available in the framework.


/base/ - Provides the basic implementation for models, views and controllers.
/dispatchers/ - These are the relays that route all the incoming requests and outgoing responses.
/engines/ - Contains the implementation for transformation and response rendering of views and controllers.
/handlers/ - Handlers are responsible for composing the output for any public resource or non-controller request.
/interceptors/ - Interceptors provide security, authorization and compression features.
/helpers/ - Helpers are libraries with some powerful utilities that are usable with the framework.
/lib/ - Provides third-party dependencies, including an XQuery parsing library.
/schemas/ - Contains schema definitions for all configuration files.


Now let's jump directly to the environment configuration. Like other framework installations, this one has some prerequisites, which are the following.

 

Prerequisites


1) MarkLogic should be installed on your machine
2) NodeJS [1]. npm is used as the package manager for xquerrail2.framework. Currently tested with v0.10.26.
3) Gulp [4]. It is used to build xquerrail2.framework. Currently tested with 3.6.0.

Now let's create the folder structure and configure the application to run on the XQuerrail framework.


Instructions to install xquerrail framework


1) Create the #project_home directory (Ex - D:\Project_Home)
2) Download xquerrail2.framework into the project home directory from Git, or copy it directly
  a) open a command prompt
  b) go to the #project_home directory and
  run the command - git clone https://github.com/nativelogix/xquerrail2.framework.git
  this should create xquerrail2.framework in the #project_home directory

3) Now install xquerrail2.framework by navigating to #project_home/xquerrail2.framework in the command prompt and executing the following steps
    a) npm install
    b) gulp update-xqy (this compiles framework-level changes: it creates the dist folder with the latest _framework files after comparing with /src/main/_framework/)
    c) copy the "src" folder from "/xquerrail2.framework/" to "#project_home/poc/"
    d) copy the "_framework" folder from "/xquerrail2.framework/dist/" to "#project_home/poc/src/main/"
    e) rename the "app-test" folder to "app" in "#project_home/poc/src/main/"
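
The steps above can also be written down as a small script. This is only a sketch in Python (3.8 or later, because of dirs_exist_ok); the function names setup_commands and copy_into_poc are hypothetical, and I am assuming the folder layout exactly as described in the steps.

```python
import os
import shutil

FRAMEWORK_REPO = "https://github.com/nativelogix/xquerrail2.framework.git"


def setup_commands(project_home):
    """Return (working_directory, argv) pairs for the clone, npm and gulp steps."""
    framework = os.path.join(project_home, "xquerrail2.framework")
    return [
        (project_home, ["git", "clone", FRAMEWORK_REPO]),
        (framework, ["npm", "install"]),
        (framework, ["gulp", "update-xqy"]),
    ]


def copy_into_poc(project_home):
    """Do the copy and rename steps; return the path of the "app" folder."""
    framework = os.path.join(project_home, "xquerrail2.framework")
    poc_src = os.path.join(project_home, "poc", "src")
    # Copy "src" from the framework checkout to #project_home/poc/
    shutil.copytree(os.path.join(framework, "src"), poc_src)
    # Copy dist/_framework over poc/src/main/_framework (may already exist)
    shutil.copytree(
        os.path.join(framework, "dist", "_framework"),
        os.path.join(poc_src, "main", "_framework"),
        dirs_exist_ok=True,
    )
    # Rename "app-test" to "app"
    app_dir = os.path.join(poc_src, "main", "app")
    os.rename(os.path.join(poc_src, "main", "app-test"), app_dir)
    return app_dir


# Print the plan only; nothing is executed here.
for cwd, argv in setup_commands(r"D:\Project_Home"):
    print(cwd, ">", " ".join(argv))
```

Each pair from setup_commands can be handed to subprocess.run(argv, cwd=cwd, check=True). On Windows, npm and gulp are usually installed as .cmd files, so you may need shell=True for those two.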

 

Instructions to configure application database (Mark Logic)

 

1) Create the database and forests, and attach the forests to the database
2) Create the app servers - an XDBC server and an HTTP server - with the database created in the step above, and select "(file system)" as modules. Suppose the HTTP server port is 9300.
3) The root of the HTTP server should be "#project_home/poc/src/"
4) Set the URL rewriter of the HTTP app server to "/main/_framework/rewriter.xqy"



Instructions to configure application configuration


1) Open application-domain in "#project_home/poc/src/main/app/domains/" and update all instances of "app-test" to "app".
2) Open default-controller in "#project_home/poc/src/main/app/controllers/" and update all instances of "app-test" to "app".
3) Update the application name from "app-test" to "app" in the folder "#project_home/poc/src/main/_config/" for the following files
    a) routes.xml
    b) ml-security.xml
    c) config.xml
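
If you do not want to edit every file by hand, the renaming can be scripted. Below is a minimal Python sketch; rename_application and config_files are hypothetical helper names, the files are assumed to be UTF-8 text, and only the three _config files named above are listed. The domain and controller files from steps 1 and 2 can be passed in the same way once you know their full file names.

```python
import os


def config_files(project_home):
    """Return the paths of the three _config files listed in step 3."""
    config_dir = os.path.join(project_home, "poc", "src", "main", "_config")
    names = ["routes.xml", "ml-security.xml", "config.xml"]
    return [os.path.join(config_dir, name) for name in names]


def rename_application(paths, old="app-test", new="app"):
    """Replace every occurrence of old with new in the given files.

    Returns a dict mapping each path to the number of replacements made.
    """
    counts = {}
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            text = handle.read()
        counts[path] = text.count(old)
        if counts[path]:
            with open(path, "w", encoding="utf-8") as handle:
                handle.write(text.replace(old, new))
    return counts


# Show which files would be touched for the sample project home.
print(config_files(r"D:\Project_Home"))
```

The returned counts make it easy to spot a file where nothing was replaced, which usually means a wrong path or an already updated file.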


Access the application through the HTTP server configured above (port 9300 in our example)

Open a browser and go to http://localhost:9300/initialize.xqy to initialize the application configured in the steps above, then go to http://localhost:9300/ to access it. The default page will appear with a welcome message.

So, that's all about installing the XQuerrail framework and configuring an application to run on it.

As for how to code the logic of an application with the XQuerrail framework using the provided MVC pattern, you can continue to explore the framework and use the references I have shared to understand the various options and configurations available.

I will try to share a quick walkthrough of creating a small application with the XQuerrail framework in the near future. Till then, keep exploring.

References
https://github.com/nativelogix/xquerrail2/wiki/Application
https://github.com/nativelogix/xquerrail2/wiki/Getting-Started