Hi Friends,
Recently I got a task to fetch some data from Mark Logic. That is something I do regularly, but this time the records were in text files (.txt) stored in Mark Logic.
While trying to read the content of a text file, I learned that dealing with text/flat files stored in Mark Logic is not as simple as dealing with XML documents.
If we do not explicitly mark the content as text when saving it, it is stored in binary format, and reading it back is not as simple as reading an XML document.
When saving text content from a query, wrap it in the text content specifier, i.e. text { <content> }, to indicate that the content is plain text.
In my case the data had already been saved in Mark Logic without being marked as text, but I still had to access it. After some exploring I found ways to read the content of such files, although they take some time to run.
These are the three ways I found to read content from text files stored in Mark Logic: fn:doc, xdmp:binary-decode and xdmp:document-filter.
1. fn:doc
This is the simplest way to read the content of a text file saved in Mark Logic. Its limitation is that it can only read files whose content was explicitly marked as text when saved, which is usually done by wrapping the content in a text {} block. Without the text content specifier, the content is saved as a binary file, which is not directly readable through fn:doc.
For example:
xdmp:document-insert("/data/sample.txt","Hello World")
The line above creates sample.txt as a binary file containing "Hello World". You cannot read that content directly with the code below.
fn:doc("/data/sample.txt")
But if we save the file with the text content specifier, as below, its content is readable through the fn:doc call above.
xdmp:document-insert("/data/sample.txt",text {"Hello World"})
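Before picking a method, it can help to check how a file was actually stored. Here is a minimal sketch, assuming the sample URI used in this post and the xdmp:node-kind built-in; treat it as a quick diagnostic, not the only way to check.

```xquery
xquery version "1.0-ml";

(: Assumption: /data/sample.txt is the sample URI from this post. :)
let $uri := "/data/sample.txt"
(: The child of the document node is the actual content node. :)
let $root := fn:doc($uri)/node()
return
  (: Expected to report "text" for a file saved with text {},
     and "binary" for a file stored as binary. :)
  xdmp:node-kind($root)
```

If the result is "text", fn:doc alone is enough. If it is "binary", one of the next two methods is needed.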
2. xdmp:binary-decode
xdmp:binary-decode is another way to read content from text files. It helps with binary text files, meaning text files that were not written as text using the text content specifier (text {}). Such files can be read with xdmp:binary-decode as below.
xdmp:binary-decode(fn:doc("/data/sample.txt")/node(),"sjis")
Here "sjis" (Shift_JIS) is the encoding used to decode the file content. It should match the encoding the file was actually saved in.
This method can read content from binary text files, but it takes time.
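To avoid caring about how each file was stored, both cases can be wrapped in one function. This is only a sketch: local:read-text is a hypothetical helper name of mine, and "UTF-8" is an assumed encoding that you should replace with whatever your files really use.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: returns the file content as a string,
   whether it was stored as a text node or as a binary node. :)
declare function local:read-text(
  $uri as xs:string,
  $encoding as xs:string
) as xs:string?
{
  let $root := fn:doc($uri)/node()
  return
    typeswitch ($root)
      (: Saved with text {}: the content is already a string. :)
      case text() return fn:string($root)
      (: Stored as binary: decode it with the given encoding. :)
      case binary() return xdmp:binary-decode($root, $encoding)
      (: Missing document or any other format: return nothing. :)
      default return ()
};

(: "UTF-8" is an assumption; use "sjis" for Shift_JIS files. :)
local:read-text("/data/sample.txt", "UTF-8")
```

The text() branch skips decoding entirely, so files that were saved properly do not pay the extra cost.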
3. xdmp:document-filter
xdmp:document-filter is also a way to read content from binary text files. It reads the binary content and returns it as XHTML with additional metadata. The snippet below reads a binary text file using xdmp:document-filter.
xdmp:document-filter(fn:doc("/data/sample.txt"))
But this method also takes some time to decode the binary text file and represent it as XHTML.
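If only the text is needed and not the metadata, the filtered XHTML can be reduced to a plain string. A small sketch, assuming the filter output contains a body element (I use a namespace wildcard so the XHTML namespace does not need to be declared):

```xquery
xquery version "1.0-ml";

(: Filter the binary file into XHTML plus metadata. :)
let $filtered := xdmp:document-filter(fn:doc("/data/sample.txt"))
(: Assumption: the text sits inside the XHTML body element.
   *:body matches "body" in any namespace. :)
let $body := ($filtered//*:body)[1]
return
  (: Collapse the whitespace the filter may add around the text. :)
  fn:normalize-space(fn:string($body))
```

If there is no body element, this returns an empty string instead of raising an error.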
From the above, the best option is fn:doc, but it can be used only if the text was saved with the text content specifier.
So if saving the text content is in our control, it is better to save it with the text content specifier, because this also opens up other features for text files, such as content search. Content search cannot be applied to text files saved as binary, that is, without the text content specifier. So that's the tip for newbies in the Mark Logic development field.
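For files that are already stored as binary, one option is to convert them once, so that fn:doc and content search work afterwards. This is a sketch under assumptions: the URI is the sample one from this post, and "UTF-8" stands in for the real encoding of your file.

```xquery
xquery version "1.0-ml";

(: Assumptions: sample URI from this post, UTF-8 as the encoding. :)
let $uri := "/data/sample.txt"
let $binary := fn:doc($uri)/binary()
return
  if (fn:exists($binary)) then
    (: Decode the binary content and write it back as a text node. :)
    xdmp:document-insert(
      $uri,
      text { xdmp:binary-decode($binary, "UTF-8") }
    )
  else ()
```

Once this has committed, a separate query such as cts:search(fn:doc(), cts:word-query("Hello")) should be able to find the file. Note that this sketch does not carry over permissions or collections, so check them after converting.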
Alright guys, enough for today. Keep exploring and sharing your thoughts and knowledge.
See you soon :)