Continuing from the Part1.
First of all read the Dmitri Khanine article to give you a background understanding of the Batch Loader tool.
The key to batch loading is the format of the .hda file.
The basic structure of the file is as follows:
Action = insert
dDocTitle=TEST1
dDocAuthor=system
dInDate=24/11/2011 09:21
dDocAccount=All
dSecurityGroup=Public
dDocType=ADACCT
primaryFile=/home/stellent/batch/TestDocument.txt
<<EOD>>
dDocTitle=TEST2
dDocAuthor=system
dInDate=24/11/2011 09:22
dDocAccount=All
dSecurityGroup=Public
dDocType=ADACCT
primaryFile=/home/stellent/batch/TestDocument.txt
<<EOD>>
The metadata specification of the file then repeats itself as many times per document.
You can add your own metadata fields to the file later.
The aim therefore would be to use an Oracle database SQL script to access the UCM’s underlying tables to generate a text file specifying the metadata in the above format. For this you will need the following:
- Database user with access to the UCMs underlying database schema with read privileges on the DOCMETA and REVISIONS tables.
- Access to an Oracle SQL*Plus command prompt or an equivalent tool. SQL*Plus is preferred since my example scripts have been tested using this.
Firstly, let me explain the SQL to extract the documents.
SELECT
‘dDocTitle=’||r.dDocTitle||chr(10)||
‘dDocAuthor=’||r.dDocAuthor||chr(10)||
‘dInDate=’||to_char(r.dInDate, ‘dd/mm/yyyy hh24:mi’)||chr(10)||
‘dDocAccount=’||r.dDocAccount||chr(10)||
‘dSecurityGroup=’||r.dSecurityGroup||chr(10)||
‘dDocType=’||r.dDocType||chr(10)||
‘primaryFile=/home/stellent/phil_batch/TestDocument.’||
decode(upper(r.dWebextension), ‘JPG’, ‘jpg’,
‘PDF’, ‘pdf’,
‘DOC’, ‘doc’, ‘txt’)||chr(10)||
‘<<EOD>>’
FROM docmeta d, revisions r
WHERE d.did = r.did
AND r.dreleasestate = ‘Y’
AND <SPECIFY YOUR DOCMETA HERE>;
CHR(10) is used for a carraige return.
primaryFile is set to one of four possibilities depending on the value of the original document.
This uses an Oracle DECODE statement which is the same as an IF…THEN…ELSE but in Oracle SQL speak.
Here it can be interpreted as
if dWebextension = ‘JPG’ then
primaryFile=/home/stellent/phil_batch/TestDocument.jpg
elsif dWebextension = ‘ PDF ‘ then
primaryFile=/home/stellent/phil_batch/TestDocument.pdf
elsif dWebextension = ‘ DOC ‘ then
primaryFile=/home/stellent/phil_batch/TestDocument.doc
else
primaryFile=/home/stellent/phil_batch/TestDocument.txt
endif;
I generated four small TestDocument files in each of the formats and placed them into the specified directory (/home/stellent/phil_batch). You will need to do the same and modify the destination line and the target file. Obviously, you can vary the size of your content accordingly.
(And if you have Oracle’s deduplicating feature – your database usage is going to be infinitely smaller!).
I’ve added the <SPECIFY YOUR DOCMETA HERE> clause to highlight the need to identify exactly which content you wish to export. If you don’t specify it – you’ll end up with your webassets also being output to the export file.
Save the following script as export.sql onto a local directory.
The full SQL script is below
SET PAGES 0 lines 80 feedback off
SET SERVEROUTPUT ON SIZE 1000000
SET ECHO OFF TERMOUT OFF
SET FEEDBACK OFF
SPOOL C:\sql\batch_export.hda
PROMPT Action = insert
<<INSERT YOUR SQL HERE>>
SELECT
COUNT(*)
FROM docmeta d, revisions r
WHERE d.did = r.did
AND r.dreleasestate = ‘Y’;
Insert your own carefully structured SQL into the script.
You will also probably want to change the output directory line. This is currently your C:\SQL directory.
The file is output as batch_export.hda with a count for the number of records in the file placed at the very end of the file. When the script has finished you can remove this line from the resulting .hda file. I’ve only included it as I found it useful to know how many records were in the file.
Job done. Go run with this hda file and get it loaded!
To execute the upload
- Open a command prompt.
- Change to the working /bin directory.
- Issue the following command:
IdcCommand -f ../<your directory>/batch_export.hda -u sysadmin -l ../filelisting.log -c server
Your files will be checked into the content server and you will see a message in the command window as each file is checked in.
Or
Run the batchloader application and point it at the hda file.
Further Tips
Always try this with a much smaller subset of documents. Get a proof-of-concept before ramping up the volume of content.
You can also try it with larger test documents – if you wish to simulate storage space.
You can use the option IdcService=CHECKIN_UNIVERSAL in your SQL script. This forces the batch loader to use the standard CheckIn service per document.