Variable data jobs are increasingly used in some workflows. Variable data printing jobs usually have relatively large areas of the page remaining constant or repeated over multiple pages, with small areas, such as text, being changed for each page. Time savings can be made by processing the constant areas only once, especially if the constant areas are complex or large graphic objects. This is the idea behind the Harlequin VariData (HVD) feature. The RIP detects constant areas within a PDF file, retains them, and then re-uses them as necessary.
Any PDF file whose pages share raster elements and have marks that change from page to page should be accelerated by this optimization in the RIP. The RIP scans the PDF for such pages, RIPs the shared raster elements once, and then retains them for use on subsequent pages with the same raster elements.
HVD intelligently identifies graphical elements and groups of graphical elements that are used together multiple times. In doing so it can make use of the "hint" attributes defined in ISO 16612-2 (PDF/VT). Specifically, GTS_Encapsulated and GTS_XID are used, even if the file is a baseline PDF and not PDF/VT. Inclusion of those keys in a PDF file that is being created for variable data printing will likely increase the HVD scan speed.
HVD has two modes of operation:
In HVD external mode (eHVD), fixed and variable elements are provided to a client built into the RIP skin, along with metadata defining how to reassemble these elements into final pages. HVD can cache and compose a page from any number of rasters in external mode. In addition, it can cope with imposed flats where several images and text layers are placed on top of each other. In general, external HVD is faster than internal HVD, because it can decompose the variable data job into smaller elements, which can be cached more effectively.
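The reassembly step described above can be pictured with a small model. This is only a sketch under stated assumptions, not the eHVD client interface: the `compose_page` function, the `(element_id, x, y)` placement tuples, and the use of `None` to mark pixels an element did not touch are all hypothetical and chosen purely for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical raster model: rows of pixel values, None = pixel not touched.
Raster = List[List[Optional[int]]]


def compose_page(width: int, height: int,
                 cache: Dict[str, Raster],
                 placements: List[Tuple[str, int, int]],
                 background: int = 0) -> List[List[int]]:
    """Paint cached element rasters onto a blank page in the order given."""
    page = [[background] * width for _ in range(height)]
    for element_id, x0, y0 in placements:
        element = cache[element_id]
        for dy, row in enumerate(element):
            for dx, value in enumerate(row):
                x, y = x0 + dx, y0 + dy
                # Skip untouched pixels and anything falling off the page.
                if value is not None and 0 <= x < width and 0 <= y < height:
                    page[y][x] = value
    return page


# A fixed background element plus one variable element placed on top of it.
cache = {
    "background": [[1, 1, 1, 1], [1, 1, 1, 1]],
    "name": [[5, None, 5]],
}
page = compose_page(4, 2, cache, [("background", 0, 0), ("name", 1, 1)])
```

In this model the background raster is rendered and cached once, and only the small variable element differs from page to page.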
External HVD has two sub-types: position-independent and non-position-independent eHVD. These are explained in eHVD elements and backgrounds. Position-independent eHVD allows any single cached element to be used at multiple x,y offsets on the page. Its use leads to increased efficiency, particularly for certain classes of VDP jobs such as those containing multiple coupons in lots of different permutations from page to page. Position-independence is especially valuable:
When using position-independent HVD, you should note the following:
/OptimizedPDFIgnorePatternPhase is set to false, meaning the presence of a pattern in the job being scanned amends processing within the RIP if position-independent HVD is enabled. The rasters and events are still output in the format expected for the value of the OptimizedPDFPositionIndependent flag, but multiple instances of the same graphic or collection of graphics at different phase offsets relative to the pixel grid are treated as different and are rendered separately. To treat these as the same and hence potentially improve processing speed, set /OptimizedPDFIgnorePatternPhase to true.
HVD external mode is the same as previously described by Global Graphics as "ERR2". Some source code files, configurations and page features still refer to ERR2.
HVD in internal mode works with all Harlequin Core raster back ends. In external mode the raster back end needs to explicitly handle the eHVD handshake (except when using a diagnostic value of OptimizedPDFCacheID).
In the "clrip" application, the raster backends that support HVD external mode are:
/OptimizedPDFCacheID value of GG_HHR_HVDNONE_ERR2 or GG_HHR_HVDNONE_SHM_ERR2. HVDNONE discards the raster data. The difference between the cache ID values is that GG_HHR_HVDNONE_ERR2 saves the cached element data in process memory framebuffers, whereas GG_HHR_HVDNONE_SHM_ERR2 saves the cached element data in shared memory framebuffers before discarding them. When using the Scalable RIP, GG_HHR_HVDNONE_SHM_ERR2 may be able to share some element rasters between different Farm RIPs.
/OptimizedPDFCacheID value of GG_HHR_HVDRAW_ERR2 or GG_HHR_HVDRAW_SHM_ERR2. HVDRAW generates a raw file for each element, named <id>.raw, and an XML file containing the page and element info, named <job>.pages.xml. The difference between the cache ID values is that GG_HHR_HVDRAW_ERR2 saves the cached element data in process memory framebuffers, whereas GG_HHR_HVDRAW_SHM_ERR2 saves the cached element data in shared memory framebuffers. When using the Scalable RIP, GG_HHR_HVDRAW_SHM_ERR2 may be able to share some element rasters between different Farm RIPs. For more information on the format of raw files, see the /RAW raster backend.
/OptimizedPDFCacheID value of GG_HHR_LIBTIFF_ERR2.
/OptimizedPDFCacheID value of GG_HHR_ASYNCTIFF_ERR2.
/OptimizedPDFCacheID values of GG_HHR_FRAMETIFF_ERR2 or GG_HHR_FRAMETIFF_SHM_ERR2. The difference between the cache ID values is that GG_HHR_FRAMETIFF_ERR2 saves the cached element data in process memory framebuffers, whereas GG_HHR_FRAMETIFF_SHM_ERR2 saves the cached element data in shared memory framebuffers. When using the Scalable RIP, GG_HHR_FRAMETIFF_SHM_ERR2 may be able to share some element rasters between different Farm RIPs.
/OptimizedPDFCacheID value of GG_HHR_HVDSCAN_ERR2. This selects a cache implementation that always indicates that elements are ready. The output function logs details of the page and element structure in the HVD file. This can be used to scan HVD files at high speed to discover what elements will be produced, and how they are positioned and repeated, before committing to a full render.
The Scalable RIP provides two more HVD cache implementations that use the Scalable RIP's messaging infrastructure to manage a single cache that is shared between all participating RIPs. This has advantages over per-process and even the shared memory cache implementations built into libHVD and the HHR SDK:
There are two HVD cache implementations provided by the Scalable RIP. As with all HVD cache implementations, these are registered in RDR as named RDRs of class RDR_NAMES_LIBHVD_CACHE_API, and discovered and connected to HVD clients by event monitors in the Farm RIPs. The cache implementations provided are:
The Scalable RIP hosts an instance of the remote HVD cache in the same process as the controller RIP. This is used by some raster backends to cache shared memory framebuffers across all processes in a Scalable RIP. It can also be used to output rasters through the controller RIP, since that RIP instance is only used for job submission. If outputting rasters through the controller RIP process, you may need to increase the amount of memory used by the controller RIP. For the "clrip" application, this can be done by adding a second "-m" option after the "-nrip" command-line option.
The "libripfarm" library can host more than one instance of a remote HVD cache, using the same host and port. Different instances may provide different capabilities, such as allowing or disallowing remote output, or performing a scan-only cache that always indicates elements are present. These instances are identified by a cache ID. To allow multiple different raster backends in Farm RIPs to use the same remote cache, the Farm RIP will strip all characters up to and including an '@' character in the /OptimizedPDFCacheID used to connect to the Farm RIP before passing the cache ID to the remote cache.
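The stripping rule can be illustrated with a few lines of Python. This is a sketch of the rule as described, not the Farm RIP's code: the function name is hypothetical, and two details are assumptions: that the split happens at the first '@', and that an ID with no '@' is passed through unchanged.

```python
def remote_cache_id(optimized_pdf_cache_id: str) -> str:
    """Return the part of the cache ID that would reach the remote cache."""
    # Assumption: split at the FIRST '@'; IDs without '@' pass through as-is.
    backend_part, separator, remote_part = optimized_pdf_cache_id.partition("@")
    return remote_part if separator else optimized_pdf_cache_id


# Two different raster backends resolve to the same remote cache instance.
shared = {remote_cache_id("HVDRAW@RIPFARM_SHM"),
          remote_cache_id("HVDNONE@RIPFARM_SHM")}
```

Under these assumptions, the part before the '@' selects the raster backend in the Farm RIP, while the part after it names the remote cache instance.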
In the "clrip" application, the raster backends that support Scalable RIP HVD caches are:
/OptimizedPDFCacheID values of HVDNONE@RIPFARM_SHM, HVDNONE@RIPFARM_SHM_OP and HVDNONE@RIPFARM_SHM_LOG will select Scalable RIP HVD caches. All of these save the cached element data in shared memory framebuffers. HVDNONE@RIPFARM_SHM composites and then outputs the data (by discarding it) locally. HVDNONE@RIPFARM_SHM_OP remotely composites and then outputs the data (by discarding it) on the controller RIP. HVDNONE@RIPFARM_SHM_LOG remotely logs the pages and elements that would have been composited and output on the controller RIP.
/OptimizedPDFCacheID values of HVDRAW@RIPFARM_SHM and HVDRAW@RIPFARM_SHM_OP will select Scalable RIP HVD caches. These save the cached element data in shared memory framebuffers. HVDRAW@RIPFARM_SHM composites and then outputs the data locally. HVDRAW@RIPFARM_SHM_OP remotely composites and then outputs the data on the controller RIP. For more information on the format of raw files, see the /RAW raster backend.
/OptimizedPDFCacheID values of FRAMETIFF@RIPFARM_SHM and FRAMETIFF@RIPFARM_SHM_OP will select Scalable RIP HVD caches. These save the cached element data in shared memory framebuffers. FRAMETIFF@RIPFARM_SHM composites and then outputs the data locally. FRAMETIFF@RIPFARM_SHM_OP remotely composites and then outputs the data on the controller RIP.
/OptimizedPDFCacheID value of HVDSCAN@RIPFARM_SCAN_LOG will select a Scalable RIP HVD cache implementation that always indicates that elements are ready. The output function logs details of the page and element structure in the HVD file. This can be used to scan HVD files at high speed to discover what elements will be produced, and how they are positioned and repeated, before committing to a full render.
The Scalable RIP HVD cache is split into two parts that communicate with each other using the messaging framework used by the Scalable RIP.
You can specialize the HVD cache to use your own method for storing and representing element rasters. You might store element rasters as GPU textures (using a handle as a reference), in a database (using the key as a reference), or some other method. The HVD cache just requires an opaque pointer referencing the raster object, and a method of resolving a textual reference to such a pointer in each process.
The raster backend will store the element rasters itself, and add them to the HVD monitor using an opaque pointer and textual ID.
The HVD cache API needs HVD_cache_fns::raster_release() and HVD_cache_fns::raster_find() methods, and optionally HVD_cache_fns::recovery_filter() and HVD_cache_fns::raster_purge() methods that operate on or resolve opaque raster pointers. If using the hvd_output_page() function to composite and output pages, the output page params need the hvd_output_page_params::raster_description_fn(), hvd_output_page_params::element_raster_open(), hvd_output_page_params::element_raster_map(), and hvd_output_page_params::element_raster_close() methods to operate on the opaque raster pointers.
After starting the SDK but before starting the RIP, discover the "RIPFARM_BASE" HVD_cache_fns instance in RDR namespace RDR_NAMES_LIBHVD_CACHE_API. Copy it to a structure that will last as long as the SDK, and add HVD_cache_fns::raster_release() and HVD_cache_fns::raster_find() functions, optionally also HVD_cache_fns::recovery_filter() and HVD_cache_fns::raster_purge() if you have a use for them. Register this copied cache API in RDR under a new name in namespace RDR_NAMES_LIBHVD_CACHE_API.
The remote cache instance needs to be able to acquire and release references to element rasters, to ensure that they do not get deleted while any RIP requires them. Before starting the Scalable RIP server loop, create one or more RF_HVD_CACHE_INSTANCE structures, containing the remote cache ID to use, and RF_HVD_CACHE_INSTANCE::raster_from_client() and RF_HVD_CACHE_INSTANCE::raster_release() methods that use the same element storage as the raster backend. The raster backend and the RF_HVD_CACHE_INSTANCE methods use the textual rasterId params to communicate whatever is necessary to sync and share element raster references. This may be the name of a shared memory object or the name of a file, a database key, or some other identifier that can be resolved to the element raster object. The opaque pointer resolved by RF_HVD_CACHE_INSTANCE::raster_from_client() will probably be different in every process, but will ensure access to the raster data. If enabling remote output in the cache, the instance structure should have RF_HVD_CACHE_INSTANCE::hvd_output_page() and RF_HVD_CACHE_INSTANCE::hvd_output_done() methods. Register these instance structures in RDR under class RDR_CLASS_API and type RDR_API_RF_HVD_CACHE. The Scalable RIP server loop will discover these instances when it starts the remote HVD cache, and will add them to the set of supported remote cache instances.
In the Farm RIP raster backend initialization, construct an HVD_monitor_params structure, setting the HVD_monitor_params::cache_api_name field to the name of the copied HVD_cache_fns cache API. Set HVD_monitor_params::cache_id to a cache ID that ends with '@' and then the RF_HVD_CACHE_INSTANCE::cache_id name. Set the HVD_monitor_params::page_output_fn() method to sw_fr_hvd_output_page() if outputting remotely.
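The idea common to these steps is that element rasters live in storage owned by the integrator, are addressed by a textual ID, and must stay alive while any RIP holds a reference. The following is a conceptual model of that bookkeeping only. It does not reproduce the signatures of the HVD_cache_fns or RF_HVD_CACHE_INSTANCE methods, and the class and method names are hypothetical.

```python
from typing import Any, Dict, Optional


class ElementRasterStore:
    """Hypothetical store resolving textual raster IDs to opaque objects."""

    def __init__(self) -> None:
        self._rasters: Dict[str, Any] = {}
        self._refs: Dict[str, int] = {}

    def add(self, raster_id: str, raster: Any) -> None:
        """Store a raster; the creator holds the first reference."""
        self._rasters[raster_id] = raster
        self._refs[raster_id] = 1

    def find(self, raster_id: str) -> Optional[Any]:
        """Resolve an ID to its raster and take a reference, or return None."""
        if raster_id not in self._rasters:
            return None
        self._refs[raster_id] += 1
        return self._rasters[raster_id]

    def release(self, raster_id: str) -> bool:
        """Drop one reference; return True if the raster was freed."""
        self._refs[raster_id] -= 1
        if self._refs[raster_id] == 0:
            del self._rasters[raster_id]
            del self._refs[raster_id]
            return True
        return False


store = ElementRasterStore()
store.add("elem-1", bytearray(4))    # backend stores the element raster
handle = store.find("elem-1")        # another user resolves the textual ID
freed_early = store.release("elem-1")  # one holder releases: still alive
freed_last = store.release("elem-1")   # last holder releases: now freed
```

In a real integration the stored object might be a shared memory mapping, a GPU texture handle, or a database key, as the text suggests. The point of the model is that the raster is only freed once the last holder releases it.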
The "libripfarm" library may be linked to your DFE process to control Scalable RIP startup and job submission, or to a DBE process to control raster delivery and consumption. Any process linked with "libripfarm" may optionally host one or more remote Scalable RIP cache instances. The steps required to achieve this for a DFE/DBE process are similar to enabling a raster backend for HVD, but with a couple of additions:
port field to a TCP/IP port that the Scalable RIP does not already use, set RF_HVD_CACHE_PARAMS::n_instances to a non-zero number of cache instances to support, and set the RF_HVD_CACHE_PARAMS::instances field to an array of pointers to the instances. The instance structures should have RF_HVD_CACHE_INSTANCE::raster_from_client() and RF_HVD_CACHE_INSTANCE::raster_release() methods that use the same element storage as the raster backend. The raster backend and the RF_HVD_CACHE_INSTANCE methods use the textual rasterId params to communicate whatever is necessary to sync and share element raster references. This may be the name of a shared memory object or the name of a file, a database key, or some other identifier that can be resolved to the element raster object. Each RF_HVD_CACHE_INSTANCE has a non-NULL cache_id field. If RF_IFACE_PARAMS::dfe_ports is non-NULL, set the RF_DFE_PORTS::hvdcachePort field to TRUE. If enabling remote output in the cache, the instance structure should have RF_HVD_CACHE_INSTANCE::hvd_output_page() and RF_HVD_CACHE_INSTANCE::hvd_output_done() methods.
Four example page features are provided with Harlequin Core that turn on external mode optimization, all of them found in the SW/Page Features directory:
HVDInternal enables the internal HVD mode, which can be used with any raster output backend.
HVDNone, to be used with the HVDNONE example raster backend, discarding output data.
HVDRaw, to be used with the HVDRAW example backend, delivering raw raster data and metadata.
HVDDemo, which can be used with any raster backend to demonstrate how pages are deconstructed by HVD.
See the comments in each page feature for more detail.
The OptimizedPDFCacheID usually needs to be the appropriate string for the raster backend in use; the exception is the GGDUMB1 cache ID, which can be used with any raster backend to demonstrate the elements that the page is constructed from.
HVD external mode usually needs to make use of ContoneMask for masking, when the raster backend is programmed to handle it. This shifts the color values in the output raster, so that the client composing the raster elements can detect whether or not a pixel was touched when rendering elements. If the raster backend is programmed appropriately, you may want to add the following to your configuration or a Page Feature:
HVD and TrapPro are mutually exclusive. If an attempt is made to enable them both at the same time, HVD is turned off with the warning:
Likewise, RLE output and HVD are incompatible and turning both on at once disables HVD with the warning:
HVD auto mode detects when a PDF job was created by a known variable-data application, and can automatically enable HVD for these jobs. /EnableOptimizedPDFScan is a tri-state parameter that can be set to /Always, /Never or /Auto. The boolean values true and false are retained for backwards compatibility: /Always is the same as true and /Never is the same as false.
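The equivalence between the boolean and named values can be summarized as a small mapping. This is an illustrative model only; the function name is hypothetical and the RIP's own handling of the parameter is not shown here.

```python
def normalize_scan_setting(value):
    """Map a legacy boolean or a named mode onto one of the three names."""
    if isinstance(value, bool):
        # Backwards compatibility: true == /Always, false == /Never.
        return "Always" if value else "Never"
    if value in ("Always", "Never", "Auto"):
        return value
    raise ValueError(f"unsupported /EnableOptimizedPDFScan value: {value!r}")
```

For example, `normalize_scan_setting(True)` and `normalize_scan_setting("Always")` both give `"Always"`, while `"Auto"` has no boolean equivalent.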
Both internal and external HVD optimizations can benefit from running in auto mode. The use of the auto HVD optimization may require an update to your RIP license.
An example of configuring for HVD auto mode for external HVD:
An example for internal mode:
A more complete example is included in the page feature file SW/Page Features/HVDInternal.
When in auto mode a procset called /HqnHVDParams invokes a procedure /SetFromInfoAndMetadata, which sets /EnableOptimizedPDFScan based on any of the following:
When set to auto mode and a PDF file is submitted, the RIP:
true.
/EnableOptimizedPDFScan had been set to /Always; if they don't, it acts as if /EnableOptimizedPDFScan had been set to /Never.
The producer/creator look-up table is available in a PostScript language file called SW/Usr/HqnVariableDataCreators. This file can be edited to add extra creators or names of procedures. If extra names of procedures are added they must be defined in the /HqnHVDParams procset. This PostScript language file returns a dictionary. The keys of the dictionary are names of procedures for matching the known variable data creator strings to the value for creator or producer found in the metadata or info dictionary. Corresponding values of the dictionary are arrays of strings of known variable data creators.
The SW/Usr/HqnVariableDataCreators file is read from the /HqnHVDParams procset. Safety code is provided which produces a warning for incorrect stack handling or type of returns.
You can set the variable /HvdParamsDebug at the top of the /HqnHVDParams procset to true to view extra debug information.
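The shape of the look-up table (procedure names as keys, arrays of known creator strings as values) can be modeled as follows. Everything in this sketch is hypothetical: the matcher names, the creator strings, and the matching rules are placeholders and are not taken from SW/Usr/HqnVariableDataCreators or the /HqnHVDParams procset.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for matching procedures defined in the procset.
MATCHERS: Dict[str, Callable[[str, str], bool]] = {
    "MatchPrefix": lambda known, found: found.startswith(known),
    "MatchExact": lambda known, found: found == known,
}

# Hypothetical table: matcher name -> known variable-data creator strings.
CREATOR_TABLE: Dict[str, List[str]] = {
    "MatchPrefix": ["Example VDP Composer"],
    "MatchExact": ["Example Mail Merge 2.0"],
}


def resolve_auto(creator: str, producer: str) -> str:
    """Return the mode that /Auto would behave as for this job."""
    for matcher_name, known_strings in CREATOR_TABLE.items():
        # A table key with no matching procedure raises KeyError here,
        # mirroring the rule that added names must also be defined.
        matcher = MATCHERS[matcher_name]
        for known in known_strings:
            if matcher(known, creator) or matcher(known, producer):
                return "Always"
    return "Never"
```

In this model a job whose creator is "Example VDP Composer 7.1" resolves to "Always", while a job from an unlisted application resolves to "Never".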
Setting /EnableOptimizedPDFScan to /Always, /Auto or true implies that the rest of the RIP configuration is appropriate for use with HVD. This means, for instance, that if you are using simple imposition you should turn HVD off.
A large job can be split into "chunks" of data with the use of /PageRange. Here, for example, the job is split into chunks of 10 pages:
While running this PostScript language fragment in an HVD setup, if, for example, during the first page range (1 to 10) some variable data is retained for re-use but the scan is aborted during a subsequent range, the scan for variable data is aborted for the rest of the job. Thus, if you are using small chunks of data and are seeing jobs aborting the HVD scan when you think there should be re-use of data, you should increase the /OptimizedPDFScanLimitPercent value, possibly up to the maximum of 100%, in which case the HVD scan continues for the whole job.
If you are writing a PostScript language control stream that needs to execute chunks from different PDF files you should call pdfclose on the first PDF file before calling pdfexecid on a chunk from the second to ensure that HVD scanning is triggered for the second file.
Some diagnostic modes are available to determine RIP behavior when using HVD. These modes should not be used in production, but can be useful when trying to determine why a job behaves in a particular way. There are several Harlequin-internal values for the /OptimizedPDFCacheID PDF parameter that can be used for diagnosing HVD issues when /OptimizedPDFExternal is set to true:
The following example PostScript language code turns HVD on and selects a diagnostic mode to output each raster element exactly once:
For Harlequin Core, see also the supplied HVDDemo example page feature.
The Scalable RIP can be configured to use HVD. When using HVD, it is important to realize that each job is split up into chunks, and the chunks are farmed out to separate RIPs for interpretation and rendering. HVD scans the start of each job it sees to determine whether there is enough repeated content to be worthwhile caching. If not enough content is repeated, HVD disables caching for the rest of the job. In the Scalable RIP, HVD scanning and re-use is performed on each Farm RIP independently. When sending page ranges from a job to a Farm RIP, the Scalable RIP keeps the job context open on the Farm RIP if it had previously run a page range from the same Scalable RIP job.
When using HVD with the Scalable RIP, this means that:
If HVD does not detect enough content re-use within the first page range of a job, it will disable re-use not just for the first page range, but for all subsequent page ranges of the job sent to the same Farm RIP.
The default chunk size that the Scalable RIP uses to split PDF jobs is 1, which will prevent HVD from working with iHVD and non-position-independent eHVD. For some common PDF/VT job types, HVD can be turned on automatically if the job is likely to benefit, and the chunk size set to a different value (50 in the following example) by using the AutoHVDChunkSize configuration in your configuration file or a page feature:
This example utilizes the same functionality outlined in the Extensions Manual; for more information see Auto mode for HVD.
The configuration or page feature using /AutoHVDChunkSize also needs to set up the HVD cache ID, any other parameters except for /EnableOptimizedPDFScan, and any license key required for HVD.
The HVD scan limit percentage is configured using the /OptimizedPDFScanLimitPercent PDF parameter. The default value for this is 10 (i.e., up to 10% of the job will be scanned for re-use). For use with the Scalable RIP, this will be the percentage of the first page range encountered, so a much higher percentage is appropriate. To scan the entire submitted page range for re-use, this parameter should be set in the configuration or a page feature to 100%:
The chunk size can be set explicitly by using the DefaultPageChunkSize key in the global configuration file, but this has the disadvantage that it will affect all jobs (not just variable data jobs), reducing the load-balancing capability of the Scalable RIP for small non-variable data jobs. The chunk size can be set for each job separately by setting a parameter on the internal Scalable RIP device. An alternate method of configuring the chunk size for variable data jobs separately from non-variable data jobs is to set the /SetChunkSize parameter using the Scalable RIP procset, thus adding this to your configuration:
This will set the chunk size for the current configuration to 50 pages. This configuration option can also be added in a page feature.
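To see how chunk size and scan limit interact, the arithmetic can be sketched as below. The function names are hypothetical, and rounding the scanned page count up to a whole page is an assumption. The text only states that the limit is a percentage of the first page range seen by a Farm RIP.

```python
import math
from typing import List, Tuple


def chunk_ranges(total_pages: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split pages 1..total_pages into inclusive (first, last) ranges."""
    return [(first, min(first + chunk_size - 1, total_pages))
            for first in range(1, total_pages + 1, chunk_size)]


def pages_within_scan_limit(range_pages: int, limit_percent: int) -> int:
    """Pages of a range covered by the scan limit (rounding up is assumed)."""
    return math.ceil(range_pages * limit_percent / 100)


ranges = chunk_ranges(120, 50)
first, last = ranges[0]
scanned_default = pages_within_scan_limit(last - first + 1, 10)
scanned_full = pages_within_scan_limit(last - first + 1, 100)
```

With a 120-page job and a chunk size of 50, the first range is pages 1 to 50. Under the default limit of 10, this model scans only 5 of those pages for re-use, whereas a limit of 100 covers all 50.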
As well as supporting eHVD directly in some raster output backends, the Harlequin RIP core library contains a library to help integrate eHVD and support functions in the SDK to simplify enabling eHVD in raster backends.
You may also want to implement your own clients of the eHVD event API, especially if you have hardware that supports composing of multiple rasters.