4.10. Job File Format Conversion with Filters

One of the major problems facing users new to UNIX printing is a printer that accepts only a proprietary print job format, such as the HP DeskJet series of printers. The solution to this problem is quite simple: generate your output in PostScript, and then use the GhostScript program to convert the PostScript to a format compatible with your printer.

    lp:filter=/usr/local/lib/filters/myfilter:...
    
    /usr/local/lib/filters/myfilter:
    
    #!/bin/sh
    /usr/local/bin/gs -dSAFER -dNOPAUSE -q -sDEVICE=djet500 \
        -sOutputFile=- - && exit 0
    exit 2


This simple tutorial example suffers from some serious problems. If you accidentally send a non-PostScript file to the printer, GhostScript will detect this and exit with an error message, but only after trying to interpret the input file as PostScript. If the input file was a text file, this can result in literally thousands of error messages and hundreds of pages of useless output.

In order to make a more robust filter we need to meet the following minimum requirements:

  1. The file type should be determined, and only files that are PostScript should be passed to GhostScript.

  2. We may have some conversion routines that can convert files into PostScript files and then we can send them to GhostScript for raster conversion.

  3. If we cannot convert a file, then we should simply terminate the printing and cause the spooler to remove the job.
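
Requirement 1 can be approximated without any extra tools, because a PostScript job conventionally begins with the two characters %!. The following is a minimal sketch, not part of LPRng: the is_postscript function name is hypothetical, and it assumes the job is available as a named file rather than on standard input.

```shell
#!/bin/sh
# Hypothetical helper (not part of LPRng): succeed only if the named
# file starts with the PostScript signature "%!".
is_postscript() {
    # read just the first two bytes of the file
    magic=$(dd if="$1" bs=1 count=2 2>/dev/null)
    test "X$magic" = "X%!"
}

# Example use inside a filter script:
#   is_postscript /path/to/job || exit 3    # reject the job
```

A check this simple only recognizes PostScript. The filters described in the rest of this section use the file utility so that several formats can be recognized.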



The ifhp Print Filter program is a companion to the LPRng software and does this type of operation. If you are using Linux, then you may find the RedHat Print Filters (http://www.redhat.com) installed and in use on your system. The magicfilter developed by H. Peter Anvin (http://www.debian.org) is distributed with Debian Linux. The apsfilter by Andreas Klemm (http://www.freebsd.org/~andreas/index.html) is also widely used, although most of its functionality is now directly available in LPRng. Finally, the a2ps (Ascii to PostScript) converter by Akim Demaille and Miguel Santana is available from www-inf.enst.fr/~demaille/a2ps. This package provides a very nice set of facilities for massaging, mangling, bending, twisting, and being downright nasty with text or other files.

4.10.1. Simple Filter with File Format Detection

Since this is a tutorial, we will demonstrate a simple way to make your own multi-format print filter, and provide insight into how more complex filters work.

The file utility developed by Ian F. Darwin uses a database of file signatures to determine what the contents of a file are. For example:

    h4: {191} % cd /tmp
    h4: {192} % echo hi >hi
    h4: {193} % gzip -c hi >hi.gz
    h4: {194} % echo "%!PS-Adobe-3.0" >test.ps
    h4: {195} % gzip -c test.ps >test.ps.gz
    h4: {196} % file hi hi.gz test.ps test.ps.gz
    hi:        ASCII text
    hi.gz:     gzip compressed data, deflated
    test.ps:   PostScript document text conforming at level 3.0
    test.ps.gz: gzip compressed data, deflated
    h4: {197} % file - <test.ps
    standard input: PostScript document text conforming at level 3.0


If we are given a file, we can now use file to recognize the file type and if the file type is suitable for our printer we can send it to the printer, otherwise we can reject it. The following is a simple yet very powerful shell script that does this.

    #!/bin/sh
    # set up converters
    gs="/usr/local/bin/gs -dSAFER -dNOPAUSE -q -sDEVICE=djet500 \
        -sOutputFile=/dev/fd/3 - 3>&1 1>&2"
    a2ps="/usr/local/bin/a2ps -q -B -1 -M Letter --borders=no -o-"
    decompress=""
    # get the file type
    type=`file - | tr A-Z a-z | sed -e 's/  */_/g'`;
    echo TYPE $type >&2
    case "$type" in
      *gzip_compressed* ) decompress="gunzip -c |" compressed="compressed" ;;
    esac
    
    # we need to rewind the file
    perl -e "seek STDIN, 0, 0;"
    
    if test "X$decompress" != "X" ; then
        # eval lets the shell parse the pipe stored in $decompress
        type=`eval "$decompress head" | file - | tr A-Z a-z | sed -e 's/  */_/g'`;
        echo COMPRESSED TYPE $type >&2
        # we need to rewind the file
        perl -e "seek STDIN, 0, 0;"
    fi
    case "$type" in
      *postscript* ) process="$gs" ;;
      *text* )       process="$a2ps | $gs" ;;
      * )
        echo "Cannot print type $compressed '$type'" >&2
        # exit with JREMOVE status
        exit 3
        ;;
    esac
    # in real life, replace 'echo' with 'exec /bin/sh -c' so the pipes are parsed
    echo "$decompress $process"
    # exit with JABORT if this fails
    exit 2


Copy this to the /tmp/majik file, and give it 0755 (executable) permissions. Here is an example of the output of the script:

    h4: {198} % /tmp/majik <test.ps.gz
    TYPE standard_input:_gzip_compressed_data,_deflated...
    COMPRESSED TYPE standard_input:_postscript_document_level_3.0
    gunzip -c | /usr/local/bin/gs -dSAFER -dNOPAUSE -q -sDEVICE=djet500 \
       -sOutputFile=/dev/fd/3 - 3>&1 1>&2
    h4: {199} % /tmp/majik </tmp/hi
    TYPE standard_input:_ascii_text
     /usr/local/bin/a2ps -q -B -1 -M Letter --borders=no -o- \
      | /usr/local/bin/gs -dSAFER -dNOPAUSE -q -sDEVICE=djet500 \
       -sOutputFile=/dev/fd/3 - 3>&1 1>&2


The first part of the script sets up a standard set of commands that we will use in the various conversions. A full blown package for conversion would use a database or setup file to get these values. We then use the file utility to determine the input file type. The output of the file utility is translated to lower case, and each run of blanks is replaced by a single underscore so that the result can be matched as one word.

We use a simple shell case statement to determine if we have a compressed file and get a decompression program to use. We reapply the file utility to the decompressed file (if it was compressed) and get the file type.

Finally we use another case statement to get the output converter and then we run the command. For tutorial purposes, we use an echo rather than an exec so we can see the actual command, rather than the output.
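
The matching logic can be exercised on its own, without a printer or GhostScript, by feeding it the description strings that file prints. The sketch below is illustrative only: the normalize and classify function names and the action words they print are hypothetical, and the description is passed in as an argument rather than read from file.

```shell
#!/bin/sh
# Hypothetical helpers for experimenting with the pattern matching.

# lower-case the description and turn each run of blanks into "_"
normalize() {
    printf '%s\n' "$1" | tr 'A-Z' 'a-z' | sed -e 's/  */_/g'
}

# print the action a filter would take for a given file(1) description
classify() {
    case $(normalize "$1") in
      *gzip_compressed* ) echo decompress ;;
      *postscript* )      echo gs ;;
      *text* )            echo a2ps ;;
      * )                 echo reject ;;
    esac
}

# Example: classify "$(file - </tmp/hi)"
```

The order of the patterns matters. The description of a PostScript file also contains the word text, so the postscript pattern must be tested before the text pattern.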

Just for completeness, here is majikperl:

    #!/usr/bin/perl
    eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
        if $running_under_some_shell;
                # this emulates #! processing on NIH machines.
                # (remove #! line above if indigestible)
    my($gs) = "/usr/local/bin/gs -dSAFER -dNOPAUSE -q -sDEVICE=djet500 "
        . "-sOutputFile=/dev/fd/3 - 3>&1 1>&2";
    my($a2ps)="/usr/local/bin/a2ps -q -B -1 -M Letter --borders=no -o-";
    
    my($decompress,$compressed,$process,$type);
    $decompress=$compressed=$process=$type="";
    
    # get the file type
    $type = ` file - `;
    $type =~ tr /A-Z/a-z/;
    $type =~ s/\s+/_/g;
    print STDERR "TYPE $type\n";
    ($decompress,$compressed) = ("gunzip -c |", "gzipped")
      if( $type =~ /gzip_compressed/ );
    print STDERR "decompress $decompress\n";
    unless( seek STDIN, 0, 0 ){
        print STDERR "seek STDIN failed - $!\n"; exit 2; }
    if( $decompress ne "" ){
        $type = ` $decompress file - `;
        $type =~ tr /A-Z/a-z/;
        $type =~ s/\s+/_/g;
        print STDERR "COMPRESSED TYPE $type\n";
        unless( seek STDIN, 0, 0 ){
            print STDERR "seek STDIN failed - $!\n"; exit 2; }
    }
    $_ = $type;
    if( /postscript/ ){
        $process="$gs";
    } elsif( /_text_/ ){
        $process="$a2ps | $gs";
    } else {
        print STDERR "Cannot print $compressed '$type'\n";
        # JREMOVE
        exit 3;
    }
    exec "$decompress $process";
    print STDERR "exec failed - $!\n";
    exit 2;


4.10.2. The ifhp Filter

The ifhp Print Filter is the companion print filter supplied with LPRng and is normally installed together with the LPRng software. Ifhp supports a wide range of PostScript, PCL, text, and raster printers, and can be configured to support almost any type of printer with a stream based interface. It provides diagnostic and error information as well as accounting information. It recognizes a wide range of file types by using the file utility and the pattern matching technique demonstrated in the previous section, and can do selective conversions from one format to others.

The PostScript and PCL printer job languages are supported by most printer manufacturers. However, in order to have a job printed correctly the following steps must be taken.

  1. The printer must be put into a known initial state by sending it the appropriate reset strings or performing a correct set of IO operations.

  2. If accounting is being done, then the printer accounting information must be obtained and recorded. See Accounting for more information about LPRng support for accounting.

  3. The file to be printed must be checked to see if it is compatible with the printer, and if not, a format conversion program invoked to convert it to the required format.

  4. If the user selects a set of printer specific options such as landscape mode, duplex printing, multiple copies, or special paper, the appropriate commands must be sent to the printer to select these options.

  5. The file must be transferred to the printer and the printer is monitored for any error conditions.

  6. Any required end of job commands are sent to the printer, and the printer monitored for error conditions while the job finishes printing.

  7. If accounting is being done, the printer accounting information such as page count and time used must be obtained and recorded. See Accounting for more information about LPRng support for accounting.



The ifhp filter uses the ifhp.conf configuration file to determine the actions and commands appropriate for various models of printers. See the ifhp documentation for details about the format and contents of this file. This file contains entries for a large number of PostScript, PJL, and other printers. The default printer used by ifhp is the HP LaserJet 4M Plus, which supports PostScript, PCL, and PJL. The commands and formats used by this printer are compatible with a large number of other HP printers.

We will demonstrate how to add the ifhp filter to your printcap entry. Find the path to the ifhp filter using the find command. Modify the printcap as shown below and use lpc reread to cause lpd to reread the configuration information.

    lp:sd=/var/spool/lpd/%P
      :force_localhost
      :lp=/tmp/lp
      :ifhp=model=default
      # modify the path to ifhp appropriately
      :filter=/usr/local/libexec/filters/ifhp


Now print the /tmp/hi file and then display /tmp/lp using a text editor such as vi or emacs that shows control characters:

    h4: {200} % cp /dev/null /tmp/lp
    h4: {201} % lpr /tmp/hi
    h4: {202} % vi /tmp/lp
    ^[%-12345X@PJL
    @PJL JOB NAME = "PID 405" DISPLAY = "papowell"
    @PJL RDYMSG DISPLAY = "papowell"
    @PJL USTATUSOFF
    @PJL USTATUS JOB = ON
    @PJL USTATUS DEVICE = ON
    @PJL USTATUS PAGE = ON
    @PJL USTATUS TIMED = 10
    @PJL ENTER LANGUAGE = PCL
    ^[E^[&^[&k2G^[&s0C^[&l0O^[9^[(s0P^[(s10.00H^[(s4099Thi
    ^[E^[%-12345X@PJL
    @PJL RDYMSG DISPLAY = "papowell"
    @PJL EOJ NAME = "PID 405"
    @PJL USTATUSOFF
    @PJL USTATUS JOB = ON
    @PJL USTATUS DEVICE = ON
    @PJL USTATUS PAGE = ON
    @PJL USTATUS TIMED = 10
    @PJL RDYMSG DISPLAY = ""
    ^[%-12345X


The output now contains all of the control sequences and setup codes needed to print a text file on the default printer. The :ifhp=model=default printcap entry is used by ifhp to get the information it needs to perform its operation. The following options are commonly provided in the :ifhp= option to configure the ifhp filter.

Table 4-3. :ifhp= Options

Option                                     Purpose
model=name                                 Use name entry in ifhp.conf
status or status@                          Printer does or does not provide status
                                           information
sync, sync@, sync=(ps|pjl)                 Printer does or does not indicate ready to
                                           operate at start of job, or use PostScript
                                           or PJL code sequence to determine if
                                           printer is ready
pagecount, pagecount@, pagecount=(ps|pjl)  Printer does or does not have pagecount
                                           support, or use PostScript or PJL code
                                           sequence to determine pagecount
waitend, waitend@, waitend=(ps|pjl)        Wait or do not wait for end of job, or
                                           send PostScript or PJL code sequence to
                                           have printer report end of job

The model=name entry is used to specify the configuration entry in the ifhp.conf file to be used by ifhp. This entry usually has all of the specific information needed by the ifhp filter.

The status option is the one most commonly set in a printcap entry. Specify status@ when the communication with the printer is write-only and no status information will be returned. If a printer normally supports returning status information, then the ifhp.conf configuration entry will indicate this and the ifhp filter will try to get status. When no status is returned, it will either terminate operation after a timeout or sit in an endless loop waiting for status. By specifying status@ you suppress the status requests. This also has the effect of setting sync@, pagecount@, and waitend@.

The sync option is used to cause ifhp to wait for an end of job indication from the printer before starting the next job. This is usually done in order to make sure that all jobs have been flushed from a printer before starting another job. If you specify sync@ then you may get slightly faster startup but at the expense of losing the ends of previous print jobs.

The pagecount option is used to cause ifhp to get the value of a hardware pagecounter from the printer. If your printer supports such an item then the ifhp.conf configuration option usually indicates this. However, it takes a small amount of time to get the pagecounter information from the printer and you may not need it. Use pagecount@ if you do not want page counts.

Finally, the waitend option is used to cause ifhp to wait for an end of job indication from the printer before exiting. If you specify waitend@ then the filter will exit immediately after sending the job, but you may lose any error information or status reports from the printer.

For a complete list of all of the ifhp options please see the IFHP documentation.

4.10.3. The Jaggies - LF to CR-LF Conversion With lpf

When printing to vintage hard copy devices or to printers that support a text mode, many UNIX users discover that their output suffers from a case of the jaggies.

    Input file:
    
      This is
      a nice day
    
    Output:
    
      This is
             a nice day


UNIX systems terminate lines with a single NL (new line) character, also called LF (line feed). This causes the printer to move down one line on the page, but it does not return to the left margin before printing the next character. Returning to the left margin is done by the CR (carriage return) character. You need to convert each single LF to a CR-LF combination, and the lpf filter supplied with LPRng does this.

First, locate the lpf filter. You can find it by using the command:

    h9: {160} % find /usr/ -type f -name lpf -print
    /usr/libexec/lpr/lpf


We will first see what the output is like without lpf, and then see what it does. Modify the lp printcap entry as shown below and then use lpc restart to restart the lpd server.

    lp:sd=/var/spool/lpd/%P
      :force_localhost
      :lp=/tmp/lp


Print a file and view the output using the following commands. If you do not have the od (octal dump) program, try using hexdump or some other appropriate program that displays the numerical contents of the file.

    h4: {203} % cp /dev/null /tmp/lp
    h4: {204} % lpr /tmp/hi
    h4: {205} % od -bc /tmp/lp
    0000000  150 151 012
               h   i  \n
    0000003
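
For a capture file that is too large to inspect by eye, the same check can be automated. The following is a rough sketch that assumes an awk that understands \r in regular expressions; the count_bare_lf function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the number of lines on standard input
# that end in a bare LF, that is, with no CR in front of the LF.
count_bare_lf() {
    awk '!/\r$/ { n++ } END { print n + 0 }'
}

# Example: count_bare_lf </tmp/lp
```

A result of 0 means that every line already ends with CR-LF.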


Now we will use the lpf filter. Modify the printcap as shown below and use lpc reread to cause lpd to reread the configuration information.

    lp:sd=/var/spool/lpd/%P
      :force_localhost
      :lp=/tmp/lp
      # modify the path to lpf appropriately
      :filter=/usr/local/libexec/filters/lpf


Now reprint the file:

    h4: {206} % cp /dev/null /tmp/lp
    h4: {207} % lpr /tmp/hi
    h4: {208} % od -bc /tmp/lp
    0000000  150 151 015 012
               h   i  \r  \n
    0000004


As you see, lpf changes the LF to a CR-LF sequence.
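
If lpf is not available, the same transformation can be approximated with standard tools. This is only a sketch of the conversion, not the lpf implementation; the lf_to_crlf function name is hypothetical, and it assumes an awk that understands \r.

```shell
#!/bin/sh
# Hypothetical stand-in for the LF to CR-LF conversion.
# Reads standard input and writes every line terminated by CR-LF.
lf_to_crlf() {
    # strip an existing CR first so CR-LF input is not turned into CR-CR-LF
    awk '{ sub(/\r$/, ""); printf "%s\r\n", $0 }'
}

# Example: lf_to_crlf </tmp/hi | od -bc
```

Note that awk also terminates a final line that had no line ending in the input, so the output always ends with CR-LF.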

4.10.4. Store and Forward Spool Queues

Up to now we have assumed that associated with each spool queue is a hardware printing device. When a job is sent to the spool queue the lpd server will take actions to filter it and then send it to the printing device.

However, we can also have store and forward spool queues. These queues simply buffer jobs and then forward them to another spooler. The following printcap entry shows how you can specify a store and forward queue.

    # store and forward using classical BSD :rm:rp
    lp:rp=pr:rm=host
      :sd=/var/spool/lpd/%P
      :server
    # store and forward using LPRng lp=pr@host
    lp:lp=pr@host
      :sd=/var/spool/lpd/%P
      :server


The legacy :rp (remote printer) and :rm (remote host) format can be used to specify the print queue and destination host for jobs sent to this queue. The LPRng :lp=pr@host format serves the same function, and has precedence over the :rm:rp form.
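
The correspondence between the two notations is mechanical: the part before the @ is the remote queue and the part after it is the remote host. As an illustration only, the following sketch rewrites a pr@host value in the legacy form; the to_rp_rm function name is hypothetical and is not an LPRng utility.

```shell
#!/bin/sh
# Hypothetical helper: print the legacy :rp/:rm pair that corresponds
# to an LPRng pr@host destination.
to_rp_rm() {
    case "$1" in
      *@* ) echo ":rp=${1%%@*}:rm=${1#*@}" ;;
      * )   echo "not a pr@host destination: $1" >&2
            return 1 ;;
    esac
}

# Example: to_rp_rm pr@host
```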

Edit the printcap file so it has contents indicated below, use checkpc -f to check the printcap, and then use lpc reread to restart the lpd server.

    lp:force_localhost
    lp:server
      :sd=/var/spool/lpd/%P
      :lp=lp2@localhost
    lp2:force_localhost
    lp2:server
      :sd=/var/spool/lpd/%P
      :lp=/tmp/lp2

Execute the following commands to print the /tmp/hi file and observe the results:

    h4: {209} % lpr /tmp/hi
    h4: {210} % lpq -lll
    Printer: lp@h4 (dest lp2@localhost)
     Queue: no printable jobs in queue
     Status: sending control file 'cfA029h4.private' \
        to lp2@localhost at 09:39:57.719
     Status: completed sending 'cfA029h4.private' \
        to lp2@localhost at 09:39:57.724
     Status: sending data file 'dfA029h4.private' \
        to lp2@localhost at 09:39:57.727
     Status: completed sending 'dfA029h4.private' \
        to lp2@localhost at 09:39:57.925
     Status: done job 'papowell@h4+29' transfer \
        to lp2@localhost at 09:39:57.926
     Status: subserver pid 29031 exit status 'JSUCC' at 09:39:57.953
     Status: lp@h4.private: job 'papowell@h4+29' printed at 09:39:57.961
     Status: job 'papowell@h4+29' removed at 09:39:57.993
    Printer: lp2@h4
     Queue: no printable jobs in queue
     Status: no banner at 09:39:58.054
     Status: printing data file 'dfA029h4.private', size 3 at 09:39:58.054
     Status: printing done 'papowell@h4+29' at 09:39:58.054
     Status: accounting at end at 09:39:58.054
     Status: finished 'papowell@h4+29', status 'JSUCC' at 09:39:58.054
     Status: subserver pid 29033 exit status 'JSUCC' at 09:39:58.056
     Status: lp2@h4.private: job 'papowell@h4+29' printed at 09:39:58.056
     Status: job 'papowell@h4+29' removed at 09:39:58.069


As we see from the status, our job was sent to the lp spool queue first. It was stored there and then the lpd server transferred it to the lp2 spool queue, where it was printed to the file /tmp/lp2.

4.10.5. Filtering Job Files In Transit

One of the major problems with store and forward operation is that the destination spool queue may not actually be a spool queue - it can be a printer. Many network printers provide an RFC1179 compatible network interface and act, for job forwarding purposes, like a host running a limited capability BSD print spooler.

By adding a filter to the printcap information we can modify the format of a job file so that it is compatible with the destination printer.

Edit the printcap and /tmp/testf files so they have the contents indicated below, give /tmp/testf executable permissions, use checkpc -f to check the printcap, and then use lpc reread to restart the lpd server.

    # set /tmp/testf to contain the following
    #   and chmod 755 /tmp/testf
    #!/bin/sh
    echo TESTF $0 $@
    /bin/cat
    exit 0
    
    # printcap
    lp:force_localhost
    lp:server
      :sd=/var/spool/lpd/%P
      :lp=lp2@localhost
      :filter=/tmp/testf
      :bq_format=ffl
    lp2:force_localhost
    lp2:server
      :sd=/var/spool/lpd/%P
      :lp=/tmp/lp2

Execute the following commands to print the /tmp/hi file and observe the results:

    h4: {211} % lpr /tmp/hi
    h4: {212} % lpq -llll
    h4: {213} % lpq -llll
    Printer: lp@h4 (dest lp2@localhost)
     Queue: no printable jobs in queue
     Status: no banner at 09:55:53.681
     Status: printing data file 'dfA086h4.private', size 3, \
        IF filter 'testf' at 09:55:53.683
     Status: IF filter finished at 09:55:53.713
     Status: printing done 'papowell@h4+86' at 09:55:53.714
     Status: sending job 'papowell@h4+86' to lp2@localhost at 09:55:53.734
     Status: connecting to 'localhost', attempt 1 at 09:55:53.735
     Status: connected to 'localhost' at 09:55:53.739
     Status: requesting printer lp2@localhost at 09:55:53.740
     Status: sending control file 'cfA086h4.private' \
          to lp2@localhost at 09:55:53.752
     Status: completed sending 'cfA086h4.private' \
          to lp2@localhost at 09:55:53.757
     Status: sending data file 'dfA086h4.private' \
          to lp2@localhost at 09:55:53.758
     Status: completed sending 'dfA086h4.private' \
          to lp2@localhost at 09:55:53.939
     Status: done job 'papowell@h4+86' transfer \
          to lp2@localhost at 09:55:53.940
     Status: subserver pid 29088 exit status 'JSUCC' at 09:55:53.980
     Status: lp@h4.private: job 'papowell@h4+86' printed at 09:55:53.983
     Status: job 'papowell@h4+86' removed at 09:55:53.998
    Printer: lp2@h4
     Queue: no printable jobs in queue
     Status: subserver pid 29092 starting at 09:55:54.005
     Status: accounting at start at 09:55:54.005
     Status: opening device '/tmp/lp2' at 09:55:54.005
     Status: printing job 'papowell@h4+86' at 09:55:54.005
     Status: no banner at 09:55:54.006
     Status: printing data file 'dfA086h4.private', size 298 at 09:55:54.006
     Status: printing done 'papowell@h4+86' at 09:55:54.006
     Status: accounting at end at 09:55:54.006
     Status: finished 'papowell@h4+86', status 'JSUCC' at 09:55:54.006
     Status: subserver pid 29092 exit status 'JSUCC' at 09:55:54.008
     Status: lp2@h4.private: job 'papowell@h4+86' printed at 09:55:54.008
     Status: job 'papowell@h4+86' removed at 09:55:54.020


We have displayed a bit more status information so that we can see what actions the lp queue carries out. It first processes the job data file using the testf filter and puts the results in a temporary file. Then it sends the contents of the temporary file to the lp2 queue. The lp2 queue receives the converted job file and then prints it to the /tmp/lp2 file in turn.

By default, each file in a job is processed by a print filter and the processed output is then sent to the destination as individual job files, each with the format specified by the value of the bq_format (default f) option. The bq_format option has the format iOiO...d; each i is the original format and the corresponding O is the output format. If there is an odd number of characters then the last unmatched character is used as the default format, otherwise no translation is done. For example, flrfl will cause the f format to be mapped to l, r to f, and any others to l.
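
The mapping rule can be written out as a small shell function. This is a sketch of the rule as described in the paragraph above, not the code LPRng uses; the bq_map function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: apply a bq_format specification to one format letter.
#   $1 = bq_format value, for example flrfl
#   $2 = original format letter of the job file
bq_map() {
    spec=$1
    fmt=$2
    # walk through the specification one iO pair at a time
    while test ${#spec} -ge 2; do
        i=$(printf '%s' "$spec" | cut -c1)
        o=$(printf '%s' "$spec" | cut -c2)
        if test "X$i" = "X$fmt"; then
            echo "$o"
            return 0
        fi
        spec=$(printf '%s' "$spec" | cut -c3-)
    done
    if test -n "$spec"; then
        echo "$spec"    # odd length: the last character is the default
    else
        echo "$fmt"     # even length and no match: no translation
    fi
}

# Example: bq_map flrfl r
```

With the value flrfl, the function prints l for f, f for r, and l for any other letter, which matches the example in the text.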