HIGH DYNAMIC RANGE IMAGE COMPRESSION ON COMMODITY HARDWARE FOR REAL-TIME MAPPING APPLICATIONS

This paper describes a lossy compression scheme for high dynamic range graylevel and color imagery for data transmission purposes in real-time mapping scenarios. The five stages of the implemented non-standard transform coder are written in portable C++ code and do not require specialized hardware to run. Storage space occupied by the bitmaps is reduced via a color space change, 2D integer discrete cosine transform (DCT) approximation, coefficient quantization, two-size run-length encoding and dictionary matching hinged on the LZ4 algorithm. Quantization matrices to eliminate insignificant DCT coefficients are derived from a representative image set through genetic optimization. The underlying fitness function incorporates the obtained output size, classic image quality metrics and the unique color count. Together with a zone-based adaptation mechanism, this allows target bitrates to be specified instead of percentage values or abstract quality factors, so that the reduction rate can be matched directly to the available communication channel capacity. Results on a camera control unit of a fixed-wing unmanned aircraft system built around entry-level PC hardware revealed single-thread compression and decompression throughputs of several hundred mebibytes per second for full-swing 16 and 32 bit RGB imagery at medium compression ratios. A degradation in image quality compared to popular compression libraries was observed, however, at statistically and visually acceptable levels.


INTRODUCTION
With ongoing advances in camera sensor technology and near-sensor pixel postprocessing, optical imagery with radiometric resolutions of fourteen, sixteen and even more bits per sample has started to make its way into various remote sensing applications. When the advantages of high dynamic range (HDR) bitmaps, i.e., capturing details under varying light conditions, are to be harnessed for airborne real-time mapping, the problem of high data volumes to be transmitted over communication channels of limited capacity within a narrow time frame arises. This applies to both the on-board image projection onto an existing surface model where ready-to-use map tiles have to be sent and to ground-based projection setups where raw or preprocessed image frames must be communicated.
In order to reduce the amount of data to match the capabilities of the available downlink, the images frequently get tone-mapped prior to compression which inevitably results in substantially fewer color nuances. Depending on the degree of image degradation, the information loss may not be desirable or acceptable at all. This applies in particular when operational pictures for natural or man-made disasters such as earthquakes, avalanches or major fires must be prepared without delay so that local rescue teams can promptly be given instructions to save human lives or prevent substantial economic damage. Here, the dynamic range of the imagery to be mapped has to be preserved as much as possible in order to recognize relevant scene details of the unknown surroundings on-site and under changing or poor illumination conditions. Such conditions especially occur during prolonged acquisition periods or when the power grid infrastructure has been impacted.

RELATED WORK AND MOTIVATION
Currently, there do not seem to be many publications that specifically deal with HDR image compression in a remote sensing context. In (Belyaev et al., 2017), 16 bit graylevel infrared images are subdivided into two low dynamic range (LDR) rasters. The LDR bitmap pairs to be compressed into standard JPEG (Pennebaker and Mitchell, 1992) or PNG (ISO, 2004) streams are obtained from the least and most significant bytes of the pixels. To control the degree of data reduction, discrete bitrate-distortion curves are computed from a set of test images. These curves subsequently get translated into the abstract quality factors expected by both image encoders. The split technique is once more applied by (Mantel and Forchhammer, 2017) to infrared bitmaps using the JPEG-XT still image processor (ISO, 2020b). In addition, the authors evaluate JPEG 2000 (ISO, 2019) and the MPEG-H Part 2/High Efficiency Video Coding (HEVC) (ISO, 2020a) extension for still image compression, both of which work directly with 16 bit samples. Both approaches are purely software-based. Although no runtime figures are given for encoding and decoding, a substantial overhead can be expected at least when LDR bitmap pairs are to be dealt with.
Regarding real-time compression, (Manthey, 2014) describes a CCSDS 122.0-B-2-compliant architecture (CCSDS, 2017) to reduce the data volume of 16 bit satellite imagery on-board. It is based on a three-level 2D discrete wavelet transform (DWT) similar to JPEG 2000 with subsequent coefficient rearrangement and bitplane encoding. The incremental algorithm builds on field-programmable gate array (FPGA) hardware to achieve encoding speeds of around 400 MiB/s independently of the image content. It operates in either lossless or near-lossless mode to preserve the original content albeit this restricts the achievable compression ratios. The work of (Melián et al., 2021) embraces real-time transmission of hyperspectral data cubes of 12 bits per sample packed into a 16 bit unsigned type. It utilizes a reduction algorithm named HyperLCA that was specifically tailored to this kind of imagery. HyperLCA comes along with a degradation in image quality (i.e., it is lossy) to achieve compression ratios of about 1:20. However, the encoder as a component of an unmanned aircraft system (UAS) relies on graphics processor hardware to handle the amount of roughly 87.5 MiB/s of raw data delivered by the camera module.
This paper proposes a custom compression algorithm which has been designed for the data transmission phase in HDR real-time mapping scenarios. Unlike in the previous work cited above, the described lossy encoder-decoder combination (codec) supports unsigned graylevel and RGB multispectral input bitmaps with radiometric resolutions between 12 and 41 bits per sample. This is beyond the capabilities of many of the established JPEG and MPEG standards. To achieve adequate compression ratios, bitmap storage is reduced through a sequence comprising a color space change, frequency domain transformation, zonal quantization of the resulting coefficients, run-length encoding and dictionary matching to be reversed for decompression. Image quality and compression ratios are controlled via target bitrates. The bitrate goals are given as bits per pixel values to be directly adapted to the data rate of the downlink communication channel. The individual stages of the algorithm have been selected, implemented as a portable C++ library and tuned for speed so they could run on inexpensive commodity hardware instead of graphics processors or FPGAs. Yet, it will be demonstrated that both the compressor and decompressor achieve single-thread performances sufficient to process 50 megapixel 16 bit RGB images of roughly 300 MiB per frame to a fraction of their original memory footprint and vice versa in much less than a second on low-end PC-like hardware that is part of an unmanned aircraft system.

COMPRESSION ALGORITHM DESCRIPTION
The proposed compression algorithm obeys the transform coder structure that is also followed by the ubiquitous JPEG codec. Image encoding starts with a colorspace transformation to separate luminance from chrominance when available. Due to the biology of the human visual system, color information can be condensed much more strongly without noticeable artifacts. The luminance and chrominance channels afterwards get transformed into the frequency domain to decorrelate slow and fast intensity changes. Insignificant frequency components are set to zero by subsequent quantization. The quantization stage is followed by two lossless compression steps to reversibly condense the redundancies present in the stream of coefficients. While the two-size run-length encoder (RLE) shrinks consecutive sequences of the same symbol, a dictionary-based algorithm abbreviates recurring RLE tuples. For image decompression, the process chain needs to be run in reverse order performing the inverse operations.

Colorspace transformation
For the initial separation of the luminance, or luma, and chrominance, or chroma, information of the image, the input bitmaps undergo a transformation from the original RGB colorspace to a YUV representation (ITU, 2007). Each pair of multispectral pixels RiGiBi, i ∈ {1, 2} is processed according to equation 1 where c denotes the intensity range center (e.g., c = 32768 for 16 bits per sample) and >> is the arithmetic shift right operator. For graylevel bitmaps, the luma values Y are obtained by subtracting c from each pixel value.
Each output component gets written to a separate image plane. The resolution of the chroma components U and V is reduced by two in both image directions using averaging horizontally and the nearest neighbor filter vertically to at least partially suppress aliasing artifacts. Hence, the U and V samples are shared by a 2 x 2 subgrid of luma values which sometimes is called 4:2:0 subsampling in video coding. In the C++ library implementation, for input data that are band-interleaved by pixel (BIP), image traversal uses pointer arithmetic to increment the current position inside the bitmap and avoid full address recalculations. Pairwise RGB tuple processing eliminates expensive branching to differentiate between even and odd columns and likely will improve processor register utilization. Using bit shifts instead of divisions by the respective powers of two generally is faster and safe on unsigned sample data types. Also, the original intensity range will be translated but not left except for the calculation of intermediates. The Y, U and V outputs are signed integers of the same size as the input samples.
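The chroma reduction described above can be illustrated with a minimal sketch. It assumes a single signed chroma plane stored row by row with even dimensions; the function name subsampleChroma and the 32 bit sample type are hypothetical and not taken from the library. Because the chroma values are signed in this sketch, a division by two replaces the shift; the two differ by one for negative odd sums.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Halves a chroma plane in both directions: neighboring samples are
// averaged horizontally, odd rows are dropped vertically (nearest neighbor).
// Assumption: width and height are even.
std::vector<std::int32_t> subsampleChroma(const std::vector<std::int32_t>& plane,
                                          std::size_t width, std::size_t height)
{
    std::vector<std::int32_t> out;
    out.reserve((width / 2) * (height / 2));
    for (std::size_t y = 0; y < height; y += 2) {            // even rows only
        const std::int32_t* row = plane.data() + y * width;   // one address calculation per row
        for (std::size_t x = 0; x < width; x += 2, row += 2) {
            // Right shifts of negative values are only guaranteed to be
            // arithmetic since C++20, hence the division in this sketch.
            out.push_back((row[0] + row[1]) / 2);
        }
    }
    return out;
}
```

Advancing the row pointer by two samples per iteration mirrors the pointer-based traversal mentioned in the text.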
Image decoding employs the reverse transformation from equation 2 which operates on pairs of luminance samples to gain speed. For chroma upsampling, the U and V values remain constant for each 2 x 2 subgrid of Y values performing nearest neighbor interpolation. The R, G and B reconstructions must be clamped to the original image intensity range since over- or underflows may happen due to compression artifacts from other codec stages. For graylevel bitmaps, only the center c needs to be added to each Y value and fenced in to recover the unsigned intensity.
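For the graylevel case, the forward and inverse steps are simple enough to sketch directly. The example below assumes 16 bits per sample, so c = 32768; the names kCenter, grayToLuma and lumaToGray are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

constexpr std::int32_t kCenter = 32768;   // c for 16 bits per sample

// Forward step: translate the unsigned intensity into signed luma.
inline std::int16_t grayToLuma(std::uint16_t pixel)
{
    return static_cast<std::int16_t>(static_cast<std::int32_t>(pixel) - kCenter);
}

// Inverse step: add the center back and clamp to [0, 2c - 1], since
// lossy stages in between may push the value out of range.
inline std::uint16_t lumaToGray(std::int32_t luma)
{
    const std::int32_t v = luma + kCenter;
    return static_cast<std::uint16_t>(
        std::max<std::int32_t>(0, std::min<std::int32_t>(v, 2 * kCenter - 1)));
}
```

The inverse takes a wider input type on purpose so that out-of-range reconstructions can be represented before they are clamped.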

Frequency domain transformation
The Y, U and V image planes from the previous stage are individually decomposed into their frequency components by running the 2D discrete cosine transform (DCT) on 8 x 8 pixel blocks. The 2D DCT itself builds on a fast integer approximation of the floating-point 8-point 1D Loeffler DCT with 11 multiplications and 29 additions which is theoretically optimal (Loeffler et al., 1989). Figure 1 shows this DCT and its inverse, the IDCT, as signal flow graphs to be executed from left to right.
In the approximation of the transform, its original rotator blocks c1, c3 and c6 in their optimized form with three multiplications are replaced by c0, c[16 arccos(0.75)/π] ≈ c3.6809 and c[16 arccos(1/(2√2))/π] ≈ c6.1596, i.e., we rotate a little bit less and further. The involved factors a therefore yield 1, 0.75 and 0.5 respectively which eliminates two of them adding two arithmetic right shifts instead. The dependent constants (b − a) and −(a + b) subsequently are altered into close fractional powers of two to reflect the change. For the tiny (b − a) value of the c3.6809 approximation, the round-off error of its factor-free approximation of -1/16 asymmetrically gets compensated in the IDCT by -3/32. Similarly, the square root calculation of stage 3 will be modeled by the skew numbers of 45/32 for the forward transform and 23/16 for its inverse. As final coefficients that are directly applied to the output samples 3 and 5, these values appear in neither the DCT nor the IDCT code but are incorporated as numerator-denominator pairs into the quantizer (see section 3.3.1). The quantizer will also take care of the scaling factor of 0.125 or 1/8 that comes with the forward Loeffler transform as depicted and which needs to be applied to preserve the amplitude of the samples.
In total, the adapted 8-point DCT and IDCT require 4 and 5 multiplications respectively plus 29 additions and 6 shifts for each direction. The reverse operation thus is slightly slower; however, this disadvantage will be overcompensated later by the lossless stages of the codec. Table 1 compares all factors of the floating-point 1D building block and their integer approximation as used by the library code.
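To make the role of the constants a, (b − a) and −(a + b) more tangible, the following sketch contrasts a plain rotator with its three-multiplication form in floating point. With a = cos θ and b = sin θ, both compute y0 = a·x0 + b·x1 and y1 = −b·x0 + a·x1. For a = 0.75, b − a is about −0.089, which is the value the text approximates by -1/16. The helper threeQuarters shows one possible shift-based realization of the factor 0.75; this is an assumption for illustration, not the library code, and all names are hypothetical.

```cpp
#include <cmath>
#include <cstdint>

struct Pair { double y0, y1; };

// Plain rotator: four multiplications.
inline Pair rotateDirect(double x0, double x1, double a, double b)
{
    return { a * x0 + b * x1, -b * x0 + a * x1 };
}

// Optimized rotator: three multiplications when (b - a) and (a + b)
// are treated as precomputed constants.
inline Pair rotateFast(double x0, double x1, double a, double b)
{
    const double t = a * (x0 + x1);
    return { t + (b - a) * x1, t - (a + b) * x0 };
}

// Assumed integer realization of a = 0.75: one subtraction and one shift.
// The right shift is arithmetic for negative values on common compilers
// and guaranteed to be so since C++20.
inline std::int64_t threeQuarters(std::int64_t s)
{
    return s - (s >> 2);
}
```

Once a becomes 1, 0.75 or 0.5, the product a·(x0 + x1) no longer needs a multiplication, which is where the savings stated above originate.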
To obtain the 2D DCT, the 1D DCT is run vertically on blocks of 8 x 8 samples of the input image planes whose dimensions must be aligned accordingly. The outcome is written to a small buffer of 64 intermediates on which the DCT gets executed horizontally. The final result is passed to the quantizer that writes the altered frequency coefficients into dedicated output image planes. For the 2D IDCT, the process is gone through in reverse order. Using a small temporary buffer eliminates unnecessary stride calculations to skip the remainder of image lines, keeps processor registers available to the DCT itself and is data cache friendly. Therefore, runtime will be optimized at the insignificant cost of a few hundred bytes of additional storage. The DCT code requires integer type promotion since the computed scaled coefficients very likely will exceed the signed input data type from color conversion except for trivial image content. Type transition happens once during the vertical pass.
Regarding the accuracy of the integer DCT, deviations from the original version will occur in both the forward and inverse direction due to the approximation and rounding. However, a quick test with roundtrips (DCT followed by IDCT) taken on 500 random 8 x 8 matrices made of signed 16 bit elements revealed that the mean absolute error caused by the frequency transform on average is around 0.3% of the number range. This value is considered acceptable for compression purposes.
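The two-pass scheme with a 64-element buffer can be sketched as follows. For brevity, a floating-point reference DCT-II with orthonormal scaling stands in for the integer Loeffler approximation, so the sketch shows only the separable structure and the buffer handling. The names dct1d and dct2d are hypothetical.

```cpp
#include <cmath>
#include <cstddef>

// Reference 8-point DCT-II (orthonormal scaling), a stand-in for the
// integer approximation. Reads in[0], in[inStride], ... and writes
// out[0], out[outStride], ...
void dct1d(const double* in, std::size_t inStride, double* out, std::size_t outStride)
{
    const double pi = 3.14159265358979323846;
    for (int k = 0; k < 8; ++k) {
        double sum = 0.0;
        for (int n = 0; n < 8; ++n)
            sum += in[n * inStride] * std::cos((2 * n + 1) * k * pi / 16.0);
        out[k * outStride] = sum * (k == 0 ? std::sqrt(0.125) : 0.5);
    }
}

// 2D DCT of one 8 x 8 block located inside an image plane with the
// given line stride; coeffs receives 64 values in row-major order.
void dct2d(const double* plane, std::size_t stride, double* coeffs)
{
    double tmp[64];                            // small intermediate buffer
    for (int col = 0; col < 8; ++col)          // vertical pass over the plane
        dct1d(plane + col, stride, tmp + col, 8);
    for (int row = 0; row < 8; ++row)          // horizontal pass over the buffer
        dct1d(tmp + row * 8, 1, coeffs + row * 8, 1);
}
```

Only the vertical pass touches the image plane with its full line stride; the horizontal pass works on the densely packed buffer.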

Quantization
The obtained frequency components undergo quantization to reduce their amplitude to predefined discrete levels. This codec stage largely controls the degree of compression and hence is responsible for the bulk of image information loss to achieve practical data reduction ratios.

Forward and inverse quantization process
Quantization itself involves integer divisions by positive numbers so that negligibly small DCT values become zero. For the inverse, or dequantization, the quotients are multiplied by the same numbers to roughly reconstruct the original frequencies. Depending on the rounding mode, the quantization error can be reduced trading accuracy for speed. Plain divisions that truncate towards zero are preconfigured in the library implementation. A slower but more precise rounding technique towards the closest integer which inevitably involves evaluating a condition is also available (figure 2). For the 8 x 8 2D DCT blocks, quantization and dequantization are conducted using matrices of the same size. The matrices contain the divisors and factors respectively to be applied to the individual frequency components during compression and decompression. However, before being used, some of the raw matrix entries must be adjusted by the rational representations of √2 inherited from the DCT, and the DCT scaling of 1/8 needs to be incorporated. For the 2D DCT approximation, the original quantization matrices $Q_i$ therefore have to be multiplied and divided element-wise by the 8 x 8 matrix products as shown by equation 3. This yields premultiplied ready-to-use versions $Q'_i$ for the forward step and $D'_i$ for dequantization.
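The two rounding modes mentioned above can be sketched as follows, assuming positive divisors and 64 bit signed coefficients. The function names are hypothetical; rounding to the nearest integer is implemented here as rounding half away from zero, which is one common choice and not necessarily the one used by the library.

```cpp
#include <cstdint>

// Fast mode: plain integer division, truncates towards zero.
inline std::int64_t quantizeTruncate(std::int64_t coeff, std::int64_t divisor)
{
    return coeff / divisor;
}

// Slower mode: round to the closest integer (half away from zero).
// The sign test is the condition that makes this variant more expensive.
inline std::int64_t quantizeNearest(std::int64_t coeff, std::int64_t divisor)
{
    const std::int64_t half = divisor / 2;
    return (coeff >= 0 ? coeff + half : coeff - half) / divisor;
}

// Dequantization: multiply the level by the matching factor.
inline std::int64_t dequantize(std::int64_t level, std::int64_t factor)
{
    return level * factor;
}
```

For a coefficient of 7 and a divisor of 4, truncation yields level 1 (reconstructed as 4) whereas rounding yields level 2 (reconstructed as 8), which is closer to the original value.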
$$q_{mul} = \begin{pmatrix} 1 & 1 & 1 & 45 & 1 & 45 & 1 & 1 \end{pmatrix}, \qquad q_{div} = \begin{pmatrix} 8 & 8 & 8 & 8 \cdot 32 & 8 & 8 \cdot 32 & 8 & 8 \end{pmatrix}$$
Integrating the root and scale terms into the matrices $Q'_i$ and $D'_i$ limits the maximum dynamic range allowed for the image codec to $\log_2[2^{63}/(8^2 \cdot (8 \cdot 32)^2)] = 41$ bits per sample with 64 bit wide signed integers. This calculation is based on the maximum 1D DCT coefficient blow-up of eight, its scale of eight and the square root denominator of 32 to be applied as a factor to the quantization matrix. The premultiplied $Q'_i$ and $D'_i$ also impose a lower bound on the radiometric resolution of imagery to be compressed and decompressed. This issue is caused by significant round-off errors on small raw matrix entries and a coarse discretization on large raw matrix elements to be further scaled. Hence, in practice, the codec is not optimized for bitmaps with less than ∼12 bits per sample. This limitation however could be alleviated when the square root term is migrated back into the DCT at the cost of two more multiplications.
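The upper bound of 41 bits per sample follows from counting powers of two. As a worked step, using only the quantities named in the text:

```latex
\log_2 \frac{2^{63}}{8^2 \cdot (8 \cdot 32)^2}
  = 63 - \log_2 2^{6} - \log_2 2^{16}
  = 63 - 6 - 16
  = 41
```

The 63 usable magnitude bits of a 64 bit signed integer are thus reduced by 6 bits for the two-dimensional coefficient blow-up and by 16 bits for the squared scale and square root denominator.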

Base quantization matrices for bitrate control
To be able to directly match the communication channel capacity, the missing quantization matrices $Q_i$ are to be generated in such a way that the compression level can be expressed as a bitrate, i.e., in bits per image pixel (bpp). This makes them dependent on the entire compression algorithm, and hence no templates to be reused exist. The $Q_i$ are therefore obtained through genetic optimization (Mitchell, 1998) on a set of representative training images. Optimization is performed for each bit depth to be supported by the codec expressed as bits per sample, or bits per component (bpc), which can be easily upscaled to the bpp value by the color channel count. Also, quantization matrices are refined separately for the luma and chroma image planes yielding matrix pairs $(Q_{i,l}, Q_{i,c})$. This reflects the greater influence of the brightness channel on visual image quality.
Like in biology, genetic optimization takes an initial population generated from random DNA strands and evaluates their adaptation to the environment with a fitness function. The fittest individuals reproduce and give birth to a new generation of population members by exchanging parts of their DNA while "weak" individuals get discarded. Diversity among the population members is further increased with point mutations, i.e., local random changes to their DNA. The evolution cycle repeats until a maximum generation count has been reached. The fittest individual of the population is eventually chosen to produce the final quantization matrix pair.
For the HDR image codec, the DNA independently defining the shape of the luma and chroma parts of the quantization matrix pairs is chosen as the arguments a and b of equation 4. Discrete sampling produces the 2 · 64 elements of $(Q_{i,l}, Q_{i,c})$. The created phenotypes loosely resemble the quantization matrix structure from JPEG for arbitrary bit depths and expose smooth transitions between the matrix entries.
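The evolution cycle can be outlined with a generic sketch. It is not the optimizer used for the codec: the two-parameter genome, the toy fitness function with its optimum at (3, −1), the survival of the fitter half and the Gaussian mutation are all assumptions chosen to keep the example short. In the real setting, the fitness evaluation would involve compression roundtrips on the training images.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <random>
#include <vector>

using Genome = std::array<double, 2>;   // hypothetical two-parameter DNA

// Toy fitness: maximum value 0 at (3, -1). Stand-in for a real evaluation.
double toyFitness(const Genome& g)
{
    const double da = g[0] - 3.0, db = g[1] + 1.0;
    return -(da * da + db * db);
}

// Runs the evolution cycle and returns the fittest individual.
// Assumption: populationSize >= 2.
Genome evolve(std::size_t populationSize, std::size_t generations, unsigned seed)
{
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> init(-10.0, 10.0);
    std::normal_distribution<double> mutation(0.0, 0.1);

    std::vector<Genome> pop(populationSize);
    for (Genome& g : pop) g = { init(rng), init(rng) };    // random initial DNA

    const auto fitter = [](const Genome& x, const Genome& y) {
        return toyFitness(x) > toyFitness(y);
    };
    const std::size_t survivors = populationSize / 2;
    std::uniform_int_distribution<std::size_t> pick(0, survivors - 1);

    for (std::size_t gen = 0; gen < generations; ++gen) {
        std::sort(pop.begin(), pop.end(), fitter);          // fittest first
        for (std::size_t i = survivors; i < populationSize; ++i) {
            const Genome& p1 = pop[pick(rng)];              // parents from the fitter half
            const Genome& p2 = pop[pick(rng)];
            // Crossover (one gene from each parent) followed by mutation;
            // the weaker half of the population gets overwritten.
            pop[i] = { p1[0] + mutation(rng), p2[1] + mutation(rng) };
        }
    }
    return *std::min_element(pop.begin(), pop.end(), fitter);
}
```

Because the fitter half is carried over unchanged, the best fitness in the population never decreases from one generation to the next.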
The fitness function f to determine how well a quantization matrix pair suits a particular compression task is modeled as the square root of four arguments with empirically determined exponential weights (equation 5). Among the arguments is the ratio $q_b$ of the currently achieved bitrate $b_a$ to the target bitrate $b_t$ which falls back to zero if the latter is exceeded by the former. Also, the universal image quality (UIQ) measure (Wang and Bovik, 2002) scaled to the [0, 1] interval, peak signal-to-noise ratio (PSNR) (Salomon, 2006) and unique color count (UCC) are incorporated into f. The rationale for this choice is to obtain the best possible image quality (in terms of UIQ and PSNR) while approaching the given compression ratio (in terms of the bitrate quotient) as closely as possible from below. Also, the variety of colors (in terms of the UCC) shall be maximized and not get sacrificed to an excessively dominant luma channel.
To evaluate the fitness function, the set of training images takes compression and decompression roundtrips for each quantization matrix pair of the population. The decompression outputs are compared to the respective originals in graylevel or RGB space. For each bitmap pair, PSNR is computed on a per-sample basis, and UIQ is obtained for each color band to be subsequently averaged. UCC calculation involves ordering the pixels of the decompression result by color in-place with the quicksort algorithm and counting the color transitions. This universal approach works for any sample type, band count and bit depth in linearithmic time.
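The sort-and-count approach to the unique color count can be sketched in a few lines. The example assumes 16 bit RGB pixels and uses std::sort in place of a hand-written quicksort; the type alias Rgb16 and the function name are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

using Rgb16 = std::array<std::uint16_t, 3>;

// Counts the distinct colors by sorting the pixels in place and
// counting the transitions between neighboring entries.
std::size_t uniqueColorCount(std::vector<Rgb16>& pixels)
{
    if (pixels.empty()) return 0;
    std::sort(pixels.begin(), pixels.end());    // lexicographic order on (R, G, B)
    std::size_t count = 1;
    for (std::size_t i = 1; i < pixels.size(); ++i)
        if (pixels[i] != pixels[i - 1]) ++count;
    return count;
}
```

Since the pixels are reordered in place, the function should be run on a copy whenever the decompressed image is still needed afterwards.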
The final fitness for a quantizer configuration $(Q_{i,l}, Q_{i,c})$ for RGB images and a particular target bitrate to be met is derived from the average of the fitness function arguments over all training data. When the genetic optimization is run for a sufficiently dense set of $b_t$, a lookup table from the achieved bitrates $b_a$ (which ultimately will approach $b_t$) to the corresponding quantization matrix pairs can be constructed. For graylevel bitmaps, the optimization results from RGB imagery get recycled, and mappings from the partial bitrates of the luma channel to the luma quantization matrices $Q_{i,l}$ are stored. Encoding images of the bit depth used during optimization to an almost arbitrary target bitrate is achieved by linear interpolation or extrapolation based on the two pairs of matrices that belong to the two neighboring bitrates according to the lookup table. 6 displays sample raw luma and chroma matrices $(Q_{i,l}, Q_{i,c})$ for 16 bpc imagery for the target bitrate $b_t$ = 2 bpp which equals a compression ratio of 1:24 for RGB data. The actually achieved Y, U and V partial bitrates from the genetic optimizer for this target are $b_a$ = 1.8615 + 0.0753 + 0.0626 = 1.9994 bpp. This corresponds to image compression ratios of 24.0072 (RGB) and 8.5952 (graylevel).

Zone-adaptive quantization
While the quantization matrices from genetic optimization will provide a reasonable starting point when a certain target bitrate needs to be met, the actually achieved compression ratios for individual images still may vary. Therefore, to closely approach the target, the matrices used to reduce the amplitude of the DCT components may get dynamically adjusted during the reduction process.
For this zone-adaptive quantization mechanism, equally sized strips of image scanlines are compressed individually. The bitrate actually achieved for the first strip is tracked, scaled to the full image dimensions and compared against the target bitrate.
If the goal is exceeded, the bitrates to be used for the next image strips get gradually reduced based on the previous setting within a predefined corridor. On a shortfall, they will be increased recursively. The default adjustment factors are 0.8 and 1.24 for the bitrate decrease and increase up to the minimum and maximum of 0.5 and 4 respectively. In extreme cases, the compression ratio therefore may locally breathe between one quarter and two times the reduction rate aimed at. Due to the asymmetric adjustment factors (the reciprocal of 0.8 equals 1.25), the achieved average bitrate for the entire bitmap will have the tendency to stay slightly below the target bitrate. This behavior is desirable for data transmission with upper bandwidth limits.

Run-length encoding
Depending on the matrices used, the set of DCT coefficients likely will contain sequences of identical values after quantization. In fact, for realistic compression settings, the majority of frequency components form extended runs of zeros (see figure 3) which can be represented compactly through run-length encoding (RLE) without any information loss.
The run-length encoder of the C++ compression library by default operates on the concatenated stream of quantized DCT samples of all image planes. In contrast to JPEG where RLE is implemented on individual 2D DCT blocks with a zigzag-style reordering scheme, this approach potentially allows larger sequences of zeros to be reduced. Piecewise image decoding nevertheless remains possible using zone-adaptive quantization. The produced RLE output consists of (skip;value) tuples which consume just a few bytes each. The first part stores how many consecutive zeros precede a specific non-zero value. The value part can either be "short" or "long" to conserve memory on small quantized DCT coefficients. For images with 16 bits per sample, this translates into signed byte and signed short integer data types constituting a type demotion. Under- or overflowing frequency components will be truncated in the rare case the "long" data type is insufficient. Discrimination between the two value sizes happens via the most significant bit of the skip part of the tuple. For 16 bit frames, the skip type is an unsigned byte, and hence up to 127 zeros can be folded at once. In the worst case, without any repeating DCT coefficients in the input stream as with near-lossless quantizer configurations, the RLE output will expand and not contract. This scenario currently is not specifically addressed by the HDR image codec.
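A two-size run-length coder along these lines can be sketched for 16 bit frames as follows. The tuple layout matches the description (skip byte with the most significant bit as size flag, one- or two-byte value). Several details are assumptions for illustration: the little-endian byte order of long values, the use of an explicit zero value to continue runs longer than 127, and the flushing of trailing zeros. All names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

namespace rle {

constexpr std::uint8_t kLongFlag = 0x80;   // MSB of the skip byte marks a "long" value
constexpr std::uint8_t kMaxSkip = 0x7F;    // up to 127 zeros per tuple

inline void emit(std::vector<std::uint8_t>& out, std::uint8_t skip, std::int16_t value)
{
    if (value >= -128 && value <= 127) {                  // "short": one signed byte
        out.push_back(skip);
        out.push_back(static_cast<std::uint8_t>(value));
    } else {                                              // "long": two bytes, little-endian
        const std::uint16_t bits = static_cast<std::uint16_t>(value);
        out.push_back(static_cast<std::uint8_t>(skip | kLongFlag));
        out.push_back(static_cast<std::uint8_t>(bits & 0xFF));
        out.push_back(static_cast<std::uint8_t>(bits >> 8));
    }
}

std::vector<std::uint8_t> encode(const std::vector<std::int16_t>& coeffs)
{
    std::vector<std::uint8_t> out;
    std::uint8_t run = 0;
    for (std::int16_t v : coeffs) {
        if (v == 0 && run < kMaxSkip) { ++run; continue; }
        emit(out, run, v);   // v is zero only when the run counter is saturated
        run = 0;
    }
    if (run > 0)             // flush trailing zeros: (run - 1) skipped plus one explicit zero
        emit(out, static_cast<std::uint8_t>(run - 1), 0);
    return out;
}

std::vector<std::int16_t> decode(const std::vector<std::uint8_t>& bytes)
{
    std::vector<std::int16_t> out;
    for (std::size_t i = 0; i < bytes.size();) {
        const std::uint8_t head = bytes[i++];
        out.insert(out.end(), static_cast<std::size_t>(head & kMaxSkip), std::int16_t{0});
        std::int16_t value;
        if (head & kLongFlag) {
            value = static_cast<std::int16_t>(bytes[i] | (bytes[i + 1] << 8));
            i += 2;
        } else {
            value = static_cast<std::int8_t>(bytes[i]);   // sign-extend the short value
            i += 1;
        }
        out.push_back(value);
    }
    return out;
}

} // namespace rle
```

For the input {5, 0, 0, 0, -300, 0, 0}, the encoder emits a short tuple (0;5), a long tuple (3;-300) and a short tuple (1;0), i.e., seven bytes instead of fourteen. A stream without zeros needs at least two bytes per coefficient, which reflects the expansion in the worst case mentioned above.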

Dictionary-based compression
The remaining redundancies in the tuple stream from the run-length encoder are further condensed with a second lossless compressor. Because of its speed, the LZ4 algorithm (Collet, 2016) (Collet, 2019) was chosen which is also integrated into the Linux kernel (Schmidt, 2017). LZ4 is a variant of the LZ77 dictionary-based encoders (Ziv and Lempel, 1977) that replace repeating sequences of (not necessarily identical) symbols with a single reference to their previous occurrence. To find the matches, an implicit dictionary formed by a sliding window over the most recent data gets examined.
For the proposed image codec, the high compression mode functions of the LZ4 open source reference library are invoked with a fixed reduction level of two. This setting offered a reasonable compromise between runtime and data reduction in combination with the other codec stages during a quick test. Nevertheless, while LZ4 in practice performs much faster than arithmetic or Huffman coding as used for JPEG, its compression ratio generally will stay behind these entropy-based algorithms.

PERFORMANCE ANALYSIS
As a proof-of-concept, the HDR transform coder was implemented as a C++ library whose functions are called by an accompanying demo application and the tool for the genetic optimization of the quantization matrices. The library does not utilize machine-specific instructions and hence can be compiled on different target platforms. Its code is furthermore single-threaded. Parallel execution for the designated application of real-time mapping will be accomplished by running multiple codec instances concurrently on the images captured. To save memory for the intermediate bitmaps created by the HDR codec stages, each supported bit depth is associated with a set of specific data types aggregated in a trait class. The algorithms involved in image encoding and decoding get instantiated with the respective trait at translation time. This enables type-specific compiler optimizations and boosts speed at the price of object code duplication.
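The trait mechanism can be illustrated with a small sketch. The class name CodecTraits, its member names and the choice of the promoted types are assumptions; only the idea of bundling the data types per bit depth and instantiating the algorithms at compile time is taken from the text.

```cpp
#include <cstdint>

// Hypothetical trait classes bundling the data types per bit depth.
template <int BitsPerSample> struct CodecTraits;

template <> struct CodecTraits<16> {
    static constexpr int bits = 16;
    using Sample = std::uint16_t;   // unsigned input samples
    using Signed = std::int16_t;    // Y, U, V after color conversion
    using Wide   = std::int32_t;    // promoted type for intermediates (assumed)
};

template <> struct CodecTraits<32> {
    static constexpr int bits = 32;
    using Sample = std::uint32_t;
    using Signed = std::int32_t;
    using Wide   = std::int64_t;
};

// Example algorithm instantiated per trait: subtract the range center c.
template <typename Traits>
typename Traits::Signed centerSample(typename Traits::Sample s)
{
    using Wide = typename Traits::Wide;
    constexpr Wide center = Wide{1} << (Traits::bits - 1);
    return static_cast<typename Traits::Signed>(static_cast<Wide>(s) - center);
}
```

Each instantiation, e.g., centerSample<CodecTraits<16>>, is compiled into separate object code with fixed types, which is the source of both the speed gain and the code duplication noted above.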
To evaluate the achievable image quality and runtime for both compression and decompression, the library initially got configured with luma and chroma quantization matrices from genetic optimization. The dedicated optimizer tool was run on a diverse set of 16 RGB pinhole and mapped ortho images of 16 and 32 bits per component yielding 48 and 96 bits per pixel respectively. Except for computer-generated renderings, the data was originally captured by 14 bit sensors whose dynamic range was subsequently scaled with a non-linear transfer function to emulate true HDR hardware. Multithreaded calculation of the quantization matrices for the test bitmaps was conducted with a population size of 512 individuals and 128 generations. It took around three days on a fully utilized high-performance server with 56 physical processor cores and simultaneous multithreading (SMT) enabled even though downsampling to between 11 and 16 megapixels was applied to the input images in advance.
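Optimizer data got recalled for performance analysis in full resolution ranging from 11 to 67 megapixels. Compression and decompression were carried out on the control and image processing unit of the DLR MACS-nano aerial camera (Kraft et al., 2023). MACS-nano fits into the nose section of a Quantum Systems fixed-wing UAS and has been recently involved in real-time mapping missions on disaster sites (figure 4). The integrated computer builds on entry-level PC hardware with an Intel i3-1115G4E embedded processor that nominally operates at 3.0 GHz and 8 GiB of automotive LPDDR4 memory clocked at 1833 MHz. It runs the Linux operating system.
Table 2 lists selected timings and bitrates plus the PSNR, UIQ and UCC quality readings as introduced in section 3.3.2 for a subset of four images of natural, urban and synthetic environments (figure 5). Measurements were also taken for the JPEG and JPEG 2000 codecs using release versions of the IJG libjpeg 9e (IJG, 2022) and OpenJPEG 2.5.0 (OpenJPEG Development Team, 2022) software libraries. Libjpeg was specifically compiled to handle raster data with 12 bits per sample. The dynamic range of the 16 bpc input imagery had to be linearly reduced after loading, and the fastest DCT option with Huffman entropy coding got chosen to ensure a fair comparison. Native bitmaps with 16 bits per component could be tested only for JPEG 2000 since the accompanying command-line tools refused to read in 32 bpc bitmaps. The OpenJPEG utilities were invoked with the default compression and decompression settings.
For the 16 bpc bitmaps, the compression throughput of the proposed HDR image codec excluding I/O grows with the reduction rate. It increases from roughly 140 to nearly 400 megasamples per second (MS/s), i.e., 262 MiB/s to 750 MiB/s, mostly because less data has to be handled by LZ4. The asymmetric runtime characteristics particularly of the lossless stages make decompression even faster. Decoding throughput always exceeds 300 MS/s and peaks at 486 MS/s or nearly 1 GiB/s of data. In both disciplines, the reference JPEG implementation is outperformed by almost a factor of two. The popular JPEG 2000 library at best is roughly 60 times slower during compression and five times behind on decompression. For 32 bpc imagery, fewer megasamples are processed per second in absolute numbers. However, the larger sample type increases the byte count that can be compacted per unit of time by 46% to 75% for comparable bitrates. At reduction settings of 1:24 and 1:48, frame rates greater than 1.5 Hz and 1.8 Hz are reached respectively by the HDR encoder for the 50 megapixel bitmaps of both color depths. The resulting data streams can be transmitted over 4G networks considering their nominal data rates (Dahlman et al., 2013). In the special case of on-board mapping, when overlapping image parts get clipped during the projection onto a surface model to remove redundancies in advance (Hein and Berger, 2018), the achievable frame rates will further increase. A gain linear to the pixel count reduction could be expected.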
Regarding image quality, the HDR codec introduces more artifacts than the established compression standards on the 16 bpc bitmaps. PSNRs for the urban Berlin scene are lower by about 5 dB compared to the 12 bit-enabled JPEG library for quality factors of q = 20 and q = 4 to achieve equal bitrates of $b_t$ = 2 and $b_t$ = 1 bpp. The delta to wavelet-based JPEG 2000 is roughly 8 dB. The unique color counts of the proposed approach and JPEG are almost on par. Despite the drop to approximately 39% in the worst case for the subset, the UCC remains well into the millions. Eight-bit tone mapping solutions that allow standard JPEG streams to be transmitted likely will reproduce far fewer color shades, e.g., 575046 for the Berlin photo when applying the GIMP open-source image processor. JPEG 2000 manages to preserve and slightly extend the original chromaticity. UCC readings beyond 100% of the original value also appear with the HDR codec on high bitrates. These overshoots can be attributed to inaccuracies in the frequency transforms and quantization errors. For all compressors, the UIQ values approach the theoretical maximum and reflect the PSNR behavior.
Quality and compression ratio control based on the mapping from bitrates to quantization matrices from genetic optimization in combination with zone-adaptive image encoding seems to work as expected. The actually obtained bitrates mostly stay close to the targets of $b_t$ = 6, 2 and 1 bpp. Deviations concentrate on low reduction ratios for which fewer quantization matrix samples have been generated and stored inside the lookup table. Although the perfection of JPEG 2000 remains unmatched, the accuracy obtained with the HDR image codec will enable on-line adjustments of the compression degree to the capacity of the actual communication downlink independently of the upcoming image content to be transmitted.

CONCLUSION
This paper has outlined the design and implementation of a fast transform coder for HDR real-time mapping applications that runs on affordable off-the-shelf general purpose computing hardware. The codec covers a wide range of bit depths to which it can be specifically adapted using genetic optimization. The portable proof-of-concept library that encapsulates the underlying highly optimized subalgorithms is capable of compressing and decompressing 50 megapixel RGB bitmaps of 16 and even 32 bits per component in less than a second on a single CPU core. Runtime outperforms optimized reference implementations of the JPEG and JPEG 2000 standards at the cost of an acceptable level of image quality degradation. Depending on the image size and intended frame rate, the achieved compression ratios will enable real-time transmission of the compressed data streams over decent radio downlinks and cellular networks.

Figure 1. Signal flow graphs for the original 8-point Loeffler 1D DCT and 1D IDCT with 11 multiplications and 29 additions

Figure 3. Sample RLE input (a) luma channel of car scene, (b) quantized DCT coefficients, homogeneous gray indicates zeros

Figure 4. (a) MACS-nano in UAS nose, (b) Real-time mapping for damage assessment (Ahr valley flash flood, Germany, 2021)

Table 1. Original and integer frequency transform parameters to four decimals; for the latter, the exact (e) and power-of-two values for the forward (f) and inverse (i) operation are given