The Digital Image Processing package is a new Mathematica 4 application package. Designed for speed and functionality, the package features a comprehensive collection of over one hundred one- and two-dimensional linear and non-linear image operators for common measurement, analysis and image enhancement tasks. Here we give a quick tour, with references to the full documentation in the User's Guide, of selected Image Processing functions and operators. All the examples are taken from the User's Guide.
This loads the package.
![[Graphics:images/index_gr_1.gif]](../quicktour/images/index_gr_1.gif)
A digital image is a two-dimensional (2-D) discrete signal. Mathematically, such signals can be represented as functions of two independent variables, for example, a brightness function of two spatial variables. A monochrome digital image $f(n_1, n_2)$ is a 2-D array of luminance values, with $0 \le n_1 < N_1$ and $0 \le n_2 < N_2$, and typically $0 \le f(n_1, n_2) \le 255$. Each element of the array is called a pel (picture element), or more commonly a pixel. Typical image dimensions are […] and […].
Consider a small portion of the "head" image, one of the monochrome example images. We read the image and extract the raw pixel data. The example images included in the ImageProcessing package are located in the Data directory of the root ImageProcessing directory. The location on any system is given by the system variable $ImageDataDirectory. In a typical Windows installation, the path may be as follows.
![[Graphics:images/index_gr_8.gif]](../quicktour/images/index_gr_8.gif)
The directory paths listed in this system variable are automatically searched for named files in an ImageRead operation.
![[Graphics:images/index_gr_10.gif]](../quicktour/images/index_gr_10.gif)
![[Graphics:images/index_gr_11.gif]](../quicktour/images/index_gr_11.gif)
Small values represent dark areas of an image, while large values represent bright pixels.
A color digital image is typically represented by a triplet of values, one for each of the color channels, as in the frequently used RGB color scheme. The individual color values are almost universally 8-bit values, resulting in a total of 3 bytes (or 24 bits) per pixel. This yields a three-fold increase in the storage requirements for color versus monochrome images. Naturally, there are a number of alternative methods of storing the image data. The most widely used are the so-called pixel-interleaved (or meshed) and color-interleaved (or planar) formats. Less frequent, but possible, are row-wise or column-wise interleaving methods. In a pixel-interleaved format every image pixel is represented by a list of three values:
$$\{r,\; g,\; b\}, \qquad (2)$$
whereas in the color-interleaved format, the color information is separated into three matrices, one for each of the three color channels:
$$\{\mathbf{R},\; \mathbf{G},\; \mathbf{B}\}. \qquad (3)$$
Here we read the color example image - "beans".
![[Graphics:images/index_gr_15.gif]](../quicktour/images/index_gr_15.gif)
Here we extract a 4x4 region and display it in a meshed format.
![[Graphics:images/index_gr_16.gif]](../quicktour/images/index_gr_16.gif)
Here, the same region of the color image is displayed in a planar format.
![[Graphics:images/index_gr_18.gif]](../quicktour/images/index_gr_18.gif)
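The relationship between the meshed and planar layouts can be sketched in a few lines. The example below is written in plain Python rather than with the package's own functions, and the names `to_planar` and `to_meshed` are hypothetical helpers introduced only for illustration.

```python
# Pixel-interleaved (meshed) data: rows of pixels, each pixel an [R, G, B] list.
meshed = [
    [[255, 0, 0], [0, 255, 0]],
    [[196, 225, 242], [96, 204, 248]],
]

def to_planar(image):
    """Split meshed rows of [R, G, B] pixels into one matrix per channel."""
    channels = len(image[0][0])
    return [[[pixel[c] for pixel in row] for row in image]
            for c in range(channels)]

def to_meshed(planes):
    """Recombine per-channel matrices into rows of per-pixel lists."""
    return [[list(pixel) for pixel in zip(*rows)] for rows in zip(*planes)]

planar = to_planar(meshed)
print(planar)             # three 2x2 matrices: red, green, blue
print(to_meshed(planar))  # round trip back to the meshed layout
```

Both layouts hold the same numbers; only the nesting order of the channel index changes.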
The RGB color scheme is just one of many color representation methods used in practice. The letters R, G, B stand for red, green and blue, the three primary colors used to synthesize any one of $2^{24}$, or approximately 16 million, colors. Equal quantities of the three color values result in shades of gray in the range 0 to 255. Other supported color models include monochrome, HSV (hue, saturation, value) and CMYK (cyan, magenta, yellow, black). The latter has found application primarily in the printing and graphics markets. HSV is useful in color image processing, since it separates the color information from brightness.
Getting Started: How to find the example image?
User's Guide: Sections 2.1, 2.2, 2.3.
Function Index: ImageRead, ImageTake, PlanarImageData, RawImageData.
The color image data structure is encapsulated in an ImageData Mathematica expression. ImageData is a graphics primitive that facilitates the easy manipulation and handling of single and multi-channel images. This reads in a small example image in RGB color format.
![[Graphics:images/index_gr_21.gif]](../quicktour/images/index_gr_21.gif)
ImageData, like other graphics primitives, may be displayed using the Show command.
![[Graphics:images/index_gr_22.gif]](../quicktour/images/index_gr_22.gif)
![[Graphics:images/index_gr_23.gif]](../quicktour/images/index_gr_23.gif)
Here is the input form of ImageData.
![[Graphics:images/index_gr_24.gif]](../quicktour/images/index_gr_24.gif)
ImageData[{{{255, 0, 0}, {0, 255, 0}, {0, 0, 255}, {255, 255, 255}, {0, 0, 0}},
{{255, 0, 0}, {0, 255, 0}, {0, 0, 255}, {255, 255, 255}, {0, 0, 0}},
{{196, 225, 242}, {96, 204, 248}, {106, 122, 211}, {219, 112, 131},
{144, 42, 55}}, {{202, 172, 172}, {111, 229, 152}, {227, 140, 232},
{175, 149, 115}, {249, 229, 3}}, {{2, 168, 101}, {176, 116, 98}, {206, 37, 113},
{185, 107, 237}, {76, 76, 45}}}, PixelInterleave -> True,
ColorFunction -> RGBColor]
The default ... This converts the image to a planar format.
![[Graphics:images/index_gr_25.gif]](../quicktour/images/index_gr_25.gif)
ImageData[{{{255, 0, 0, 255, 0}, {255, 0, 0, 255, 0}, {196, 96, 106, 219, 144},
{202, 111, 227, 175, 249}, {2, 176, 206, 185, 76}},
{{0, 255, 0, 255, 0}, {0, 255, 0, 255, 0}, {225, 204, 122, 112, 42},
{172, 229, 140, 149, 229}, {168, 116, 37, 107, 76}},
{{0, 0, 255, 255, 0}, {0, 0, 255, 255, 0}, {242, 248, 211, 131, 55},
{172, 152, 232, 115, 3}, {101, 98, 113, 237, 45}}}, PixelInterleave -> False,
ColorFunction -> {RGBColor[#1, 0, 0] & , RGBColor[0, #1, 0] & ,
RGBColor[0, 0, #1] & }]
This returns a monochrome (i.e. single channel) image.
![[Graphics:images/index_gr_26.gif]](../quicktour/images/index_gr_26.gif)
ImageData[{{85., 85., 85., 255., 0}, {85., 85., 85., 255., 0},
{221., 182.66666666666666, 146.33333333333331, 154., 80.33333333333333},
{182., 164., 199.66666666666666, 146.33333333333331, 160.33333333333331},
{90.33333333333333, 130., 118.66666666666666, 176.33333333333331,
65.66666666666666}}, PixelInterleave -> None, ColorFunction -> GrayLevel]
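The values in the listing above are consistent with a plain average of the three color values of each pixel, for example (255 + 0 + 0)/3 = 85. The following sketch, in plain Python rather than the package's ToGrayLevel function, reproduces the first and third rows under that assumption. The name `to_gray_level` is hypothetical.

```python
def to_gray_level(image):
    """Average the color values of each pixel of a meshed image."""
    return [[sum(pixel) / len(pixel) for pixel in row] for row in image]

# First and third rows of the 5x5 RGB example shown earlier.
rows = [
    [[255, 0, 0], [0, 255, 0], [0, 0, 255], [255, 255, 255], [0, 0, 0]],
    [[196, 225, 242], [96, 204, 248], [106, 122, 211], [219, 112, 131],
     [144, 42, 55]],
]

gray = to_gray_level(rows)
print(gray[0])  # the three primaries average to 85, white to 255, black to 0
print(gray[1])
```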
This converts from the default RGB format to HSV.
![[Graphics:images/index_gr_27.gif]](../quicktour/images/index_gr_27.gif)
ImageData[{{{0., 1., 1.}, {0.33333333333333337, 1., 1.},
{0.6666666666666666, 1., 1.}, {0, 0, 1.}, {0, 0, 0}},
{{0., 1., 1.}, {0.33333333333333337, 1., 1.}, {0.6666666666666666, 1., 1.},
{0, 0, 1.}, {0, 0, 0}}, {{0.5595413379513471, 0.1900826446280991,
0.9490196078431372}, {0.5453797791150686, 0.6129032258064515,
0.9725490196078431}, {0.6440842331116621, 0.49763033175355453,
0.8274509803921568}, {0.973391114828529, 0.4885844748858448,
0.8588235294117647}, {0.9813236988642904, 0.7083333333333334,
0.5647058823529412}}, {{2.371593461809983*^-9, 0.14851485148514842,
0.792156862745098}, {0.3889176340426471, 0.5152838427947598,
0.8980392156862745}, {0.8256392086959877, 0.39655172413793094,
0.9098039215686274}, {0.09556097125631627, 0.34285714285714286,
0.6862745098039216}, {0.1550068744474194, 0.9879518072289156,
0.9764705882352941}}, {{0.4343074457102995, 0.9880952380952381,
0.6588235294117647}, {0.035362577189995255, 0.44318181818181823,
0.6901960784313725}, {0.9258994821488294, 0.8203883495145632,
0.807843137254902}, {0.7682965987600818, 0.5485232067510548,
0.9294117647058824}, {0.16666666666666666, 0.4078947368421053,
0.2980392156862745}}}, PixelInterleave -> True, ColorFunction -> Hue]
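For comparison, an RGB to HSV conversion can be sketched with Python's standard `colorsys` module. This is an independent illustration, not the package's ToHSVColor implementation. For the pixels tried here the saturation and value agree with the listing above, while the hue of non-primary colors can differ slightly, which suggests the package uses a different hue formula. The helper name `to_hsv` is hypothetical.

```python
import colorsys

def to_hsv(image):
    """Convert meshed 8-bit RGB pixels to (hue, saturation, value) in [0, 1]."""
    return [[colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in row]
            for row in image]

hsv = to_hsv([[[255, 0, 0], [0, 255, 0], [196, 225, 242]]])
for h, s, v in hsv[0]:
    print(round(h, 4), round(s, 4), round(v, 4))
```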
User's Guide: Section 2.4.
Function Index: ImageRead, ToGrayLevel, ToHSVColor, PlanarImageData.
All image data imported into Mathematica using the ImageRead command may be displayed using Show, since ImageRead returns an ImageData object that is Show compatible.
![[Graphics:images/index_gr_28.gif]](../quicktour/images/index_gr_28.gif)
![[Graphics:images/index_gr_29.gif]](../quicktour/images/index_gr_29.gif)
Raw monochrome image data may be visualized in any number of ways, such as surface plots, contour plots and intensity plots. The latter is the most natural (see ListDensityPlot, Raster).
Here is a 32-by-32 fragment of the example image.
![[Graphics:images/index_gr_30.gif]](../quicktour/images/index_gr_30.gif)
![[Graphics:images/index_gr_31.gif]](../quicktour/images/index_gr_31.gif)
According to (3), color image data in a planar format may be treated as three separate single color images, one for each of the color channels. Here, the three channels are displayed as a GraphicsArray.
![[Graphics:images/index_gr_32.gif]](../quicktour/images/index_gr_32.gif)
![[Graphics:images/index_gr_33.gif]](../quicktour/images/index_gr_33.gif)
Alternatively, the three individual color channels may be viewed as separate monochrome images. This is a convenient conceptualization of color images allowing the simple and immediate extension of many monochrome image processing techniques to the color image domain.
User's Guide: Section 2.5.
Function Index: ImageRead, ImageTake, ToChannels, ToGrayLevel, PlanarImageData.
Image enhancement refers to any technique that improves or modifies the image data, either for purposes of subsequent visual evaluation or further numerical processing. Image enhancement techniques include gray level and contrast manipulation, noise reduction, edge sharpening, linear and non-linear filtering, magnification, pseudocoloring, and so on. One useful, broad categorization of enhancement techniques divides them into point- and region-based operations. Point operations modify each pixel of an image based only on the value of that pixel. These are also called zero memory operations. In contrast, region-based operations calculate a new pixel value based on the values in a (typically small) local neighborhood.
Negating an image is the simplest image modification operation. This operation changes large values to small and vice versa, according to $y = L - x$, where $x$ is a pixel value and $L = 255$ for the typical 8-bits-per-pixel monochrome image. For color images the same transformation is applied to the individual color values. Here we show the effect of negation on both the color and grayscale images.
![[Graphics:images/index_gr_37.gif]](../quicktour/images/index_gr_37.gif)
![[Graphics:images/index_gr_38.gif]](../quicktour/images/index_gr_38.gif)
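Negation replaces every value by the maximum value minus that value. A minimal sketch in plain Python, assuming 8-bit data with a maximum of 255, is given below. The function name `negate` is hypothetical and is not the package's own operator.

```python
def negate(image, max_value=255):
    """Map each value x to max_value - x.

    Works on nested lists of any depth, so the same function handles
    monochrome rows and meshed color rows.
    """
    def flip(v):
        return [flip(x) for x in v] if isinstance(v, list) else max_value - v
    return flip(image)

print(negate([[0, 100, 255]]))   # monochrome row
print(negate([[[255, 0, 0]]]))   # one red pixel becomes cyan
```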
A common approach to contrast modification is to use a power law point transformation, where each pixel of the original image is raised to a specified exponent value. By selecting the exponent values appropriately, either high or low luminance values can be boosted. A simple yet useful contrast manipulation technique is to define a piecewise linear transformation to selectively stretch and/or compress a range of luminance values. The slope of the transformation is chosen greater than 1 in the region of stretch and less than 1 in the region of compression.
Examples of the effect of selected point operations on the color "beans" image are displayed below.
![[Graphics:images/index_gr_39.gif]](../quicktour/images/index_gr_39.gif)
![[Graphics:images/index_gr_40.gif]](../quicktour/images/index_gr_40.gif)
![[Graphics:images/index_gr_41.gif]](../quicktour/images/index_gr_41.gif)
![[Graphics:images/index_gr_42.gif]](../quicktour/images/index_gr_42.gif)
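The two contrast transformations described above can be sketched as follows. This is plain Python under stated assumptions: pixel values are normalized to [0, 1] before the exponent is applied, and the breakpoints `a`, `b`, `va`, `vb` are hypothetical parameters chosen for illustration, not the settings used for the images shown.

```python
def power_law(image, gamma, max_value=255):
    """Raise each normalized pixel value to the exponent gamma."""
    return [[max_value * (v / max_value) ** gamma for v in row]
            for row in image]

def piecewise_linear(v, a, b, va, vb, max_value=255):
    """Map [0, a] to [0, va], [a, b] to [va, vb], [b, max] to [vb, max].

    Requires 0 < a < b < max_value. A segment is stretched when its
    slope exceeds 1 and compressed when its slope is below 1.
    """
    if v < a:
        return va * v / a
    if v < b:
        return va + (vb - va) * (v - a) / (b - a)
    return vb + (max_value - vb) * (v - b) / (max_value - b)

# Stretch the mid-range 64..192 (slope 1.5), compress both ends (slope 0.5).
stretched = [piecewise_linear(v, 64, 192, 32, 224) for v in (0, 64, 128, 192, 255)]
print(stretched)
print(power_law([[0, 64, 255]], 0.5))  # gamma < 1 boosts low values
```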
The image histogram is an estimate of the probability density of the image pixels. As such, it measures the frequency of occurrence of the pixel luminance values. Many higher-level image processing tasks require the calculation of a histogram. Here we present the histograms of each of the three color channels in the "beans" image.
![[Graphics:images/index_gr_43.gif]](../quicktour/images/index_gr_43.gif)
![[Graphics:images/index_gr_44.gif]](../quicktour/images/index_gr_44.gif)
![[Graphics:images/index_gr_45.gif]](../quicktour/images/index_gr_45.gif)
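Computing a histogram amounts to counting how often each level occurs. Dividing by the number of pixels gives the probability estimate. The sketch below is plain Python for a single channel with integer values. The names `histogram` and `normalized` are hypothetical and do not reflect the package's ImageHistogram interface.

```python
def histogram(channel, levels=256):
    """Count the occurrences of each integer level in a single channel."""
    counts = [0] * levels
    for row in channel:
        for v in row:
            counts[v] += 1
    return counts

def normalized(counts):
    """Turn counts into relative frequencies that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

channel = [[0, 1, 1], [3, 3, 3]]
counts = histogram(channel, levels=4)
print(counts)
print(normalized(counts))
```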
Image equalization, or linearization, is a common image enhancement technique. Here is a linearized version of the "beans" image.
![[Graphics:images/index_gr_46.gif]](../quicktour/images/index_gr_46.gif)
![[Graphics:images/index_gr_47.gif]](../quicktour/images/index_gr_47.gif)
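One standard formulation of equalization maps each level through the cumulative distribution of the image, which spreads the occupied levels over the full output range. The sketch below uses plain Python for a single channel. Whether the package's HistogramEqualize uses exactly this mapping is not stated here, so treat it as an assumption.

```python
def equalize(channel, levels=256):
    """Map each level through the cumulative distribution of the channel."""
    counts = [0] * levels
    for row in channel:
        for v in row:
            counts[v] += 1
    total = sum(counts)
    cdf, running = [], 0
    for c in counts:
        running += c
        cdf.append(running / total)
    # Scale the cumulative probability to the output range 0..levels-1.
    return [[int((levels - 1) * cdf[v] + 0.5) for v in row] for row in channel]

# A low-contrast row occupying only levels 100..102.
flat = [[100, 100, 101, 102]]
print(equalize(flat))
```

For a color image the same mapping could be applied to each channel separately, in line with the per-channel treatment used elsewhere in this tour.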
Amplitude thresholding is one of many segmentation techniques. Two-level or binary thresholding changes a value to 0 or 1 depending on the setting of a threshold value. Here is an image segmentation example where we extract the green beans by thresholding the individual color channels. The respective thresholds were selected from an examination of the channel histograms.
![[Graphics:images/index_gr_48.gif]](../quicktour/images/index_gr_48.gif)
![[Graphics:images/index_gr_49.gif]](../quicktour/images/index_gr_49.gif)
![[Graphics:images/index_gr_50.gif]](../quicktour/images/index_gr_50.gif)
We now find all the image regions that do not have green pixels and set them to black.
![[Graphics:images/index_gr_51.gif]](../quicktour/images/index_gr_51.gif)
Here we display the original and segmented images.
![[Graphics:images/index_gr_52.gif]](../quicktour/images/index_gr_52.gif)
![[Graphics:images/index_gr_53.gif]](../quicktour/images/index_gr_53.gif)
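The segmentation steps above (threshold each channel, combine the binary results, blank everything outside the mask) can be sketched in plain Python. The threshold values and the rule "green high, red and blue low" are hypothetical choices for illustration, not the settings used for the "beans" image, and the function names are not the package's own.

```python
def threshold(channel, t):
    """Binary threshold: 1 where the value exceeds t, otherwise 0."""
    return [[1 if v > t else 0 for v in row] for row in channel]

def green_mask(planar, t_red, t_green, t_blue):
    """1 where green is above its threshold and red and blue are not."""
    red, green, blue = planar
    hi_r = threshold(red, t_red)
    hi_g = threshold(green, t_green)
    hi_b = threshold(blue, t_blue)
    return [[g * (1 - r) * (1 - b) for r, g, b in zip(rr, gr, br)]
            for rr, gr, br in zip(hi_r, hi_g, hi_b)]

def apply_mask(planar, mask):
    """Set every pixel outside the mask to black in all channels."""
    return [[[v * m for v, m in zip(row, mrow)]
             for row, mrow in zip(channel, mask)]
            for channel in planar]

# A 1x2 planar image: one red pixel and one green pixel.
planar = [[[255, 96]], [[0, 204]], [[0, 48]]]
mask = green_mask(planar, 128, 128, 128)
print(mask)
print(apply_mask(planar, mask))
```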
Further processing with morphological filters (User's Guide: Section 6.1) may be used to clean up the segmented image.
User's Guide: Sections 3.2, 3.3, 3.4, 7.2.
Function Index: HistogramEqualize, ImageHistogram, PlanarImageData, RegionProcessing, ScaleLinear, Threshold, Where.
Many useful image operations are implemented with linear shift-invariant (LSI) filters. A smoothing operation is frequently a first step in operations such as noise reduction, edge detection or interpolation. A commonly used smoothing filter has constant coefficients. The effect of smoothing or blurring an image is achieved by convolving the image with such a filter. Convolution can also be used to "sharpen" an image by a method called unsharp masking. The simplest form of unsharp masking may be implemented by subtracting a scaled smoothed image from the original.
Here we smooth and sharpen the "head" example image.
![[Graphics:images/index_gr_54.gif]](../quicktour/images/index_gr_54.gif)
Here we display the three results.
![[Graphics:images/index_gr_55.gif]](../quicktour/images/index_gr_55.gif)
![[Graphics:images/index_gr_56.gif]](../quicktour/images/index_gr_56.gif)
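Smoothing with a constant-coefficient filter and unsharp masking can be sketched in plain Python. Several choices here are assumptions rather than the package's behavior: a 3x3 averaging kernel, edge replication at the image border, and the form (1 + k) f - k s for the sharpened image, where f is the original, s the smoothed image and k a hypothetical scale factor.

```python
def convolve(image, kernel):
    """2-D convolution with edge replication; output has the input's size."""
    rows, cols = len(image), len(image[0])
    kr, kc = len(kernel) // 2, len(kernel[0]) // 2
    out = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            acc = 0.0
            for a, krow in enumerate(kernel):
                for b, w in enumerate(krow):
                    # Convolution flips the kernel relative to the image.
                    y = min(max(i - (a - kr), 0), rows - 1)
                    x = min(max(j - (b - kc), 0), cols - 1)
                    acc += w * image[y][x]
            out_row.append(acc)
        out.append(out_row)
    return out

BOX = [[1 / 9] * 3 for _ in range(3)]  # constant-coefficient smoothing filter

def unsharp_mask(image, k=0.5):
    """Subtract a scaled smoothed image from a rescaled original."""
    smooth = convolve(image, BOX)
    return [[(1 + k) * f - k * s for f, s in zip(frow, srow)]
            for frow, srow in zip(image, smooth)]

impulse = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
print(convolve(impulse, BOX)[1][1])   # the peak is spread out by smoothing
print(unsharp_mask(impulse)[1][1])    # the peak is emphasized by sharpening
```

Scaling the original by (1 + k) keeps flat regions at their original brightness, since the smoothed and original images coincide there.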
Edge detection is an important step in many shape-based recognition tasks. Edge detection is typically implemented as a convolution operation with appropriately chosen differentiating filters. Two examples of edge detection using two common edge filters, the Sobel gradient edge detector and the Laplacian-of-Gaussian edge detector, conclude this section.
![[Graphics:images/index_gr_57.gif]](../quicktour/images/index_gr_57.gif)
![[Graphics:images/index_gr_58.gif]](../quicktour/images/index_gr_58.gif)
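A Sobel gradient magnitude can be sketched in plain Python as shown below. This is a generic illustration of the technique, not the package's SobelFilter or EdgeMagnitude implementation. Border pixels are simply left at zero, which is an assumption made to keep the example short.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]    # horizontal derivative
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]    # vertical derivative

def sobel_magnitude(image):
    """Gradient magnitude at interior pixels; border pixels stay at 0."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Correlation is used here; flipping the kernels (true
            # convolution) only changes the sign, not the magnitude.
            gx = sum(SOBEL_X[a][b] * image[i + a - 1][j + b - 1]
                     for a in range(3) for b in range(3))
            gy = sum(SOBEL_Y[a][b] * image[i + a - 1][j + b - 1]
                     for a in range(3) for b in range(3))
            out[i][j] = math.hypot(gx, gy)
    return out

# A vertical step edge between columns 1 and 2.
step = [[0, 0, 10, 10] for _ in range(4)]
for row in sobel_magnitude(step):
    print(row)
```

Thresholding the magnitude then gives a binary edge map.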
User's Guide: Sections 5.2, 5.3, 7.3.
Function Index: DiscreteConvolve, EdgeMagnitude, ImagePlus, LoGFilter, SobelFilter, Threshold, ZeroCrossing.
It is sometimes of interest to process a single sub-region of an image, leaving other regions unchanged. This is commonly referred to as region-of-interest (ROI) processing. Here we show an example of edge detection in a rectangular ROI using a real image. This loads the example image.
![[Graphics:images/index_gr_59.gif]](../quicktour/images/index_gr_59.gif)
Here we define a region-of-interest. Image regions may be conveniently selected using the mouse and keyboard. To select a particular region, click the displayed graphics object. In Windows (for other systems consult Input/Get Graphics Coordinates), press the [Ctrl] key and click a point of interest. The point will be selected. This may be repeated for as many points as desired. Use the Copy and Paste commands to paste the recorded list of positions to any cell or expression in the notebook.
![[Graphics:images/index_gr_60.gif]](../quicktour/images/index_gr_60.gif)
This shows the result of applying a Sobel edge detector to the region defined by roi.
![[Graphics:images/index_gr_62.gif]](../quicktour/images/index_gr_62.gif)
![[Graphics:images/index_gr_63.gif]](../quicktour/images/index_gr_63.gif)
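The idea of ROI processing, applying an operator inside a rectangle and leaving the rest of the image untouched, can be sketched in plain Python. The rectangle format `(top, left, bottom, right)` and the helper name `process_region` are hypothetical and do not reflect how the package's RegionProcessing function specifies a region.

```python
def process_region(image, roi, operator):
    """Apply operator to a rectangular sub-image and paste the result back.

    roi = (top, left, bottom, right); top and left are inclusive, bottom
    and right are exclusive. The operator must return a sub-image of the
    same size. The input image is not modified.
    """
    top, left, bottom, right = roi
    sub = [row[left:right] for row in image[top:bottom]]
    result = operator(sub)
    out = [list(row) for row in image]
    for i, row in enumerate(result):
        out[top + i][left:right] = row
    return out

def negate(sub):
    """Example operator: negate 8-bit values."""
    return [[255 - v for v in row] for row in sub]

image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
processed = process_region(image, (0, 1, 2, 3), negate)
for row in processed:
    print(row)
```

Any operator that preserves the sub-image size, such as an edge detector, can be passed in place of `negate`.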
User's Guide: Sections 5.6, 7.3.
Function Index: ImageRead, EdgeMagnitude, RegionProcessing, SobelFilter.
It is well known that LSI operators may be implemented in the Fourier transform domain, leading to computational efficiencies. The energy compaction property of transforms such as the discrete Fourier transform (DFT), discrete cosine transform (DCT) or discrete wavelet transform (DWT) plays an important role in many image/video compression techniques. Here we demonstrate image compression using the DCT.
Here we take the block cosine transform of the head image. The blocks are non-overlapping, of dimensions 8x8.
![[Graphics:images/index_gr_64.gif]](../quicktour/images/index_gr_64.gif)
Here we show a fragment of the head image and the DCT coefficients.
![[Graphics:images/index_gr_65.gif]](../quicktour/images/index_gr_65.gif)
![[Graphics:images/index_gr_66.gif]](../quicktour/images/index_gr_66.gif)
We now retain cosine coefficients located in a low-frequency zone of each block. We then use the inverse DCT to calculate an approximation to the original image. Here is a typical zonal mask.
![[Graphics:images/index_gr_67.gif]](../quicktour/images/index_gr_67.gif)
![[Graphics:images/index_gr_69.gif]](../quicktour/images/index_gr_69.gif)
![[Graphics:images/index_gr_70.gif]](../quicktour/images/index_gr_70.gif)
![[Graphics:images/index_gr_71.gif]](../quicktour/images/index_gr_71.gif)
The compression capabilities of the DCT are clearly visible. Using only 23% of the image's total energy, the reconstructed image is a reasonable approximation of the original. The error signal is on the far right. The approximation is guaranteed to improve as the number of coefficients is increased.
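Zonal DCT coding of a single 8x8 block can be sketched in plain Python as follows. The sketch assumes an orthonormal DCT-II and a triangular low-frequency zone defined by u + v < zone. Both are common choices, but they are assumptions here and need not match the package's DiscreteCosineTransform scaling or the mask shown above.

```python
import math

N = 8

def dct_matrix(n=N):
    """Orthonormal DCT-II matrix C, so that coefficients = C X C^T."""
    return [[math.sqrt((1 if k == 0 else 2) / n)
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)] for k in range(n)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

C = dct_matrix()
CT = [list(col) for col in zip(*C)]  # transpose, which is also the inverse

def dct2(block):
    return matmul(matmul(C, block), CT)

def idct2(coeffs):
    return matmul(matmul(CT, coeffs), C)

def zonal_mask(zone, n=N):
    """Keep coefficients with u + v < zone (a triangular low-frequency zone)."""
    return [[1 if u + v < zone else 0 for v in range(n)] for u in range(n)]

def compress_block(block, zone):
    """Zero the coefficients outside the zone and reconstruct the block."""
    mask = zonal_mask(zone)
    kept = [[c * m for c, m in zip(crow, mrow)]
            for crow, mrow in zip(dct2(block), mask)]
    return idct2(kept)

def squared_error(block, zone):
    approx = compress_block(block, zone)
    return sum((a - b) ** 2 for arow, brow in zip(approx, block)
               for a, b in zip(arow, brow))

ramp = [[(3 * i + 5 * j) % 256 for j in range(N)] for i in range(N)]
for zone in (2, 4, 8, 15):
    print(zone, squared_error(ramp, zone))
```

Because the transform is orthonormal, the squared error equals the energy of the discarded coefficients, so enlarging the zone can never increase it.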
User's Guide: Sections 8.4, 8.7.
Function Index: BlockProcessing, DiscreteCosineTransform, InverseDiscreteCosineTransform, RawImageData.