
Magician's Corner: 3. Image Wrangling.

B. Erickson

2019 · DOI: 10.1148/ryai.2019190126


Abstract

In the prior articles of this series, the images used had been specially processed so that popular deep learning libraries could be applied (1,2). This article describes why one must apply these alterations and how images are represented in computers. Image wrangling is the manipulation of images to most effectively train networks for imaging tasks. Note that "wrangle" can mean to argue, but it can also mean to herd animals, and it is this second sense, coercing and organizing the image data into a form suitable for machine learning, that is meant here.

As a starting point, it should be recognized that the intensity of a pixel on an image is represented as a numerical value. In the popular Joint Photographic Experts Group (JPEG) and Portable Network Graphics (PNG) formats, the range of values is typically 0 to 255, which is the range that 8 binary digits (bits) permit. Medical images typically have a larger intensity range (16 bits), which allows values from 0 to 65535, or from −32768 to +32767. (A flag indicates whether only positive, or "unsigned," integers are used versus signed integers.) To get color images, we use three numbers per pixel: one for each color component (red, green, and blue, or RGB). It is also possible to represent images with floating point numbers, typically ranging from 0 to 1.0, and that is the common representation within deep learning libraries. Floating point values are not used in either JPEG or Digital Imaging and Communications in Medicine (DICOM) because they take much more space to store and require more computing power to manipulate, without any benefit. We store pixels in arrays: two-dimensional (2D) for the typical image that we look at, or three-dimensional (3D) if the 2D image is part of a 3D volume or a video. Most deep learning libraries today are designed for 2D images, so we must decide how we will handle each 2D image from a 3D volume or time sequence.
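The bit depths and the float conversion described above can be sketched in a few lines of NumPy (a minimal illustration; the array values are made up):

```python
import numpy as np

# 8-bit unsigned pixels, as in JPEG/PNG: values 0..255
img8 = np.array([[0, 128, 255]], dtype=np.uint8)

# 16-bit pixels, as in DICOM: unsigned (0..65535) vs signed (-32768..32767)
print(np.iinfo(np.uint16).min, np.iinfo(np.uint16).max)  # 0 65535
print(np.iinfo(np.int16).min, np.iinfo(np.int16).max)    # -32768 32767

# Deep learning libraries commonly expect floating point values in [0, 1],
# so 8-bit pixels are rescaled by the maximum representable value:
img_float = img8.astype(np.float32) / 255.0
```

The float copy occupies four bytes per pixel instead of one, which is why storage formats keep integers and the conversion happens only at training time.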
In many cases, the handling is trivial: we just extract each 2D image and use it individually. However, in many cases the 3D context is important for the task at hand, and we typically need to create new types of deep learning models, a task we will discuss in a subsequent article. For now, we will focus on how to handle 2D images, and most of the lessons we learn will have obvious extensions to 3D and higher dimensions.

One of the most important decisions in image wrangling is how to handle intensities. Although most deep learning libraries are built to handle color images, most medical images are grayscale. However, some modalities do have multiple types, such as pre- and postcontrast images. In the case of MRI, we can get images that are T1-weighted, T2-weighted, postcontrast, diffusion, fluid-attenuated inversion recovery, and so forth. Here, we need to wrangle (coerce and organize) the image data to make the images consistent in size and intensity so that the algorithm learns the important differences. If you understand that RGB images really just mean images that have three separate information channels, you can see that we could put each type of image into a color channel.

Let's try it! Open the Colab notebook as follows: put http://colab.research.google.com into the address bar of your browser (Chrome is recommended). Then open the ipython notebook from the RSNA github site (File > Open Notebook > Github tab) and put "RSNA" in the search bar. From the MC-ImageWrangling folder, open the notebook entitled "ImageWrangling.ipynb." Remember to check that the runtime has GPU acceleration (Runtime > Change Runtime Type). The first cell (run cell 1) loads DICOM data selected from The Cancer Genome Atlas Glioblastoma Multiforme archive (the DICOM images are in the S1-S16.zip files). We will go through all the steps from DICOM to the machine learning in the next few cells.
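The channel trick above can be sketched as follows: three grayscale slices (here random placeholder arrays standing in for registered MRI sequences; the variable names are illustrative, not from the notebook) are stacked into one three-channel array of the shape a color-image network expects:

```python
import numpy as np

# Placeholder 2D slices from three MRI sequences; after registration
# they share the same shape, so they can be stacked like R, G, B planes.
t1 = np.random.rand(256, 256).astype(np.float32)
t2 = np.random.rand(256, 256).astype(np.float32)
flair = np.random.rand(256, 256).astype(np.float32)

# Stack each sequence into its own "color" channel: shape (H, W, 3)
multichannel = np.stack([t1, t2, flair], axis=-1)
print(multichannel.shape)  # (256, 256, 3)
```

The network then sees the relationships among sequences at each pixel, just as it would see relationships among red, green, and blue in a photograph.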
Cell 2 does common preparation steps. First, it converts all the images from a DICOM series into one file in the Neuroimaging Informatics Technology Initiative (NIfTI) format (https://nifti.nimh.nih.gov/); NIfTI is the most popular format for image processing. Second, cell 2 performs N4 bias correction, which removes an intensity artifact of MRI (3). Finally, it performs 3D image registration using flirt, FMRIB's Linear Image Registration Tool, an FMRIB Software Library (FSL) tool (available at https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/) that aligns one image set with another. In our case, flirt uses six degrees of freedom, but more can be specified, such as scaling and skewing; for more information, see https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/. This step will take many minutes, so if you wish to skip it, the prepared files can be used by executing cell 3. You can execute either cell 2 or cell 3, or both, but you must run at least one for this to work. There is also commented-out code that uses another FSL tool called "bet2," the Brain Extraction Tool (BET). BET was designed for 3D T1-weighted images and does not perform well on thick spin-echo images, so it has been removed; it is included in comments in case you wish to try it. Cell 4 loads a text file that contains the subject ID and the first and last slice where there is contrast-enhancing tumor. This file is a comma-separated values (CSV) file.
