Cell Arrays

Overview


The cell array is the grab bag of variable types. It was designed to store disparate information, such as character and numeric information, side by side, element by element in the same array. Importantly, this disparate information doesn’t even need to be the same size, making cell arrays useful for organizing character arrays or other arrays of unpredictable dimensions.

This module is broken down into the following sections: Cell Array Assignment, Cell Indexing, and Cell Arrays and Functions.

Learning Objectives

After finishing this module, you should be able to:

  • Explain why you would use a cell array.
  • Assign values to the elements of a cell array using the {} or () special characters
  • Correctly index a cell array to access its contents
  • Describe the difference between indexing cell arrays and numeric arrays
  • Correctly input cell arrays into functions

Important Terminology

  • Cell Array: a special type of array in which you can place whole other arrays in each element of the cell array

Important MATLAB Functions

  • cellfun – apply the same function to each element of a cell array
  • cell2mat – convert cell arrays to numeric arrays

Special MATLAB Characters

  • { } – curly brackets
  • ( ) – parentheses

Why Cell Arrays?

The Trouble With the Fundamental Classes

All right. So we already figured out how to store numbers, characters, and booleans in arrays (review the modules on the Fundamental Data Types if you don’t know what I’m talking about). Recall that in these fundamental classes, each element of the array contains a single piece of information: a number, a character, or a true/false, depending on the class type.

These three variable types hold pretty much any type of information that we need. So, why do we need to learn about other variable types? Well, while very useful, the fundamental data types have certain restrictions that at times can become onerous for use at scale.

Recall our example for storing rows of characters in a character array. To have more than one row of characters, you need to pad character arrays with empty spaces, as follows:

>> common_names_1995 = char('Jessica', 'Ashley', 'Emily', 'Samantha')
common_names_1995 =

  4×8 char array

    'Jessica '
    'Ashley  '
    'Emily   '
    'Samantha'

Fortunately, the function char manages the padding for us, but we need to run it every time we want to add a new row to the character array, and that can get computationally burdensome (a fancy way of saying slow). Not to mention that artificially adding extra spaces to each row of a character array makes things tricky if you need to know the length of each name. Say, for example, you wanted to know the average length of popular names in 1995. You would have to come up with a method to exclude those extraneous spaces from your count. This is certainly doable, but it adds unnecessary steps to the procedure.
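To make that extra step concrete, here is a minimal sketch of the average-length calculation on the padded character array. It assumes the built-in function deblank (not otherwise covered in this module), which strips trailing whitespace from a character array; the variable names are just for illustration.

```matlab
% Rebuild the padded character array from the example above
common_names_1995 = char('Jessica', 'Ashley', 'Emily', 'Samantha');

% Every row is padded to the width of the longest name (8 characters),
% so naively counting columns would give 8 for every name
padded_width = size(common_names_1995, 2);

% Strip the trailing padding from each row before counting its characters
n_names = size(common_names_1995, 1);
true_lengths = zeros(n_names, 1);
for k = 1:n_names
    true_lengths(k) = numel(deblank(common_names_1995(k, :)));
end

% (7 + 6 + 5 + 8) / 4 = 6.5
avg_length = mean(true_lengths);
```

The loop and the call to deblank are exactly the "unnecessary steps" that cell arrays let you skip.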

Cell Arrays to the Rescue

Cell arrays solve these problems by removing the requirement that each element of the array hold a single value (a number, a character, or a boolean). Instead, you can have a whole array in a single element of a cell array. Thus, each element of a cell array is like a box into which you can stuff other arrays, even other cell arrays. This is very useful for organizing information, but it can be cumbersome for other activities, like getting information back out of a cell array. Simply said: indexing elements in a cell array can be a pain.

Cell Array Assignment


You create cell arrays using the special characters { } (curly brackets).

For example, to create the previous array as a cell array, you would use the following syntax:

>> names90s = {'Jessica'; 'Ashley'; 'Emily'; 'Samantha'}

names90s =

  4×1 cell array

    {'Jessica' }
    {'Ashley'  }
    {'Emily'   }
    {'Samantha'}

names90s is a 4×1 cell array. Each element in names90s contains a character array. Notice that we did not need to add any extraneous spaces to any of the character arrays. Also notice that we used the curly brackets to concatenate, similar to the way we used square brackets for numeric arrays. Much of the rest of the syntax should look familiar: inside the curly brackets, we used single quotes to create the character arrays that we wanted packaged into the elements of the cell array, and semicolons to indicate new rows in the cell array.

There are some subtleties when creating a cell array. Consider the difference between the following syntaxes. Here…

>> {'a','b','c','d'}

ans =

  1×4 cell array

    {'a'}    {'b'}    {'c'}    {'d'}

  • …there are 4 separate character arrays inside the curly brackets (a, b, c, and d). Thus, the syntax returns a 1×4 cell array with a single character packaged in each element of the cell array.

Whereas here…

>> {'abcd'}

ans =

  1×1 cell array

    {'abcd'}

  • …there is only 1 character array (abcd) inside the curly brackets, which is then packaged into a single element of the resultant cell array.
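Both examples above build a whole cell array in one go. You can also assign values element by element, using either curly brackets or parentheses. A minimal sketch, assuming you start from an empty cell array made with the built-in function cell (the variable name names is just for illustration):

```matlab
% Preallocate a 3x1 cell array; every element starts out as an empty []
names = cell(3, 1);

% Curly brackets put the contents directly into an element
names{1} = 'Jessica';

% Parentheses assign a 1x1 cell array to an element, with the same result
names(2) = {'Ashley'};

% Assigning past the end grows the cell array, just as with a numeric
% array; names{3} is still empty, and names is now 4x1
names{4} = 'Samantha';
```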

Challenge

  1. What size cell array would the following return?
  2. What is found inside each element of the array and what are the dimensions of those contents?
{'abcd' 'ef'}

ANSWER: see Challenge Answers at the end of this module.


Mix and Match Variable Types

One nice feature of cell arrays is that you can store different variable types in the same array.

For example, to add a column vector with numeric values to our cell array names90s, we first create a cell array with numeric values in it as follows:

>> {1; 2; 3; 4}

ans =

  4×1 cell array

    {[1]}
    {[2]}
    {[3]}
    {[4]}

  • By using curly brackets instead of square brackets to concatenate the numeric values, we create a cell array instead of a numeric array. This syntax packages each numeric value into a separate element of the cell array. We still use semicolons to indicate new rows in the cell array, just as we would for a numeric array.

Next, to add this column vector to names90s, we use the following syntax:

>> names90s = [{1; 2; 3; 4} names90s]

names90s =

  4×2 cell array

    {[1]}    {'Jessica' }
    {[2]}    {'Ashley'  }
    {[3]}    {'Emily'   }
    {[4]}    {'Samantha'}

Let’s unpack this syntax:

  1. Square brackets are used to concatenate arrays of the same type and dimension. In this case, we are simply concatenating two cell arrays: one a new cell array, generated on the fly, with numbers in it, and the other an already generated cell array with names in it (names90s). Notice that both cell arrays inside the square brackets have the same dimensions.
  2. The curly brackets are used to create a new cell array with numbers packaged into separate cell elements, as demonstrated in the previous example.
  3. We use recursive assignment to prepend the new cell vector from step 2 to the cell array names90s. Recursive assignment simply means including the same variable name on both sides of the assignment operator (=). By placing names90s on the right side of the = operator, after the new cell vector, we indicate that we first want to add the new cell vector in front of the contents of names90s and then overwrite names90s with this enlarged version.

SIDEBAR: Important Use-cases for Square and Curly Bracket Concatenation.

Square brackets [ ] are used to concatenate variables of the same type and dimension. Using square brackets will return the same variable type that you started with. Curly brackets { }, on the other hand, are used to create cell arrays. Any array found inside the curly brackets will be added to a new element of the cell array. By using curly brackets, you can concatenate disparate variables and variable types into a single cell array.
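A quick way to see the difference is to check the class of what each kind of bracket produces. A minimal sketch (the variable names a through d are just for illustration):

```matlab
% Square brackets around numbers: a numeric (double) array
a = [1, 2];
% Curly brackets around numbers: a 1x2 cell array
b = {1, 2};
% Square brackets around cell arrays: the cells are joined into one 1x2 cell array
c = [{1}, {'Emily'}];
% Curly brackets around cell arrays: each cell is packaged, whole, inside a new cell
d = {{1}, {'Emily'}};

class(a)      % 'double'
class(b)      % 'cell'
class(c{2})   % 'char'
class(d{2})   % 'cell'
```

Note that c and d are both 1×2 cell arrays; the difference is one level down, in what each element holds.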


Concatenating Variables With Dimensional Disparities

Another powerful use for cell arrays is to concatenate variables that have different dimensions. This is especially useful when you can’t easily predict the dimensions of the variables that you are trying to concatenate. We already saw an example of this when we concatenated character arrays of different lengths into the elements of a cell array (Recall the first iteration of names90s, with just the names in it).

We can also concatenate numeric arrays with different dimensions. For example, say we want to add a column to names90s that contains all of the years in the 1990s in which the corresponding name was the number one name.

We can easily create a cell column vector whose elements contain varying numbers of years, as follows:

>> {[1990, 1993, 1994, 1995]; [1991, 1992]; [1996, 1997, 1998,1999]; []}

ans =

  4×1 cell array

    {1×4 double}
    {1×2 double}
    {1×4 double}
    {0×0 double}

Notice that the number of years varies by element. Also notice that the contents of this cell column vector are all row vectors. We could just as easily have made them column vectors, or made half of them row vectors and half column vectors. The contents of the elements of a cell array do not need to match each other in size, shape, or form, although it’s usually easier to work with them if they match as closely as possible.

Let’s break this syntax down in detail:

  1. The outermost brackets are the curly brackets, which, as we already know, generate a cell array. Since we didn’t assign a variable name, this cell array is assigned to ans.
  2. Inside these curly brackets, there are four pairs of square brackets. Each pair of square brackets generates a numeric array, which is then packaged into one element of the cell array.
  3. The square brackets are separated by semi-colons to indicate a new row in the cell array.
  4. Finally, the last element in this cell array contains an empty numeric array. You can leave an element in a cell array empty, which is particularly useful in this example because there were no years in the 90s in which Samantha was the Number 1 Name (still a lovely name). But even though there were no years to add, we still needed a fourth row, because names90s has 4 rows and the goal is to concatenate this array with names90s.

Next, to concatenate this cell array to names90s, we use a combination of square and curly brackets and recursive assignment. Remember, square brackets are used to concatenate variables of the same type and compatible dimensions, so in this syntax they need to be the outermost brackets. Inside them, the cell arrays being joined side by side must have the same number of rows, as follows:

>> names90s = [names90s {[1990, 1993, 1994, 1995]; [1991, 1992]; [1996, 1997, 1998,1999]; []}]

names90s =

  4×3 cell array

    {[1]}    {'Jessica' }    {1×4 double}
    {[2]}    {'Ashley'  }    {1×2 double}
    {[3]}    {'Emily'   }    {1×4 double}
    {[4]}    {'Samantha'}    {0×0 double}

Bam. We now have a third column in names90s.


A further note on recursive assignment: did you notice the location of names90s on the right-hand side of the = sign in this recursive assignment? It sits to the left of the new cell column being added. This indicates that the new column should be added after the contents of names90s, in the third column position. In the previous recursive assignment, names90s was to the right of the cell column vector being added, which is why that column was added before the contents of names90s, in the first column position.
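The whole build-up of names90s can be replayed in a few lines, which makes it easy to check how the position of names90s inside the square brackets decides where the new column lands. A minimal sketch; the helper variables rank_col and year_col are introduced only for readability:

```matlab
% Start from the names-only cell array
names90s = {'Jessica'; 'Ashley'; 'Emily'; 'Samantha'};
rank_col = {1; 2; 3; 4};
year_col = {[1990, 1993, 1994, 1995]; [1991, 1992]; [1996, 1997, 1998, 1999]; []};

% New column to the LEFT of names90s: prepended, becomes column 1
names90s = [rank_col names90s];

% New column to the RIGHT of names90s: appended, becomes the last column
names90s = [names90s year_col];

% Each added column has 4 rows, matching names90s, so the result is 4x3
size(names90s)
```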


Cell Indexing


Ok, so we can now get stuff into cell arrays. How do we get them out? Well, that depends on what you need.

Parenthetical Indexing

If you simply want a smaller cell array, then you would use the parentheses ( ) as follows:

>> names90s(:,3)

ans =

  4×1 cell array

    {1×4 double}
    {1×2 double}
    {1×4 double}
    {0×0 double}

  • This syntax returns the third column from names90s as a smaller cell array. This should feel familiar. This is how we index fundamental data types, which also returns smaller versions of the original array.

So, how would you return the third row from names90s as a cell array?

>> names90s(3,:)

ans =

  1×3 cell array

    {[3]}    {'Emily'}    {1×4 double}

  • Just as you would expect. Notice that the command window sometimes shows what’s in each element and sometimes doesn’t, depending on the size of the contents in the array.

Curly Bracket Indexing

If you want to extract the contents from inside the elements of a cell array, then you need the curly brackets. This is easy and intuitive for one element:

>> names90s{2,2}

ans =

    'Ashley'

  • Notice that this syntax returns not a smaller cell array but the contents and type from inside the indicated element: in this case a character array containing the characters Ashley.
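You can confirm the difference between the two kinds of indexing by asking MATLAB for the class of each result. A minimal sketch, rebuilding names90s with the same contents as above:

```matlab
% Rebuild names90s as constructed in this module
names90s = {1, 'Jessica',  [1990, 1993, 1994, 1995];
            2, 'Ashley',   [1991, 1992];
            3, 'Emily',    [1996, 1997, 1998, 1999];
            4, 'Samantha', []};

as_cell = names90s(2,2);   % parentheses: a 1x1 cell array
as_char = names90s{2,2};   % curly brackets: the char array inside it

class(as_cell)   % 'cell'
class(as_char)   % 'char'
size(as_char)    % 1 6

% An "empty" element still exists as a 1x1 cell; only its contents are empty
isempty(names90s{4,3})   % true
isempty(names90s(4,3))   % false
```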

Indexing, unfortunately, is not quite so intuitive for multiple elements…

Complex Indexing

The trouble with Curly Bracket indexing

Which brings us to one of the problems with cell arrays: there is no standard syntax to index out the contents from multiple elements in a cell array and concatenate those elements into a new array of a different type.

We can pretty easily extract the contents from multiple elements. The following syntax extracts the contents from the elements found in the first column of the cell array:

>> names90s{:,1}

ans =

     1

ans =

     2

ans =

     3

ans =

     4

  • Notice that this syntax returns what is known as a “comma-separated list” (even though there are no commas), which simply means that the contents from each element are spit out, one after the other, into the workspace. In this case, the contents overwrite ans over and over, so the final value of ans is simply 4. Such an output makes sense from a general point of view, as there is no guarantee that the contents from a given element in a cell array can be concatenated with the contents from any other element of the same cell array.

In this case, since the outputs are all numbers, we can simply use the square brackets to concatenate the numbers, as follows:

>> [names90s{:,1}]

ans =

     1     2     3     4

  • Which works, but the syntax is a little convoluted and the result is a row vector instead of the column vector found in names90s.
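If you want the column shape back, one option is to hand the comma-separated list to the built-in function vertcat (not otherwise covered in this module), which stacks its inputs vertically instead of side by side. A minimal sketch:

```matlab
% Rebuild names90s as constructed in this module
names90s = {1, 'Jessica',  [1990, 1993, 1994, 1995];
            2, 'Ashley',   [1991, 1992];
            3, 'Emily',    [1996, 1997, 1998, 1999];
            4, 'Samantha', []};

% Square brackets join the comma-separated list side by side: a row vector
as_row = [names90s{:,1}];          % 1 2 3 4

% vertcat receives the same list (1, 2, 3, 4) and stacks it: a column vector
as_col = vertcat(names90s{:,1});   % [1; 2; 3; 4]
```

Transposing the row vector with ' would also work here. Either way, this only succeeds because every element holds a single number; vertcat would fail on the third column, whose year vectors have different lengths.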

By comparison, there is no simple way to concatenate the contents from the third row of the cell array (without creating another cell array) as they are of multiple types (both numeric and character):

>> names90s{3,:}

ans =

     3


ans =

    'Emily'


ans =

        1996        1997        1998        1999

And concatenating the contents from columns 2 and 3 of names90s returns unexpected and perhaps unwanted results:

>> [names90s{:,2}]

ans =

    'JessicaAshleyEmilySamantha'

or

>> [names90s{:,3}]

ans =

        1990        1993        1994        1995        1991        1992        1996        1997        1998        1999

So, as you can see, there are lots of issues with indexing cell arrays. As we’ll see in later modules, there are different complex variable types to help tackle these issues.


The main takeaway for indexing cell arrays is that indexing with ( ) returns smaller cell arrays, while indexing with { } returns the contents from the elements inside the cell arrays, sometimes in unexpected ways. Always keep these two types of syntax in mind when indexing cell arrays. Cell array indexing can be a major source of bugs!


Cell Arrays and Functions


Considering that indexing cell arrays is such a pain, you need to take care when inputting cell arrays into functions. This is another major potential source of bugs: many functions expect the contents from the elements of a cell array, not the cell array itself.

Improper Inputting of cells into Functions

Say, for example, we want to programmatically find the most popular girl’s name across the entire 1990s decade. One way to calculate this would be simply to count the number of years in the 90s in which each name was the number one name. We already have a column in names90s where these years are stored (the third column). So, the process of finding the most popular girl’s name would look something like this:

  1. Extract the contents from each row in names90s
  2. Count the number of years stored in a given row
  3. Identify which row has the highest count
  4. Crown the row with the highest count the most popular girl’s name for the 1990s

If we simply look in the workspace, we can see that there are actually two names that rocked the 90s: Jessica and Emily, each reigning supreme for four years. But if we wanted MATLAB to figure this out programmatically, which is something we would want to do if we had a lot more rows, then we would need to input the elements from names90s into a function that does the counting for us and then store those counts in a new array.

The functions size or numel return array dimensions or element counts, respectively. However, when we apply the numel function to the third column of names90s, we get the following:

>> numel(names90s(:,3))

ans =

     4   

  • Which is just the number of elements in the third column of cell array itself (4 rows)—not what we wanted.

We want to count the number of elements of the arrays inside each cell element. So, our natural instinct would be to extract the contents from the elements using the {} brackets. However…

>> numel(names90s{:,})
 numel(names90s{:,})
              ↑
Error: Invalid expression. When calling a function or indexing a variable, use parentheses. Otherwise, check for mismatched delimiters.

  • …that doesn’t work. Even with the column index filled in, as in numel(names90s{:,3}), this approach can’t give us per-name counts, because the {} syntax returns a comma-separated list, spitting out the contents from each element one after the other without organizing them in any sort of manner. The function numel expects a single, nice and orderly array. If we concatenate with the [] brackets after extracting with the {} brackets, then we simply create a single row vector containing all of the years from all of the elements, as we did in the previous section; however, we then lose track of which years correspond to which name.

So, one solution would be to painstakingly extract the contents from each element in names90s and then input the extracted contents into numel, as follows:

>> count1 = numel(names90s{1,3})

count1 =

     4
>> count2 = numel(names90s{2,3})

count2 =

     2

And then compare all of the new variables. Note that count1 corresponds to Jessica, our early 90s most popular name.
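Written as a loop, the painstaking approach at least scales to any number of rows. A minimal sketch that preallocates one count per row and fills it in using curly bracket indexing:

```matlab
% Rebuild names90s as constructed in this module
names90s = {1, 'Jessica',  [1990, 1993, 1994, 1995];
            2, 'Ashley',   [1991, 1992];
            3, 'Emily',    [1996, 1997, 1998, 1999];
            4, 'Samantha', []};

n_rows = size(names90s, 1);
counts = zeros(n_rows, 1);   % one count per name

for row = 1:n_rows
    % Curly brackets pull out the year vector stored in this row
    counts(row) = numel(names90s{row, 3});
end
% counts is now [4; 2; 4; 0]
```

This loop is essentially the bookkeeping that cellfun does for you.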

But there are better ways, and the function cellfun is one of them:

cellfun – Putting the Fun Back in Cell Arrays

The function cellfun was built to methodically extract the contents from each element of a cell array, perform an operation on those contents, and then return the outputs from that operation.

The syntax is as follows:

>> counts = cellfun(@numel, names90s(:,3))

counts =

     4
     2
     4
     0

Perfect, we see that two rows in counts have a count of 4, and those rows correspond to the names Jessica and Emily. We also see that the last row has a count of 0, which makes sense because Samantha was never a number one name in the 90s (despite being such a lovely name) and that cell element was empty.

Let’s unpack the syntax a little further:

  1. The first input into cellfun is a handle to the function numel. A handle is simply the function name preceded by the @ symbol. It tells cellfun to use the function numel itself as the input, instead of executing numel and using its output as the input.
  2. The second input into cellfun is the third column of names90s, as a cell vector.
  3. The output from cellfun, counts, is a numeric array that collects the output of numel for each element of that cell vector. It has the same dimensions as the input cell vector (4×1).
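Steps 3 and 4 of the plan (identify the highest count, then crown the matching names) take only a couple more lines. A minimal sketch using max and logical indexing; note that it keeps every tied name rather than picking just one:

```matlab
% Rebuild names90s as constructed in this module
names90s = {1, 'Jessica',  [1990, 1993, 1994, 1995];
            2, 'Ashley',   [1991, 1992];
            3, 'Emily',    [1996, 1997, 1998, 1999];
            4, 'Samantha', []};

counts = cellfun(@numel, names90s(:,3));   % [4; 2; 4; 0]
top_count = max(counts);                   % 4

% Parentheses with a logical index keep the result as a cell array
winners = names90s(counts == top_count, 2);   % {'Jessica'; 'Emily'}
```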

Note that the syntax also works on the second column of names90s:

>> name_length = cellfun(@numel, names90s(:,2))

name_length =

     7
     6
     5
     8

  • Which returns the number of letters in each of the names found in that column
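This also answers the question from the start of the module about the average length of popular names, with no padding to strip out. A minimal sketch:

```matlab
% Rebuild names90s as constructed in this module
names90s = {1, 'Jessica',  [1990, 1993, 1994, 1995];
            2, 'Ashley',   [1991, 1992];
            3, 'Emily',    [1996, 1997, 1998, 1999];
            4, 'Samantha', []};

name_length = cellfun(@numel, names90s(:,2));   % [7; 6; 5; 8]
avg_length = mean(name_length);                 % (7 + 6 + 5 + 8) / 4 = 6.5
```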

Searching Cell Arrays: strcmp

The function strcmp compares character arrays. When one of its inputs is a cell array, it compares the specified character array against every element of the cell array. This is really useful when you want to find a character array that has been stashed away in a cell array. For example, to find which element of names90s contains the character array Samantha, you use the following syntax:

>> strcmp(names90s,'Samantha')

ans =

  4×3 logical array

   0   0   0
   0   0   0
   0   0   0
   0   1   0

  • strcmp returns a logical array that is true wherever the corresponding element of the cell array matches the specified character array (elements that hold numbers simply come back false). In this case, the true is in the last row, second column of names90s, where ‘Samantha’ is stored.
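To turn that logical array into a location you can index with, you can pass it to the built-in function find. A minimal sketch (row, col, and samantha_row are illustrative names):

```matlab
% Rebuild names90s as constructed in this module
names90s = {1, 'Jessica',  [1990, 1993, 1994, 1995];
            2, 'Ashley',   [1991, 1992];
            3, 'Emily',    [1996, 1997, 1998, 1999];
            4, 'Samantha', []};

mask = strcmp(names90s, 'Samantha');
[row, col] = find(mask);   % row = 4, col = 2

% Parentheses with that row index return the whole row as a 1x3 cell array
samantha_row = names90s(row, :);
```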

cell2mat and other conversion functions

There are also functions that help convert cell arrays to other variable types. One of the most useful is cell2mat, which converts cell arrays to numeric arrays. We can apply cell2mat to the first column of names90s as follows:

>> ranking = cell2mat(names90s(:,1))

ranking =

     1
     2
     3
     4

  • This syntax only works when the contents of the cell array can be concatenated into a single rectangular array. (It would not work on the third column of names90s, because those arrays are all different sizes.) Also notice that this syntax returns a column vector that matches the layout of the numbers in the cell array, unlike the earlier method using [] brackets, which returned a row vector.
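You can see the failure on the third column for yourself by wrapping the call in try/catch, so the script keeps running and prints the error message instead of stopping. A minimal sketch:

```matlab
% Rebuild names90s as constructed in this module
names90s = {1, 'Jessica',  [1990, 1993, 1994, 1995];
            2, 'Ashley',   [1991, 1992];
            3, 'Emily',    [1996, 1997, 1998, 1999];
            4, 'Samantha', []};

% Works: every element of column 1 is a single number
ranking = cell2mat(names90s(:,1));   % [1; 2; 3; 4]

% Fails: the year vectors (1x4, 1x2, 1x4, 0x0) cannot be stacked into one rectangle
try
    year_matrix = cell2mat(names90s(:,3));
    converted = true;
catch err
    converted = false;
    disp(err.message)
end
```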

You can find a full list of cell array functions in the MATLAB documentation.

Post-mortem

  1. Cell arrays are great for organizing disparate information of different dimensions
  2. Cell arrays are less great for getting that disparate information back out again.
  3. Watch what you put into functions when cell arrays are involved

FIN

Challenge Answers

Challenge 1

>> {'abcd' 'ef'}

ans =

  1×2 cell array

    {'abcd'}    {'ef'}

  1. What size cell array would the following return? A 1×2 cell array: one row, two columns.
  2. What is found inside each element of the array? Character arrays.
  3. What are the dimensions of those contents? A 1×4 character array ('abcd') and a 1×2 character array ('ef').
