Character Arrays

Overview

In this module, we will learn about character arrays. Character arrays are useful for storing information such as the names of data or images, or for text elements of a graphical user interface.

char_arrays = 'for letters and stuff'

Image: Japanese character for love

This module is broken down into the following sections:

Learning Objectives

  • Define a character array
  • Be able to assign values to character arrays using paired single quotes
  • Be able to use the function sprintf to format character arrays
  • Be able to use functions discussed in this module like sort and unique to parse characters in a character array
  • Be able to use regular expressions to find and replace characters in character arrays

Important Terminology

  • Character Array: An array of characters (letters, spaces, punctuation, etc.). Sometimes called a string.
  • ASCII: The American Standard Code for Information Interchange. A numeric code in which each number represents a different character.

Useful MathWorks Tutorials

Important MATLAB Functions

  • char – Convert to a character array
  • sprintf – Format data into character array
  • regexp – Regular expression
  • regexprep – Replace text using regular expression

Special MATLAB Characters

  • ' ' – paired single quotes (the straight kind)
  • [ ] – square brackets (for concatenation)

Character Arrays

Image: font sample containing all letters from a-z

Assignment and syntax

TOP || Assignment and syntax | Indexing | Concatenation | Padding | Character Array Generation | Character Array Functions

A character array is simply an array whose elements contain letters or other characters (as opposed to numeric values). When creating a character array, MATLAB assumes that you probably don’t want to separate each character by a space, so the syntax for creating a character array is different from the syntax for creating a numeric array:

ch = 'hello'

Instead of using paired square brackets, you use a pair of single quotes (' ') and you do not include any spaces between characters. In fact, a space is itself a character: it belongs to a special group known as whitespace characters.

Let’s look at the properties of the variable ch using the whos function.

>> whos('ch')

Name      Size            Bytes  Class    Attributes

  ch         1x5                10  char               


The variable ch is a row vector with 1 row and 5 columns; its five elements contain the letters h, e, l, l, and o. Its class is char (the character array class), and it requires 10 bytes of memory. This means that each character requires 2 bytes (16 bits) of memory.
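Because each element holds one character, and each character is stored as a numeric code (the ASCII code for basic English letters), you can inspect those codes directly. The following is a minimal sketch; the variable names codes, back, and n are chosen here only for illustration.

```matlab
ch = 'hello';

codes = double(ch);   % character codes: [104 101 108 108 111]
back  = char(codes);  % converting the codes back gives 'hello'
n     = numel(ch);    % number of elements (characters): 5
```

The functions double and char convert in opposite directions between characters and their numeric codes.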

NOTE: Examining the variable in the “Variable Editor” is a little confusing, because all the letters appear to be contained in a single element. But this is not the case. Just as in a numeric variable, each element in a character array contains a single character. MATLAB just doesn’t show this for character arrays.

NOTE THE CONVENTION: By default, character arrays are colored pink in the command window and in scripts


Indexing

Character arrays can be indexed just like numeric arrays using parentheses. This syntax returns the first element in a character array, which contains the letter h:

>> ch(1)
ans = 
    h

And this syntax returns the last element in the array, which contains the letter o:

>>ch(end)

ans = 
    o
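Indexing with ranges and assignment also works as it does for numeric arrays. A short sketch, using illustrative variable names (firstThree, reversed) that are not part of the module:

```matlab
ch = 'hello';

firstThree = ch(1:3);       % elements 1 through 3: 'hel'
reversed   = ch(end:-1:1);  % all elements in reverse order: 'olleh'

ch(1) = 'j';                % overwrite the first element; ch is now 'jello'
```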


Concatenation

You can use paired square brackets to concatenate character arrays, just as you would concatenate numeric arrays. For example, two character arrays can be joined using the following syntax:

>>c1 = 'together';
>>c2 = 'again';
>>ct = [c1 c2]

ct =
togetheragain

Notice that the result is literally the two character arrays smooshed together, with no regard for grammar or spacing. If you would like a space between the two merged character arrays, you need to add it explicitly, as follows:

>>sp = ' ';
>>ct2 = [c1 sp c2]

ct2 =
together again

Here we create a new character array called sp, which contains a single space, and sandwich it between c1 and c2.
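A common use of concatenation is building a message that includes a number. The number has to be converted to characters first (here with num2str); otherwise MATLAB treats it as a character code. This is a small sketch, and the names count and msg are hypothetical:

```matlab
count = 3;   % hypothetical numeric value to include in the text

% num2str converts the number 3 into the character '3'
msg = ['Found ' num2str(count) ' images'];   % 'Found 3 images'
```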


Padding

Just as you need an equal number of columns for every row in a numeric matrix, you need an equal number of columns (characters) for every row in a character array. If there are not enough characters in a given word, you can pad that word with spaces. Consider the phrase ‘hello goodbye’. What happens if you try to place each word in a separate row of a character array using semicolon syntax?

>>['hello'; 'goodbye']

  • You get an error: Dimensions of arrays being concatenated are not consistent.

The syntax fails because ‘hello’ has 5 characters, while ‘goodbye’ has 7. To get this syntax to work, you need to pad ‘hello’ with 2 trailing spaces, as follows:

['hello  '; 'goodbye']

ans =

  2×7 char array

    'hello  '
    'goodbye'

Or, really, any characters for that matter:

['hello**'; 'goodbye']

ans =

  2×7 char array

    'hello**'
    'goodbye'

Also, notice the dimensions of this new character array: 2×7

>>whos('ans')

Name        Size            Bytes  Class    Attributes

  ans         2x7                28  char               

To avoid worrying about adding the proper amount of trailing spaces, you can use the function char:

>>p = char('hello', 'goodbye')

p =

  2×7 char array

    'hello  '
    'goodbye'

The function char automatically creates a 2×7 character array, padding ‘hello’ with spaces at the end to match the length of ‘goodbye’.

Notice that in the MATLAB variable editor, character arrays are not displayed like numeric arrays. (You don’t get a spreadsheet view of each letter distributed in different elements).

If you index the last element in the first row of the character array p:

p(1,end)
ans =

The displayed ans looks empty, but it is not: it contains a single space. Thus, the last element in the first row of p is a whitespace character that was added as padding.
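When you pull a row back out of a padded character array, the padding comes with it. One way to confirm and remove it is sketched below, using the built-in function strtrim; the variable names row1, isSpace, and trimmed are illustrative only.

```matlab
p = char('hello', 'goodbye');   % 2x7 array; 'hello' is padded with spaces

row1    = p(1,:);               % whole first row: 'hello  ' (7 characters)
isSpace = (p(1,end) == ' ');    % true: the last element is a space

trimmed = strtrim(row1);        % remove leading/trailing whitespace: 'hello'
```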

Character Array Generation

You can automatically create a series of characters in alphabetic order as you would a series of incremental numbers using the colon.

>> b = 'a' : 'z'

b =

abcdefghijklmnopqrstuvwxyz

Or, if you would like to skip every other letter, you could use the following syntax (just like with numeric arrays):

>>b = 'a':2:'z'

b =

acegikmoqsuwy
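The colon works on characters because each character has an underlying numeric code. A brief sketch of a few variations, with variable names made up for this example:

```matlab
upperCase = 'A':'F';        % uppercase letters: 'ABCDEF'
backwards = 'e':-1:'a';     % negative step counts down: 'edcba'
fromCodes = char(97:101);   % codes 97 to 101 converted to characters: 'abcde'
```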

Digits can also be stored as characters:


>> n = '1'

n =

    '1'

One indicator that you have created a character array is that MATLAB displays the output in single quotes, as in the '1' shown above. However, it is always a good idea to check the class using whos or by looking in the workspace. Make sure you know the class of your array, or you may get an unexpected result.

>> whos n

Name      Size            Bytes  Class    Attributes

  n         1x1                 2  char               
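To see the kind of unexpected result this can cause, consider arithmetic on the character '1'. MATLAB uses the character's numeric code (49) rather than the value 1. The sketch below uses str2double to convert the character to a number first; the names surprise and value are illustrative.

```matlab
n = '1';

surprise = n + 1;               % 50, because the character '1' has code 49
value    = str2double(n) + 1;   % 2, after converting '1' to the number 1
```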

Character Array Functions

sprintf – formats character arrays

A more sophisticated way to create character arrays that incorporates data on the fly is to use the sprintf function. sprintf allows you to incorporate data such as numeric arrays and to format that data in any fashion you would like.

To use sprintf, you first create a character array that has placeholders in it. These placeholders begin with the % symbol. Some of the most common placeholders are:

  • %s – character array
  • %d – integer (whole number)
  • %f – floating point number

The syntax for using sprintf is as follows:

FORMATTED_CHAR_ARRAY = sprintf(CHAR_2_FORMAT, data)

Consider the following example.

>>input_array = 'The value of pi is %d';
>>output_array = sprintf(input_array,pi)

output_array = 
The value of pi is 3.141593e+00

In this example, input_array is the character array to be formatted. It has one placeholder: %d. This placeholder is replaced by the data found in the second input of sprintf, which in this case is the number π. The value of π is returned by the MATLAB function pi. Because %d is meant for integers and π is not an integer, MATLAB falls back to exponential notation, which is why the output reads 3.141593e+00.

If you would like to change the way π is displayed, you can use the %f placeholder along with formatting operators placed between the % and the letter f. In the example below, the .10 requests 10 digits after the decimal point:

>>input_array = 'pi to 10 decimal places is: %1.10f';
>>output_array = sprintf(input_array,pi)

output_array =
pi to 10 decimal places is: 3.1415926536
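For comparison, here is a small sketch of how a few common placeholders format their input. The variable names a through d are used only for this illustration.

```matlab
a = sprintf('%d', 42);      % integer placeholder: '42'
b = sprintf('%.2f', pi);    % 2 digits after the decimal point: '3.14'
c = sprintf('%8.3f', pi);   % 3 decimals, padded to 8 characters: '   3.142'
d = sprintf('%e', pi);      % exponential notation: '3.141593e+00'
```

The number before the decimal point in a placeholder such as %8.3f sets the minimum total width of the output; shorter results are padded with leading spaces.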

More Placeholders, More data

For sprintf, the number of inputs depends on the number of placeholders in the character array being formatted.

The following character array has four placeholders (three %d and one %s); therefore, you need four inputs in sprintf after the input character array, as shown here:

>>input_array = 'The product of %d %s %d equals %d';
>>x = 2;
>>y = 3;
>>result = sprintf(input_array, x, 'times', y, x*y)

result =
The product of 2 times 3 equals 6

  • Notice that the last input into sprintf is actually the product of two variables.
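A practical use of multiple placeholders is generating file names, such as the image names mentioned in the overview. The sketch below is only an illustration: the prefix, the number, and the .tif extension are hypothetical. It also shows that when a single placeholder receives a vector, sprintf reuses the format for each element.

```matlab
prefix = 'image';   % hypothetical file name prefix
k = 7;              % hypothetical image number

% %03d pads the integer with leading zeros to a width of 3
fname = sprintf('%s_%03d.tif', prefix, k);   % 'image_007.tif'

% The format is applied once per element of the vector
list = sprintf('%d,', [1 2 3]);              % '1,2,3,'
```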

Challenge 1

What would you change in the previous example to get the following output?

result =
The sum of 2 plus 3 equals 5


Regular Expressions

One of the most powerful functions to use with character arrays is the function regexp, which uses regular expressions to find specific characters or snippets of strings and performs some sort of operation on those characters / snippets.

Think of regular expressions as a super-charged search function that can be used to find things, such as the locations of all of the spaces in a character array or any line in a text document that starts with the letter S.

Consider the following character array:

>>s = 'together at last';

We can use regexp to return the indices of all the spaces using this syntax:

idx = regexp(s,' ')

idx =
     9    12

In this call, the second input into regexp is simply a space (' '), which is the regular expression we would like to match. The variable idx contains the indices of the spaces found in the variable s.

We can use these indices as word locators because idx+1 gives the locations of the first letters of the words at and last. We can use that information to change the characters in those locations, as follows:

>>s(idx+1) = upper(s(idx+1))

s =
together At Last

We can capitalize the first word in the character array as follows:

>>s(1) = upper(s(1))

s =
Together At Last
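Regular expressions can also describe whole groups of characters rather than one literal character. As a sketch, the pattern \w+ matches one or more consecutive word characters (letters, digits, or underscores), so it can locate or extract each word directly. The variable names startIdx, endIdx, and words are chosen for this example.

```matlab
s = 'together at last';

% With two outputs, regexp returns where each match starts and ends
[startIdx, endIdx] = regexp(s, '\w+');   % startIdx = [1 10 13], endIdx = [8 11 16]

% The 'match' option returns the matched text itself as a cell array
words = regexp(s, '\w+', 'match');       % {'together', 'at', 'last'}
```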

We can use a variant of regexp, regexprep, to replace the spaces with the indicated character array, as follows:

>>t = regexprep(s,' ','_')

t =
Together_At_Last

Notice that regexprep accepts three inputs. The second input (' ') is the regular expression to match. The third input ('_') is the text that replaces each match. In effect, we have replaced all of the spaces with the underscore character.

We can eliminate the spaces entirely using an empty pair of single quotes as the third input, as follows:

>>u = regexprep(s,' ', '')

u =
TogetherAtLast

As you can see, regular expressions are an incredibly powerful way to manipulate strings.
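As a further sketch of what regexprep can do, the pattern \s+ matches a run of one or more whitespace characters, which is handy for cleaning up uneven spacing. The second call uses a dynamic expression, ${upper($0)}, to capitalize each match; treat it as one possible approach rather than the only one. The variable names messy, clean, and caps are illustrative.

```matlab
messy = 'together   at  last';   % hypothetical input with uneven spacing

% Replace each run of whitespace with a single space
clean = regexprep(messy, '\s+', ' ');                  % 'together at last'

% Match a word character at the start or right after a space,
% then replace the match ($0) with its uppercase version
caps = regexprep(clean, '(^|\s)\w', '${upper($0)}');   % 'Together At Last'
```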

Congratulations. Module complete. Yay!


Challenge Answers

Challenge 1 Answer

input_array = 'The sum of %d %s %d equals %d'
x = 2
y = 3
result = sprintf(input_array, x, 'plus', y, x+y)