MathML Text Containers

Now that you have a better idea of what MathML is, we turn to text containers (variables, numbers, operators, ...), which are the building blocks of MathML formulas.

Prerequisites: Basic software installed, basic knowledge of working with files, HTML basics (study Introduction to HTML) and some CSS notions on text styling (read fundamental text and font styling and web fonts).
Objective: To get familiar with MathML elements used for writing text and be aware of special behaviors.

Unicode characters for mathematics

Mathematical formulas involve many special characters, for example Greek letters (e.g. Δ), Fraktur letters (e.g. 𝔄), double-struck letters (e.g. ℂ), binary operators (e.g. ≠), arrows (e.g. ⇒), integral symbols (e.g. ∮), summation symbols (e.g. ∑), logical symbols (e.g. ∀), fences (e.g. ⌊) and many more. Wikipedia's article Mathematical operators and symbols in Unicode provides a good overview of the characters used.

Since most of these characters are not part of the Basic Latin Unicode block, it is recommended to specify your document's character encoding and to serve it with appropriate web fonts. Here is a basic template that uses UTF-8 encoding and the Latin Modern Math font:

html
<!doctype html>
<html lang="en-US">
  <head>
    <meta charset="utf-8" />
    <title>My page with math characters</title>
    <link
      rel="stylesheet"
      href="https://fred-wang.github.io/MathFonts/LatinModern/mathfonts.css" />
  </head>
  <body>
    <p style="font-family: Latin Modern Math">∀A∊𝔰𝔩(n,𝔽),TrA=0</p>
  </body>
</html>
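
If a character is hard to type, or your editor cannot display it, HTML numeric character references are another way to write it. The fragment below is only an illustrative sketch and not part of the template above. It writes a few of the symbols mentioned earlier by their Unicode code points, and assumes it is placed in the body of a page that loads the same stylesheet:

html
<p style="font-family: Latin Modern Math">
  &#x2200; <!-- ∀ U+2200 FOR ALL -->
  &#x2260; <!-- ≠ U+2260 NOT EQUAL TO -->
  &#x21D2; <!-- ⇒ U+21D2 RIGHTWARDS DOUBLE ARROW -->
  &#x2211; <!-- ∑ U+2211 N-ARY SUMMATION -->
</p>

Both spellings produce the same characters in the parsed document, so choosing between them is mostly a matter of source readability.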

A bit of semantics

We noticed in the getting started with MathML article that the text in MathML formulas is wrapped in specific container elements such as <mn> or <mo>. More generally, all text in MathML formulas must be placed inside such container elements, called token elements. MathML provides several token elements in order to distinguish different meanings of the text content:

  • The <mi> element represents an "identifier", which can be a symbolic name or arbitrary text. Examples: <mi>x</mi> (variable), <mi>cos</mi> (function name) and <mi>π</mi> (symbolic constant).
  • The <mn> element represents a "numeric literal" or other data that should be rendered as a numeric literal. Examples: <mn>2</mn> (integer), <mn>0.123</mn> (decimal number) or <mn>0xFFEF</mn> (hexadecimal value).
  • The <mo> element represents an operator or anything that should be rendered as an operator. Examples: <mo>+</mo> (binary operation), <mo>≤</mo> (binary relation), <mo>∑</mo> (summation symbol) or <mo>[</mo> (fence).
  • The <mtext> element is used to represent arbitrary text. Examples: short words in formulas such as <mtext>if</mtext> or <mtext>maps to</mtext>.
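
To see the four token elements side by side, here is a minimal sketch of a made-up formula, "y = x + 1 if x ≥ 0". The formula itself is only an illustration chosen for this purpose; each comment names the role of the token that follows it:

html
<math>
  <mrow>
    <!-- identifier: a variable -->
    <mi>y</mi>
    <!-- operator: a binary relation -->
    <mo>=</mo>
    <mrow>
      <mi>x</mi>
      <!-- operator: a binary operation -->
      <mo>+</mo>
      <!-- numeric literal -->
      <mn>1</mn>
    </mrow>
  </mrow>
  <!-- arbitrary text: a short word inside the formula -->
  <mtext>&nbsp;if&nbsp;</mtext>
  <mrow>
    <mi>x</mi>
    <mo>≥</mo>
    <mn>0</mn>
  </mrow>
</math>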

Active learning: recognize token elements

Below is a more complex example, which says that the absolute value of a real number is equal to that number if and only if it is nonnegative. Spot the different token elements and what they are used for. Each time you click the corresponding text, it is highlighted and a confirmation message is displayed.

Finally, read the MathML source to verify whether that corresponds to your expectation:

xml
<math display="block">
  <mrow>
    <mrow>
      <mo>|</mo>
      <mi>x</mi>
      <mo>|</mo>
    </mrow>
    <mo>=</mo>
    <mi>x</mi>
  </mrow>
  <mtext>&nbsp;iff&nbsp;</mtext>
  <mrow>
    <mi>x</mi>
    <mo>≥</mo>
    <mn>0</mn>
  </mrow>
</math>

Note: It is sometimes difficult to decide the token element to use for a given text content. In practice, choosing the wrong element should not cause major issues because all token elements are generally rendered the same by browser implementations (for visual display and for assistive technologies). However, the <mi> and <mo> elements have special distinguishing features that one should be aware of. They are explained in the following sections.

Automatic italicization of <mi>

One typographic convention in mathematics is to use italic letters for variables. To help with that, <mi> elements containing a single character may be automatically rendered as italic. This is the case for all the letters of the Latin and Greek alphabets. Compare the rendering of the two <mi> elements in the following formula:

html
<math>
  <mi>sin</mi>
  <mi>x</mi>
</math>

Note: This table from MathML Core provides the exhaustive list of characters that are subject to italicization, together with the corresponding italic characters.

Reverting automatic italicization of <mi>

To revert this default italic transformation, you can set a mathvariant="normal" attribute on the <mi> element. Compare the rendering of the uppercase gamma letters in the following formula:

html
<math>
  <mi>Γ</mi>
  <mi mathvariant="normal">Γ</mi>
</math>

Note: Although you can apply this transformation, normally you'd just use the desired Mathematical Alphanumeric Symbols.
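
As one possible reading of the note above, the sketch below compares the same letter written four ways. The last two <mi> elements contain characters taken directly from the Mathematical Alphanumeric Symbols block, so no attribute is involved. The choice of the letter A is only an example:

html
<math>
  <!-- Single Latin letter: italicized automatically -->
  <mi>A</mi>
  <!-- Same letter, kept upright with mathvariant="normal" -->
  <mi mathvariant="normal">A</mi>
  <!-- U+1D400 MATHEMATICAL BOLD CAPITAL A, written directly -->
  <mi>𝐀</mi>
  <!-- U+1D504 MATHEMATICAL FRAKTUR CAPITAL A, written directly -->
  <mi>𝔄</mi>
</math>

As with the other special characters on this page, displaying these letters depends on a font that provides glyphs for them.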

Operator properties of <mo>

MathML contains an operator dictionary that defines default properties of <mo> elements depending on their content and their position within their container (prefix, infix or postfix). Let's consider a concrete example:

html
<table>
  <tr>
    <td>Prefix plus</td>
    <td>
      <math>
        <mo>+</mo>
        <mi>i</mi>
      </math>
    </td>
  </tr>
  <tr>
    <td>Infix plus</td>
    <td>
      <math>
        <mi>j</mi>
        <mo>+</mo>
        <mi>i</mi>
      </math>
    </td>
  </tr>
  <tr>
    <td>Prefix sum</td>
    <td>
      <math>
        <mo>∑</mo>
        <mi>i</mi>
      </math>
    </td>
  </tr>
</table>

This example should render similarly to the screenshot below. Observe the spacing between each <mi>i</mi> element and its preceding <mo>: no spacing for the prefix plus, some spacing for the infix plus and a smaller spacing for the prefix summation symbol.

Screenshot of the MathML formula with different operator spacing

Operators have many other properties that we will see in more detail later. For now, remember to use an <mo> container for characters in the operator dictionary and to properly group subexpressions with <mrow> elements in order to help MathML renderers.
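
As a small sketch of why grouping matters, consider the made-up formula "a + −b". Because the form of an operator is inferred from its position in its container, wrapping "−b" in its own <mrow> makes the minus sign the first child of that group, so it is expected to be treated as a prefix operator, while the plus sign sits between two siblings and is treated as infix:

html
<math>
  <mrow>
    <mi>a</mi>
    <!-- Between two siblings of the outer mrow: infix form -->
    <mo>+</mo>
    <mrow>
      <!-- First child of the inner mrow: prefix form -->
      <mo>−</mo>
      <mi>b</mi>
    </mrow>
  </mrow>
</math>

Without the inner <mrow>, the minus sign would be a middle child of the outer group and would be spaced as an infix operator instead.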

Active learning: spot the difference

Now that you are a bit familiar with special features of <mi> and <mo>, let's rewrite the <p> element in the example at the top of the page with some actual MathML. Compare the visual rendering in your browser and explain the differences with the text-only version.

html
<!doctype html>
<html lang="en-US">
  <head>
    <meta charset="utf-8" />
    <title>My page with math characters</title>
    <link
      rel="stylesheet"
      href="https://fred-wang.github.io/MathFonts/LatinModern/mathfonts.css" />
  </head>
  <body>
    <p style="font-family: Latin Modern Math">∀A∊𝔰𝔩(n,𝔽),TrA=0</p>
    <p>
      <math>
        <mo>∀</mo>
        <mrow>
          <mi>A</mi>
          <mo>∊</mo>
          <mrow>
            <mi>𝔰𝔩</mi>
            <mrow>
              <mo>(</mo>
              <mi>n</mi>
              <mo>,</mo>
              <mi>𝔽</mi>
              <mo>)</mo>
            </mrow>
          </mrow>
        </mrow>
        <mo>,</mo>
        <mrow>
          <mrow>
            <mi>Tr</mi>
            <mi>A</mi>
          </mrow>
          <mo>=</mo>
          <mn>0</mn>
        </mrow>
      </math>
    </p>
    <input id="showSolution" type="button" value="Show solution" />
    <div id="solution"></div>
  </body>
</html>

Note: An obvious difference is that the source code became much more verbose with MathML. Recall that this tutorial is about learning the language but in practice MathML content is generally not written manually. See the Authoring MathML page for more information.

Active learning: stretchy operators

The operator dictionary defines a default stretchy property, as well as the corresponding stretch axis, for some operators. For example, an operator can stretch vertically by default to cover the maximum height of the non-stretchy siblings within its <mrow> container. By slightly tweaking the previous exercise, one can make operators stretch vertically. Can you find them?

As usual, you are invited to read the source code when you are done:

xml
<math display="block">
  <mrow>
    <mrow>
      <mo>|</mo>
      <mfrac>
        <mn>1</mn>
        <mi>x</mi>
      </mfrac>
      <mo>|</mo>
    </mrow>
    <mo>=</mo>
    <mfrac>
      <mn>1</mn>
      <mrow>
        <mo>|</mo>
        <mi>x</mi>
        <mo>|</mo>
      </mrow>
    </mfrac>
    <mo>=</mo>
    <mfrac>
      <mn>1</mn>
      <mi>x</mi>
    </mfrac>
  </mrow>
  <mtext>&nbsp;iff&nbsp;</mtext>
  <mrow>
    <mi>x</mi>
    <mo>&gt;</mo>
    <mn>0</mn>
  </mrow>
</math>

Warning: Special math fonts are generally required to make this stretching possible; the previous example relies on web fonts.
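
One way to see what stretching changes is to switch it off on selected operators. The sketch below assumes the stretchy attribute of <mo>, which this article has not introduced, and reuses the first fraction of the exercise. The bars in the first formula should grow to the height of the fraction, while those in the second are asked to keep their normal size:

html
<p>
  Default stretching:
  <math>
    <mrow>
      <mo>|</mo>
      <mfrac>
        <mn>1</mn>
        <mi>x</mi>
      </mfrac>
      <mo>|</mo>
    </mrow>
  </math>
</p>
<p>
  Stretching disabled:
  <math>
    <mrow>
      <!-- stretchy="false" overrides the operator dictionary default -->
      <mo stretchy="false">|</mo>
      <mfrac>
        <mn>1</mn>
        <mi>x</mi>
      </mfrac>
      <mo stretchy="false">|</mo>
    </mrow>
  </math>
</p>

If both formulas look the same in your browser, the font in use may lack the data needed for stretching, as the warning above explains.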

Summary

In this article, we have learnt about a few token elements that are used as text containers, as well as their different semantics, namely <mi> (identifiers), <mn> (numbers), <mo> (operators) and <mtext> (generic text). We have seen special Unicode characters that are commonly found in math formulas and given an overview of some observable behaviors of the <mi> and <mo> elements. In the next article, we will see how to rely on token elements to build more complex expressions such as fractions and roots.

See also