*Compellingly Beautiful Software*

Products Programming Contact

- Start at the top level of Aunt Sally's operator precedence table.
- Scan your expression from left to right, looking for all elements involving things at that level, and perform those operations.
- Move to the next level of the table, looking for and performing those operations.
- Continue this process until a single result remains.
- Stepping from { to + moves up in the table, so take that step, but don't do anything yet.
- Stepping from + to * moves up in the table, so take that step, but don't do anything yet.
- We can't step from * to - because that doesn't move up in the table, so perform the multiplication, replacing 3*4 with 12.
- The * isn't there anymore, so we slip back to the +.
- We can't step from + to - because that doesn't move up in the table, so perform the addition, replacing 2+12 with 14.
- The + isn't there anymore, so we slip back to the {.
- Stepping from { to - moves up in the table, so take that step, but don't do anything yet.
- We can't step from - to } because that doesn't move up in the table, so perform the subtraction, replacing 14-5 with 9.
- The - isn't there anymore, so we slip back to the {.
- Stepping from { to } moves up in the table, and this particular
step signals that evaluation is complete.

- Stepping to a ( is always a step up.
- Stepping from a ( is always a step up.
- Stepping to a ) is always a step down.
- Stepping from a ) is always a step down.

*"Life is like a box of chocolates: you never know
what you're going to get."*

*
- Forrest Gump*

Programmers hate Forrest Gump's proverbial box of chocolates:
we *need* to know what we're going to get.
One of the first steps toward this goal is to agree on what things mean
when we say them to each other.

By junior high school, or sometimes earlier,
students learn about the standard *operator precedence hierarchy*
used for arithmetic and algebraic expressions.
For example, we all know that

**a+b ^{c}d**

means "*Raise* **b** *to the* **c**
*power, multiply the result by*
**d***, and add ***a** *to that*."
Your teachers taught you this, and expected you to learn it.
If you're a professional programmer, you probably learned this quite well.

One of the standard tricks used to help students is to provide them with mnemonic devices, such as

**P**lease

**E**xcuse

**M**y **D**ear

**A**unt **S**ally

indicating that things inside **P**arentheses
are done before **E**xponentiation,
which are done before **M**ultiplication
and **D**ivision,
which are done before **A**ddition
and **S**ubtraction.
Of course,
this left you wondering what behavior we are excusing Aunt Sally for,
but eventually you encountered **F**unctions,
which perhaps gave you some idea,
resulting in mnemonics you couldn't say in class.

But there are really two parts to this Aunt Sally business. There is
the Aunt Sally *table* shown above,
and then
there is *the process for using it*.

The Aunt Sally *process *is
usually described something like this:

We need a name for this process.
We might reasonably call it the *multi-pass parsing algorithm*.
But, for the remainder of this discussion, I'll simply refer to it as
*the old way*.
As we all know, to really foul things up requires a computer,
which will eventually lead us to
*the new way*.

People who *design* computer programming languages
discover that there are lots of choices to be made.
These languages are usually designed
by people with strong mathematics backgrounds.
It's not too surprising, then,
to see something like Aunt Sally popping up in most of them.

One design constraint of most programming
languages is that one can write them as a linear sequence of
the characters found on standard typewriter keyboards. Another common
feature of these languages is multi-character variable names.
The first of these constraints makes exponents-as-superscripts infeasible,
and the second makes implicit multiplication a really bad idea.
But the resulting scripts resemble standard mathematics closely enough
that one can learn to read most of them fairly quickly. In most languages,
***** and **/** represent multiplication and
division, and in some languages, **^**
is used as an explicit exponentiation operator. We will use
these symbols for the remainder of this discussion. If we render the
Aunt Sally table in operator symbols, it looks like this:

()

^

* /

+ -

But what about the *process *of using the Aunt Sally table?
People who *implement* programming languages have many choices to make,
also, and one of the first discoveries one makes
is that the algorithmic processing of language is harder than it looks.
For example, implementing *the old way*,
the multi-pass algorithm described a few paragraphs back,
as a computer program in its own right is a really ugly task.

As far as I am aware,
** no actual programming language implementation
uses the multi-pass parsing algorithm taught in math classes**.

You're probably wondering if there's any difference in the results
of these two algorithms. Before we answer this question, let's take a
brief look at the nature of mathematics and of programming, which are
certainly related, but are not really the same thing.

Most of mathematics is *stateless*.
If, for example, you're given an algebraic statement to solve for *x*,
that statement is a *constraint*,
and your job is to find the numbers you can use in place of *x*
that satisfy that constraint.
Solving a math problem may involve many intermediate expressions,
but these are incidental to the solution.
You may use some definite process (that is, an algorithm)
for solving the constraint,
but *nothing about the constraint is changing as you are solving it*,
even if you're changing its visible form according to algebraic rules.
This is the reason for learning the rules of algebra:
so that we can safely transform these constraints into successively
simpler
but nevertheless equivalent forms.

Much of programming, on the other hand, is *about state*,
and expressing a precise order of changing states
is what most programmers do for a living.
For this reason,
it's important to understand exactly what your programming language
*implementation *is doing with the language you're writing.

Now let's answer your question about the equivalence of the
*old* way and the *new* way. We actually need two
answers, one for mathematics, and one for programming. Do these two
evaluation algorithms yield the same results?

In mathematics: **always**.

In programming: ** most of the time**.

Are you nervous yet?

Because most programming is not stateless,
some expressions in computer programs yield different results
when these two ways of using the Aunt Sally table are applied.
Several times in my long career in software development,
colleagues have come to me with program bugs that were incomprehensible to them.
They'd been over their code time after time,
trying to find whatever was causing results
that simply defied their carefully-analyzed expectations.
By the time they came to me,
they usually had concluded they'd found a bug in the language implementation.
But the problem turned out to be
their use of *the old way* of using the Aunt Sally table,
which they had learned in junior high school,
*and which had been restated for them
by their college professors in programming classes*.

This has to change. Computer programs are not supposed to be like Forrest Gump's box of chocolates. When we write a piece of code, we need to know what we're going to get.

So what is this *new way*?
It's really very simple,
although it's different enough from the old way
that it may take you a little while to get used to it.

*The old way* is based on scanning the entire expression several times,
looking for operators with successively lower precedence in each pass.
*The new way* is focused on the question
"*How does the next operator relate to the one I'm looking at right now?*"
As an example, let's look at

**{** 2 + 3 * 4 - 5 **}**

Note that I've introduced a couple of symbols you don't usually see
in expressions: the curly braces at the beginning and end. These are
here solely for the purpose of explaining *the new way*.
Once you understand it, you won't need them anymore. Meanwhile, we'll
add them to the Aunt Sally table:

^

* /

+ -

**}**

**{**

In the new algorithm, we're going to step from one operator to the
next, starting with the "**{**",
which you can think of as sort-of-an operator.
Always keep in mind the notion of *where
you're standing now*. In order
to make a step, we have to move *up *in
the precedence table. Let's evaluate `{2+3*4-5}`.

In the example we've just completed, the order of operations is the same whether we use the new way or the old way. Here's an example where it isn't:

`
{
2 * 3 + 4 * 5 + 6 * 7
}`

A B C D E

Using *the old way*, the order of operations is ACEBD.
Using *the new way*, the order of operations is ACBED,
but the final result is the same.
In stateless mathematics, the final result is *always* the same.
But when you encounter state-dependent expressions
in computer programs, the results can be different.

You may have noticed that parentheses didn't appear
in the augmented table shown above.
Parentheses can be included in such a table,
but it becomes a somewhat more complicated table.
Once you grasp the idea of *always stepping up*,
it's easier to handle parentheses using these explicit rules:

What does it mean to "evaluate" a right parenthesis? It means removing itself and its corresponding left parenthesis from the expression, leaving the intermediate operand they enclosed in place.

What about functions and unary sign operators? We'll add them to the
table in just the way you would expect:

` functions
unary sign operators
^
* /
+ -`

Some programming languages have precedence tables that are much deeper
than this, for example, C, C++, and Java. The Smalltalk programming
language has a very shallow precedence hierarchy for its very different
syntax. But *the new way* presented here works for all of these
languages and a great many more.

There is one remaining question that is unresolved. The meaning of
**a^b^c** has never been entirely clear,
depending on how ^ associates. If it associates to the left,
then **a^b^c** means **(a^b)^c**,
which is equivalent to **a^(b*c)**. This is a less strenuous
operation than the right association **a^(b^c)**.
Some people claim that, if one means the simpler thing, one should
write the simpler thing, but there continues to be no general
agreement. Because of this, if the language you are using has an
operator for exponentiation, you need to ask what the associativity of
that operator is for your implementation.
If it is right-associative,
then stepping from ^ to ^ is always a step up in *the new way*.
For the remainder of this discussion, however, you may
assume that exponentiation is nothing special and associates to the left.

Here is a Java applet that demonstrates *the new way*.
Enter an expression in the text field at the top.
As you enter it, it will be incrementally evaluated
by using the Aunt Sally table *the new way*.
You will also see a record of the expression you have entered,
and a display of the parser's postfix output,
which shows you the order in which things were actually evaluated.
Postfix is a notation similar to the keystroke sequence
of a Hewlett-Packard RPN calculator.
The usual unary and binary arithmetic operators and parentheses are implemented,
but functions, which require more advanced lexical analysis, are not.
Integers and decimal fractions are supported,
but numeric entry in scientific notation, for example, 6.02E23, is not.

This implementation is intended to be error-proof. If you do something that is syntactically illegal, you'll hear a beep and see an error message.

To end an expression, use the return key, *not the = key*.
As a convenience,
if you have one or more open subexpressions and are otherwise at the end
of your expression, such as
`2^(3*(4+5`,
you can automatically close all subexpressions and end the expression
simply by hitting return. The correct number of right parentheses
will be filled in for you.

An interesting feature of this implementation is the
invisible *space *operator.
Entering a space character evaluates the rightmost pending operator. It
lets you see intermediate operands in an expression such as
`2+3*4^5` that would otherwise
appear to evaluate all at once. It does not change the meaning of an
expression, *nor will it let you
change the meaning of an expression*. For example, if you enter
`2+3*4` followed by two space characters,
you will see 14 in the display.
At this point, the next operator must have a precedence that is
*no higher than +*, which was the last operator evaluated.
Allowing a higher-precedence operator at this point
would change the meaning of the expression,
so you'll see an error message if you try to do this.
Under no circumstances will the *space *operator
penetrate an unmatched left parenthesis.

Here's another interesting feature: try hitting the backspace key. This
implementation is *completely reversible*,
all the way back to the beginning of an expression,
including the effects of the *space* operator.

*The old way* of evaluating expressions
works well and is probably preferable for stateless mathematics,
but it is inadequate for understanding real-world computer programs.
If you are a professional programmer,
now is a good time to learn *the new way*.
If you are a teacher of programming languages,
now is a good time to update your curriculum.

When using the applet,
if you see an error message beginning with the word BUG,
the error was mine, not yours.
Please notify me if you see such a message,
or if you see other behavior that you're sure is wrong.
Include the expression you were trying to evaluate,
and any other pertinent information,
such as whether you were going backwards using the backspace key at the time.
I would prefer to give you an email link in this page,
but spammers love those,
and I already receive over 10000 spam messages every day,
so you'll have to hand-translate this email address
into the header of your message to me:
**abell at brising dotcom**.
Use **AUNT SALLY** as your subject line.

This applet uses IEEE-754 binary arithmetic, which does not always conform to the expectations of classical mathematicians, but painful experience has shown this behavior to be a computational necessity. The kind of arithmetic used is unrelated to the behavior of the parsing process.

This applet was developed and compiled with JDK 1.4.2, but it seems to work with newer JREs, and maybe even backward into 1.1.?. The applet and this page are Copyright (C) 2003 brising.com, All Rights Reserved. You are free to use it in the context of this website.