Microsoft Excel Basics Part II: Data Types in Excel

Having represented this site as a technology blog with a focus on programming and software development, I find myself starting off with a tutorial on Excel basics instead. Somehow, I thought it was going to be more glamorous than this . . . This is driven by my need to prepare a training for my staff at work, and in keeping with the DRY principle (Don’t Repeat Yourself) I am killing two birds with one stone. Anyone hoping to read programming stuff is just going to have to wait . . .

This is part two of a multi-part series targeting entry-level users of Microsoft Excel. In the previous post, we discussed the basic structure of a spreadsheet, and covered the vary basics of using expressions (or formulas). There is a LOT more to that, but before we go any further, it is important to understand how Excel understands data.

I am about to be a little long-winded. Hopefully you will slog through anyway, unless you already have a solid grasp on this stuff. Let’s take a (really) broad look at some data basics, before we explore the data type definitions specific to Excel. Because this is a long post, here are some navigation aids:

The Broad View – Information Basics

What’s in a number? (numeric information)
Truth Be Told (logical information)
String Theory (Text information)

What’s in a number?

We humans don’t tend to think too hard about the type of data we are working with. We have evolved to be context-aware, and as we learn, we understand how to use MOST information within a particular context. For example, when we talk about our Social Security number, or our credit card number, we generally recognize either of those things as an identifier composed of numerical characters. We do not expect to be performing addition, multiplication, or other mathematical operations on these, even though semantically we still call them “numbers.” Further, both SSN’s and Credit Card account numbers might contain one or more leading zeros, or zeros at the rightmost end of the numerical sequence:

Example SSN: 001-11-1111

Never mind the dashes which make this SSN format behave emphatically NOT like a true number.

While most credit card companies assign a string of numerals as an account identifier (account number), other business entities might use a mixture of numerals and alphabetical characters to create account identifiers, which we still tend to refer to as “account numbers.” Again, while we think of the various account identifiers which (for some) rule our lives as “numbers”, we also tend to utilize them in a context more like a string of characters.

Contrast this with Social Security payments and/or credit card charges. These numbers represent numerical values which we contextually understand will be added to or subtracted from the current balance of our account. In a similar manner, the credit company will most likely perform a calculation (which, apropos of nothing whatsoever, would be represented by a formula) to determine how much interest to charge to our account. This amount would be calculated based upon our current balance (and a number of other factors), and then added to that balance for the current billing period. We intrinsically understand this, even though the actual math part may not be a part of our conscious thought stream.

In these cases, our human contextual ability allows us to (most of the time) use the right “numbers” in the correct and meaningful manner. Mentally, we all might play a quick mental arithmetic on our credit card account balance, subtracting the approximate amounts of our recent charges, before we decide to purchase that new 52″ HD television. When we whip out the plastic to punch the card number into the space on the Amazon website, we type in the set of numbers, recognizing that there is no numerical operation to be performed, other than making sure we type correctly.

[Back to Top]

Truth be Told . . .

On the other hand, we humans are nowhere near as good at evaluating a set of conditions and arriving at a concise logical value. We have to practice at this, and even then, our ability to inject that very same context can create flaws in our evaluation. We confuse our singular perceptions with absolutes. True logic is not relative. For example, on a very cold day here in Portland (say, 37 degrees F – yes, that would be “cold” here in the Pacific Northwest) one man might observe to another that it is “freezing out here” when in fact the temperature is well above the point at which liquid water changes state into its solid form. However, his companion is very likely to agree with him, or failing that, state that “yup, it’s cold out.”

Neither of these statements can be evaluated from a logical standpoint outside that of the observer, because they are relative statements. If an Inuit fresh from the North Slope of Alaska were to join the group, he would likely find the observations of the other two men absurd. To the Inuit, the actual outdoor temperature of, say, 37 degrees might feel nearly tropical. In fact, since “freezing” is technically defined as the point at which water turns to ice (32 degrees Fahrenheit, or 0 degrees Celsius), the only valid logical statement which could be made in this situation is that it is NOT, in fact, freezing outside.

As humans, we are accustomed to applying our personal, regional, and societal metrics in evaluating information. Our computers, however, and by extension the software which runs on them, are machines which deal in absolutes. To the machine, a statement which is not 100% true is FALSE. Numbers may be manipulated in meaningful mathematical ways, while arrangements of text cannot (or at least, not in the same manner as true numbers).

Despite some illusions built into our modern software (including Microsoft Excel) to the contrary, the computer does not understand context. The computer is very, very literal and will perform whatever action or calculation you request of it (unless, of course, the programmer built in a mechanism to prevent the operation from being carried out if certain conditions aren’t met) with no comprehension of the sensibility of the result.

At the very, very most basic level, TRUE and FALSE (in a technical sense, ON and OFF) are the only two things the computer hardware understands. After one delves down through several layers of software abstraction, one eventually runs into hardware which is binary in nature. Binary code is composed of ones and zeros, representing ON (one) and OFF (zero). We are not going to go into how all THAT works here. But its important to understand the literal nature of the machine, that the machine lacks ANY “context” beyond that supplied by the programmer, and that all of the information we work with in our user-friendly, usability-tested software interface has been heavily abstracted for our benefit.

There is an entire branch of mathematics devoted to logical analysis called Boolean Algebra. While we won’t be going into that in this article, an understanding of basic logical operations can lend itself well to some advanced spreadsheet features that will make you the envy of your friends, and the champion around the office. More in a bit.

[Back to Top]

String Theory

I am NOT about to embark on a discussion of theoretical physics.

I am going to talk about text. Like, words. This overly long post is constructed of words, in the English language, which I have typed in to the computer using my keyboard. I am currently creating this post using Windows Live Writer, a nifty blogging program which has abstracted away the need for me to understand all that goes into marking up my blog post with suitable HTML, creating an ftp connection, and uploading the content into the proper directories on the server so that it displays properly for you, the reader. I just type, and when I am done, I use mu mouse cursor to push an on-screen representation of a “button” that says “publish”. In the same manner, our software tends to abstract from us the need to understand what is really going un “under the hood” when we ask our machines to work with text. It appears to us, as end users, that the machine recognizes the strings of characters we type and manipulate, and works with text just as easily as it works with numbers.

Which is emphatically not true.

We’re not going to go into the how and why of it right now, but let’s just say that creating effective ways for our machines to handle input, display, formatting, and printing of text (not to mention standardization across platforms!) has been one of the major challenges of the PC era. Just to give you an idea, here are a couple of entertaining and informative articles by Joel Spolsky on the topic of character strings. It is not required reading for our purposes here, but you may find it informative:

Suffice it to say, it is important to be aware that text information lives in its own realm, even within the confines of our simple spreadsheet. Even more important, it is REALLY important for you, the reader, to be able to properly distinguish whether your spreadsheet should be treating the information in a cell as text, or something else. We’ll see why in the next post.

A final comment before we get into the nitty gritty. At a basic level, the text we take for granted on screen is regarded by our machine as a string of characters. These characters are actually ASCII or Unicode representations of what we actually see displayed. Because our words, sentences, and paragraphs (including whitespace and non-printing items such as carriage returns, tabs and other markup) are all composed of such character strings, we will often refer to text data as strings. In other words, in your travels, whether in Excel or some other application, mention is made of “String Parameter”, “String Data Type”, etc. you can generally substitute “text” for “string” and be just fine.

Ok. Now that we have the preliminaries out of the way, let’s look at how Microsoft Excel regards our data.

[Back to Top]

Data Types

Microsoft Excel recognizes four kinds of information: Logical values (TRUE or FALSE, also called Boolean values), Numerical values, Text values, and Error types. The four kinds of information are known, in technical parlance, as Data Types. What is important for you, the user, is to begin to think of your own information in this manner, and consider how the machine is going to use it.

Note that Excel will do the very best it can to figure out which of these types it THINKS you intend, once you complete typing into a cell and hit the enter key. For example, if you type a series of numerals, it assumes you intend a number type. If that series of numerals happens to begin with one or more zeros, Excel STILL thinks you intend to type a number, and eliminates those leading zeros. Likewise, if you type something the looks like a date, and contains valid numbers which can represent a valid date, Excel assumes you mean a number type again, formatted as a date.

You will learn more about this date stuff later in the post. However, be aware that Excel tries to help you in this manner. Most of the time, it works out. Sometimes, it doesn’t, and then you have to help Excel by telling it what you REALLY mean, explicitly.

The Logical Data Type

Logical values are either TRUE or FALSE.

In most cases, logical values will be present as the result of the evaluation of an expression or function. Essentially, a logical value represents the resolution of an expression indicating whether certain conditions have been met. For example:

The statement “1 is less than 2” is recognizable as a true statement. Another way to put that is:

1 < 2 = TRUE

When evaluating logical expressions, Excel recognizes the text TRUE as a logical (or boolean) value. Excel also treats the value 0 as false, and any other numerical value as true. For example, we can make a logical comparison about the values in two different cells by typing an expression:

Compare the values of two cells:	The expression returns TRUE:

If we add change one of the values, the expression will no longer return true:

Change one of the values:	The expressions returns FALSE:

Logical expressions are one of the more powerful tools Excel has to offer, and yet they are also some of the least utilized. We will devote an entire post to examining logical expressions in the near future.

[Back to Top]

The Number Data Type

Numerical values are, of course, numbers. This is more complex than it first appears, however. First of all, Excel stores all numbers as Double-Precision Floating Point values. I know. It’s a mouthful. For our purposes here, you can think of that as a decimal number, with room for a lot of decimal places if needed. Excel can store numbers as large as 1.79768 X 10³⁰⁸ and as small as 2.2250 X 10^-308. These represent extremely large, and extremely small numbers, respectively.

The thing to remember here is that to Excel, all of the following are numbers: 15,000; 100; $50; $50.00; 75%; 0.5; 5.35E+04; and 12/25/2012.

What, you say? What was that last? Yes. To Excel, dates are also stored as plain old numbers. We’ll discuss THAT in a minute. For the moment, take a look at what you might type in or see displayed in a cell, vs. what excel is actually storing:

Visible in Cell	15,000	100	$50,000	$50.00	75%	.5	5.35E+04	12/25/2012
Number Stored	15000	100	50000	50	0.75	0.5	53500	41268

While you are likely familiar with most of the numbers you see in the table above, some of you may not be familiar with the second from the right, known as Scientific E notation. Don’t worry about that right now. Scientific E notation is a a sort of shorthand for expressing very large (or very small) numbers The example above is trivial for the purpose of illustration. Trust me that the expression in the top cell is numerically equivalent to the number displayed in the cell below.

The important thing to remember is that numbers in Excel can be no more than 15 significant digits in length. This excludes zeros on either side of the number:

99999999999999900000 is 20 numerals, but only contains 15 significant digits; the fifteen “9’s” to the left of the five zeros. Likewise, .00000999999999999999 is also twenty decimal places, but contains only fifteen significant digits).

Numbers which are entered that contain more than 15 significant digits will be truncated. That is, significant digits will be lopped off the right-hand side and replaced with zeros. Note in the examples below, we can type in 19 significant digits (fourteen 9’s followed by 87654). Once we hit “Enter”, the significant digits in excess of 15 are truncated, leaving 0’s):

19 Significant Digits Before Enter:	15 Significant Digits After Enter (truncation!)

Dates and Times

Ok. Now on to that date thing. Dates and times are also stored as number types by Excel. Dates are stored as the number of days since the date 1/1/1900. In other words, January 1, 1900 is considered by Excel to be 1. Therefore, 1/2/1900 would be stored as 2, 1/3/1900 as 3, and so on. Note that Excel does not recognize dates BEFORE 1/1/1900. We’ll see why this matters in the next post about functions and expressions.

Excel treats Times as fractions of days. Since a day is 24 hours in length, then 1/4th of a day (0.25) would be 6 hours. Since each day begins at 0 hours and 24 hours later, Excel would store the date and time for 6:00 AM on 1/1/1900 as 1.25. As I type this sentence, it is 10:39 PM on August 9th, 2011. Therefore, Excel would store the current date and time as 40764.94384. That is, forty thousand three-hundred eighty-four and (roughly) 94/100ths days since 1/1/1900.

Values vs. Formats

The key thing about numbers in Excel is that you need to separate in your mind the number VALUE from the way it is DISPLAYED. The manner in which Excel displays a given numerical value within a cell is known as formatting. We will discuss different aspects of number formatting later, but for the moment, keep in mind that underneath whatever window dressing Excel provides in terms of formatting, the following are all representations of the same value (1.05):

1.05 = 1.050000 = $1.05 = 105% = 1/1/1900

Value in Cell	Format	Displayed in Cell
1.05	General	1.05
1.05	Number (6 digit decimal precision)	1.050000
1.05	Currency	$1.05
1.05	Percent	105%
1.05	Date	1/1/1900

Likewise, these ALSO represent the same numerical value (.75):

.75 = $0.75 = 75% = 3/4 = 6:00:00 PM

Value in Cell	Format	Displayed in Cell
0.75	General	1.05
0.75	Currency (2 digit decimal precision)	$0.75
0.75	Percent (1 digit decimal precision)	75.0%
0.75	Fraction	3/4
0.75	Time (hh:mm PM)	6:00:00 PM

We’ll cover the ins and outs of the various formatting choices in the next post. But be aware, that formatting can be tricky. Notice how, in the first example above, if we choose to format the value 1.05 as a Date, we get the displayed value of 1/1/1900. In a way, this is to be expected, since we learned that in Excel, the number 1 can be considered to represent 1/1/1900. But what about that .05 (five one-hundredths)? Well, this would actually represent 1/20th of 1 day (5/100ths reduces to 1/20th). But since we specified a date format which did not include the time as part of the format to display, we get only the date.

The value of a number is stored in the cell. Formatting determines how the number is displayed, and what level of precision is displayed. When using the value of the cell in calculations, the true value is used, not the displayed value!

We will see how this might matter in a later post. For now, we will move on the the Text data type.

[Back to Top]

The Text Data Type

As we discussed previously, Microsoft Excel regards test as strings of characters. The letters of the alphabet, numerical characters, symbols such as % and $, as well as spaces and tabs are all valid text. In cases where Excel cannot distinguish a value as either a number type, a logical type, or an error type, the value will be treated as text.

Excel will recognize a text string of up to 32,768 characters. However, only 1024 can be displayed in a cell.

Excel tries to be helpful, in that, as you enter information into a cell, the application attempts to determine what type of data you are entering and treat it accordingly. Most of the time this is fine, and helpful. However, sometimes it can trip us up. For example, we will return to the issue of leading zeros and account numbers. Lets say an account number begins with four zeros:00001234567891011. If we were to simply enter this string of numbers into a cell, excel will decide we are entering a number value, and will drop those leading zeros.

Leading zeros before enter:	No more leading zeros!

We can tell Excel to treat the data in a cell as text by pre-pending a single-quote character before the text we wish to enter, or by applying the text format to the cell through the cell formatting menu (which we will discuss in the next post).

Excel offers a host of text manipulation functions which we will examine in an upcoming post.

The Error Data Type

There are instances in which errors will occur when Excel evaluates the contents of a cell. For example, division by zero is mathematically undefined, and the machine cannot, by itself, resolve this error. It turns out the Excel has an Error type specifically for this instance, the #DIV/0! result.

Leading zeros before enter:	No more leading zeros!

The error type generally rears its ugly head due to problems with functions or formulas. The different error values Excel provides are listed in the following table, along with the meaning of the error:

Error Value	Means	Common Causes
#DIV/0	Division by zero	You attempted to divide by a value of zero. A blank cell is treated as zero in mathematical operations.
#N/A	No value Available	Manually entered (and sometimes when data is imported) to indicate information not available
#NAME?	Excel does not recognize the name of a list or range of cells	The #NAME? error will result if you neglect to enclose text in quotes within a formula, or if you refer incorrectly to the name (address) of a cell or range of cells.
#NULL!	Reference to a non-existent intersection between two cell ranges	If you neglect to separate to cell ranges with a comma in certain function arguments, the #NULL! Error will result. Also if you refer to an intersection between two cell ranges which do not intersect.
#NUM!	There is a problem with a number in a formula or result	Passing an invalid argument to a function or formula, or a formula returns a number which is too large or too small to be represented in the cell.
#REF!	Invalid Cell Reference	You have deleted or pasted over a cell or cells referred to in a formula.
#VALUE!	Invalid argument or operator in a function or formula	Usually results from performing a mathematical operation with cells that contain text.

The short version of all this is that when you see that ugly hash symbol, you know you have a problem, usually in a formula. This reference will help you know where to begin looking for the problem.

OK. We’ve covered a lot of ground. In the next post we will take a closer look at formulas and some of the built-in functions Excel provides.

[Back to Top]

Summary

Understanding how Microsoft Excel interprets and handles different types of information is critical in constructing effective spreadsheets. Failure to properly anticipate how Excel will work with your data can potentially lead to calculation errors which are difficult to detect. In order to make the most effective use of expressions and formulas, and functions within Excel, an awareness of the various data types within the program will save time, frustration, and headaches.

It is important to consider that the computer does not have a contextual understanding of our information, beyond what the programmers built in.
Excel will attempt to determine the proper data type based on what you type into a cell. Sometimes you have to correct for this.
Excel recognized four types of data;