@c -*-texinfo-*-
@node Journal File Format,Extending with Python, Format Strings,  Top
@chapter LEDGER Journal File Format

This chapter offers a complete description of the journal data format,
suitable for implementors in other languages to follow.  For users,
the chapter on keeping a journal is less extensive, but more typical
of common usage (@pxref{Keeping a Journal}).

Data is collected in the form of @dfn{transactions} which occur in one
or more @dfn{journal files}.  Each transaction, in turn, is made up of
one or more @dfn{postings}, which describe how @dfn{amounts} flow from
one @dfn{account} to another.  Here is an example of the simplest of
journal files:

@example
2010/05/31 Just an example
    Expenses:Some:Account                $100.00
    Income:Another:Account
@end example

In this example, there is a transaction date, a payee, or description
of the transaction, and two postings.  The postings show movement of
one hundred dollars from an account within the Income hierarchy, to
the specified expense account.  The names and meanings of these accounts
are arbitrary, with no preferences implied, although you will find it
useful to follow standard accounting practice (@pxref{Principles of
Accounting}).

Since an amount is missing from the second posting, it is assumed to
be the inverse of the first.  This guarantees the cardinal rule of
double-entry accounting: the sum of every transaction must balance to
zero, or it is in error.  Whenever Ledger encounters a @dfn{null
posting} in a transaction, it uses it to balance the remainder.
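
The balancing rule can be sketched in a few lines of Python.  This is
only an illustration under simplifying assumptions, not Ledger's
implementation: the function name @code{fill_null_posting} is
hypothetical, every amount is taken to be in a single commodity, and
at most one posting may omit its amount.

@example
from fractions import Fraction

def fill_null_posting(postings):
    """Return the postings with the one missing amount filled in.

    postings is a list of (account, amount) pairs, where amount is a
    Fraction, or None for the null posting.
    """
    missing = [i for i, (_, amt) in enumerate(postings) if amt is None]
    if len(missing) > 1:
        raise ValueError("at most one posting may omit its amount")
    total = sum((amt for _, amt in postings if amt is not None),
                Fraction(0))
    filled = list(postings)
    if missing:
        # The null posting receives the inverse of everything else.
        account = postings[missing[0]][0]
        filled[missing[0]] = (account, -total)
    elif total != 0:
        raise ValueError("transaction does not balance to zero")
    return filled

balanced = fill_null_posting([
    ("Expenses:Some:Account", Fraction(100)),
    ("Income:Another:Account", None),
])
@end example

For the example transaction above, the second posting receives
@samp{-100}, so the amounts sum to zero.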

It is also typical---though not enforced---to think of the first
posting as the destination, and the final as the source.  Thus, the
amount of the first posting is typically positive.  Consider:

@example
2010/05/31 An income transaction
    Assets:Checking       $1,000.00
    Income:Salary

2010/05/31 An expense transaction
    Expenses:Dining         $100.00
    Assets:Checking
@end example

@emph{Note:} There must be at least two spaces between the end of the
account name and the beginning of the amount (including any commodity
designator).
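
For implementors, the two-space rule might be applied as in the
following Python sketch.  The helper name @code{split_posting} is
hypothetical, and the sketch assumes the separator is exactly a run of
two or more spaces; it ignores any other separators or trailing notes
that a full parser would need to handle.

@example
def split_posting(line):
    """Split a posting line into (account, amount text or None)."""
    # The first run of two spaces ends the account name.
    account, separator, rest = line.strip().partition("  ")
    amount = rest.strip()
    return account, (amount if separator and amount else None)

with_amount = split_posting("    Assets:My Larder           100 apples")
null_posting = split_posting("    Income:Salary")
one_space = split_posting("    Assets:Checking $20.00")
@end example

Under these assumptions, a single space never ends the account name:
in the third call the text @samp{$20.00} is treated as part of the
account, which is why the separation matters.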

@section Specifying amounts

The heart of a journal is the amounts it records, and this fact is
reflected in the diversity of amount expressions allowed.  All of them
are covered here, though it must be said that sometimes, there are
multiple ways to achieve a desired result.

@subsection Integer amounts

In the simplest form, bare decimal numbers are accepted:

@example
2010/05/31 An income transaction
    Assets:Checking        1000.00
    Income:Salary
@end example

Such amounts may only use an optional period for a decimal point.
These are referred to as @dfn{integer amounts} or @dfn{uncommoditized
amounts}.  In most ways they are similar to @dfn{commoditized
amounts}, but for one significant difference: they always display in
reports with @dfn{full precision}.  More on this in a moment.  For
now, a word must be said about how Ledger stores numbers.

Every number parsed by Ledger is stored internally as an
infinite-precision rational value.  Floating-point math is never used,
as it cannot be trusted to maintain precision of values.  So, in the
case of @samp{1000.00} above, the internal value is @samp{100000/100}.
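
The difference between rational and floating-point storage can be
seen with Python's standard @code{fractions} module.  This is an
illustration only and says nothing about Ledger's internals; note that
@code{Fraction} reduces values to lowest terms, so it holds
@samp{1000/1} rather than the @samp{100000/100} shown above.

@example
from fractions import Fraction

# Parse the decimal text exactly; no binary floating point is involved.
value = Fraction("1000.00")

# Fraction reduces to lowest terms: (1000, 1).
parts = (value.numerator, value.denominator)

# Exact arithmetic: dividing by three and multiplying back loses nothing.
third = Fraction(1) / 3
exact_round_trip = (third * 3 == 1)

# The same kind of sum is inexact in floating point.
float_sum_is_exact = (0.1 + 0.2 == 0.3)
rational_sum_is_exact = (
    Fraction("0.1") + Fraction("0.2") == Fraction("0.3"))
@end example

Here @code{float_sum_is_exact} is false while
@code{rational_sum_is_exact} is true.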

While rational numbers are great at not losing precision, the question
arises: How should they be displayed?  A number like @samp{100000/100}
is no problem, since it represents a clean decimal fraction.  But what
about when the number @samp{1/1} is divided by three?  How should one
print @samp{1/3}, an infinitely repeating decimal?

Ledger gets around this problem by rendering rationals into decimal at
the last possible moment, and only for display.  As such, some
rounding must, at times, occur.  If this rounding would affect the
calculation of a running total, special accommodation postings are
generated to make you aware it has happened.  In practice, it happens
rarely, but even then it does not reflect adjustment of the
@emph{internal amount}, only the displayed amount.

What has still not been answered is how Ledger rounds values.  Should
@samp{1/3} be printed as @samp{0.33} or @samp{0.33333}?  For
commoditized amounts, the number of decimal places is decided by
observing how each commodity is used; but in the case of integer
amounts, an arbitrary factor must be chosen.  Initially, this factor
is six.  Thus, @samp{1/3} is printed back as @samp{0.333333}.
Further, this rounding factor becomes associated with each particular
value, and is carried through mathematical operations.  For example,
if that particular number were multiplied by itself, the decimal
precision of the result would be twelve.  Addition and subtraction do
not affect precision.
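
The way precision travels with a value can be modelled as follows.
This is a sketch, not Ledger's code: the class name
@code{IntegerAmount} is hypothetical, the rounding mode (half to even)
is an assumption, and so is the choice to keep the larger precision
when two values of different precision are added.

@example
from fractions import Fraction

class IntegerAmount:
    """A rational value that carries its own display precision."""

    def __init__(self, value, precision=6):
        self.value = Fraction(value)
        self.precision = precision

    def __mul__(self, other):
        # Multiplication adds the precisions of the operands.
        return IntegerAmount(self.value * other.value,
                             self.precision + other.precision)

    def __add__(self, other):
        # Assumption: with mixed precisions, keep the larger one.
        return IntegerAmount(self.value + other.value,
                             max(self.precision, other.precision))

    def render(self):
        # Assumption: round half to even, as round() does for Fraction.
        scaled = round(self.value * 10 ** self.precision)
        if self.precision == 0:
            return str(scaled)
        sign = "-" if scaled < 0 else ""
        digits = str(abs(scaled)).rjust(self.precision + 1, "0")
        return (sign + digits[:-self.precision]
                + "." + digits[-self.precision:])

third = IntegerAmount(Fraction(1, 3))
squared = third * third
doubled = third + third
@end example

With these definitions @code{third.render()} gives @samp{0.333333},
@code{squared} has a precision of twelve, and @code{doubled} keeps a
precision of six.  Rounding happens only in @code{render}; the stored
rational is never altered.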

Since each integer amount retains its own display precision, this is
called @dfn{full precision}, as opposed to commoditized amounts, which
always look to their commodity to know what precision they should
round to, and so use @dfn{commodity precision}.

@subsection Commoditized amounts

A @dfn{commoditized amount} is an integer amount which has an
associated commodity.  This commodity can appear before or after the
amount, and may or may not be separated from it by a space.  Most
characters are allowed in a commodity name, except for the following:

@itemize
@item Any kind of whitespace
@item Numerical digits
@item Punctuation: @samp{.,;:?!}
@item Mathematical and logical operators: @samp{-+*/^&|=}
@item Bracketing characters: @samp{<>[]()@{@}}
@item The at symbol: @samp{@@}
@end itemize

And yet, any of these may appear in a commodity name if it is
surrounded by double quotes, for example:

@example
100 "EUN+133"
@end example

If a @dfn{quoted commodity} is found, it is displayed in quotes as
well, to avoid any confusion as to which part is the amount, and which
part is the commodity.
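
A writer of journal files could decide when to quote a commodity name
along these lines.  The function names are hypothetical, and the set
of characters is taken directly from the list above; the sketch makes
no claim about any further characters that Ledger itself may reject.

@example
# Punctuation, operators, brackets, and the at symbol from the list above.
FORBIDDEN = set(".,;:?!" + "-+*/^&|=" + "<>[]()@{@}" + "@@")

def needs_quotes(name):
    """True if the name contains whitespace, a digit, or a listed symbol."""
    return any(ch.isspace() or ch.isdigit() or ch in FORBIDDEN
               for ch in name)

def format_commodity(name):
    """Wrap the name in double quotes only when that is required."""
    return '"' + name + '"' if needs_quotes(name) else name

plain = format_commodity("apples")
quoted = format_commodity("EUN+133")
spaced = format_commodity("crab apples")
@end example

Here @code{plain} is left as @samp{apples}, while the other two names
are wrapped in double quotes.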

Another feature of commoditized amounts is that they are reported back
in the same form as parsed.  If you specify dollar amounts using
@samp{$100}, they will print the same; likewise with @samp{100 $} or
@samp{$100.000}.  You may even use decimal commas, such as
@samp{$100,00}, or thousand-marks, as in @samp{$10,000.00}.

These display characteristics become associated with the commodity,
with the result being that all amounts of the same commodity are
reported consistently.  Where this is most noticeable is the
@dfn{display precision}, which in most cases is determined by the most
precise value seen for a given commodity.

Ledger makes a distinction between @dfn{observed amounts} and unobserved
amounts.  An observed amount is examined by Ledger to determine how
amounts using that commodity should be displayed; unobserved amounts
are significant in their value only---no matter how they are
specified, it does not change how other amounts in that commodity will
be displayed.
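
The effect of observation on display precision can be sketched as
follows.  The helper names are hypothetical, and the sketch assumes a
period as the decimal point; it only counts digits and does not model
the other display characteristics a commodity remembers.

@example
def decimal_places(text):
    """Count the digits after the decimal point in a number's text."""
    return len(text.partition(".")[2])

def display_precision(observed_numbers):
    """Return the most decimal places seen among observed amounts."""
    return max((decimal_places(text) for text in observed_numbers),
               default=0)

# Numbers from dollar amounts that were observed in postings.
observed = ["100", "20.00", "1,000.00"]
# A number from an unobserved dollar amount; it is left out on purpose.
unobserved = ["0.200000"]

precision = display_precision(observed)
if_all_observed = display_precision(observed + unobserved)
@end example

Only the observed numbers are consulted, so @code{precision} is two.
Had the unobserved number been counted as well, the commodity would
display with six decimal places.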

An example of this is found in cost expressions, covered next.

@section Posting costs

You have seen how to specify either a commoditized or an integer
amount for a posting.  But what if the amount you paid for something
was in one commodity, and the amount received was another?  There are
two main ways to express this:

@example
2010/05/31 Farmer's Market
    Assets:My Larder           100 apples
    Assets:Checking               $-20.00
@end example

In this example, you have paid twenty dollars for one hundred apples.
The cost to you is twenty cents per apple, and Ledger calculates this
implied cost for you.  You can also make the cost explicit using a
@dfn{cost amount}:

@example
2010/05/31 Farmer's Market
    Assets:My Larder           100 apples @@ $0.200000
    Assets:Checking
@end example

Here the @dfn{per-unit cost} is given explicitly in the form of a cost
amount; and since cost amounts are @emph{unobserved}, the use of six
decimal places has no effect on how dollar amounts are displayed in
the final report.  You can also specify the @dfn{total cost}:

@example
2010/05/31 Farmer's Market
    Assets:My Larder           100 apples @@@@ $20
    Assets:Checking
@end example

These three forms have identical meaning.  In most cases the first is
preferred, but the latter two are necessary when more than two
postings are involved:

@example
2010/05/31 Farmer's Market
    Assets:My Larder           100 apples        @@ $0.200000
    Assets:My Larder           100 pineapples    @@ $0.33
    Assets:My Larder           100 "crab apples" @@ $0.04
    Assets:Checking
@end example

Here the implied total cost is @samp{$57.00} ($20.00 + $33.00 + $4.00).
Its inverse is entered into the null posting automatically so that the
transaction balances.
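
The arithmetic behind per-unit and total costs can be checked with a
short Python sketch.  The function name @code{posting_cost} and its
parameters are hypothetical, and the sketch assumes positive
quantities and a single cost commodity.

@example
from fractions import Fraction

def posting_cost(quantity, per_unit=None, total=None):
    """Return the total cost of a posting as an exact rational."""
    if (per_unit is None) == (total is None):
        raise ValueError("give exactly one of per_unit or total")
    if total is not None:
        return Fraction(total)
    return Fraction(quantity) * Fraction(per_unit)

costs = [
    posting_cost(100, per_unit="0.200000"),  # apples
    posting_cost(100, per_unit="0.33"),      # pineapples
    posting_cost(100, per_unit="0.04"),      # crab apples
]
total_cost = sum(costs, Fraction(0))

# The null posting takes the inverse so that the transaction balances.
null_amount = -total_cost

# A per-unit cost and a total cost describe the same posting.
same = (posting_cost(100, per_unit="0.200000")
        == posting_cost(100, total="20"))
@end example

The three costs come to 20, 33, and 4 dollars, so @code{total_cost} is
57 and @code{same} is true.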

@subsection Primary commodities

In every transaction involving more than one commodity, there is
always one which is the @dfn{primary commodity}.  This commodity
should be thought of as the exchange commodity, or the commodity used
to buy and sell units of the other commodity.  In the fruit examples
above, dollars are the primary commodity.  Ledger decides this based
on the placement of the commodity in the transaction:

@example
2010/05/31 Sample Transaction
    Expenses               100 secondary
    Assets                 -50 primary

2010/05/31 Sample Transaction
    Expenses               100 secondary @@ 0.5 primary
    Assets

2010/05/31 Sample Transaction
    Expenses               100 secondary @@@@ 50 primary
    Assets
@end example

The only case where knowledge of primary versus secondary comes into
play is in reports that use the @option{-V} or @option{-B} options.
With these, only primary commodities are shown.

If a transaction uses only one commodity, this commodity is also
considered a primary.  In fact, when Ledger goes about ensuring that
all transactions balance to zero, it only ever asks this of primary
commodities.