shared API for double-entry accounting, treating it as a 'math' library.

Chris Travers chris.travers at gmail.com
Thu Nov 14 10:17:33 EST 2013


A couple of clarifications in response.


On Thu, Nov 14, 2013 at 6:02 AM, Bradley M. Kuhn <bkuhn at sfconservancy.org> wrote:

> Chris Travers wrote at 05:49 (EST):
>
> > 1. Complexity of input data. Financial transactions are complex, and
> > they get more complex in certain kinds of environments. In the
> > for-profit world, I would expect a grocery store receipt to add up to
> > maybe a hundred GL line items.
>
> So, I agree that the overall picture of the financial transactions are
> complex, but think about how much of that is *not* the actual
> double-entry accounting data.  Considering your grocery store receipt
> example, if we ignore the other parts of the question, it's just (I
> apologize for the Ledger-CLI-y syntax):
>
> 2013-11-14 John Q. Customer
>      Income:Gross Sales                  $-2.99
>      Income:Sales Tax Collected          $-0.26
>      Expenses:Sales Tax                   $0.26
>      Inventory                           1 BoxedCereal {=$2.99}
>
> ...repeated for every item bought....
>
> This is a lot of data, no question, but it's not complex.  Obviously,
> I've eliminated much of the detail that might want to be kept.  That
> John Q. Customer maybe should be a customer id that (in SQL land) would
> index to a record with lots more data.  BoxedCereal might be an
> inventory code instead of a moniker like it is in Ledger-CLI.
>

This is true, but you still have the issue of reporting, and of the ability
to run reports in the future based on criteria that may not have occurred
to you yet.  Reading your API specification, I think you try to get around
this by tagging, and that's not a bad way to go.  So you would have a sku
tag for the part, maybe a till tag, and/or a salesperson tag, etc., on
every line.
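
To make that concrete, here is a rough sketch of one tagged GL line as a
JSON-style document (written as a Python literal).  Every field name here
is my own invention for illustration, not something from your
specification:

    line = {
        "account": "Income:Gross Sales",
        "amount": "-2.99",
        "currency": "USD",
        "tags": {
            "sku": "CEREAL-BOX-340G",   # hypothetical part identifier
            "till": "3",                # hypothetical register tag
            "salesperson": "jqpublic",  # hypothetical staff tag
        },
    }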

Again, your specification handles that side, and does so pretty well.  (I
am not sure how much payee information needs to go into the API; that is
perhaps a topic for another time. :-)

>
> The point is that *just* the computation of double-entry accounting
> can be separated into something more basic.
>

To some extent, yes, but only until you want to ask, "How has our gross
margin on boxed cereal changed in the last five years?  I want
month-by-month figures.  And rank specific products by gross margin, year
by year."  Once you want even the ability to ask such questions, you have
to be capturing the links to all of that data.
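
Just to illustrate the shape of that kind of question, here is a rough
Python sketch of a month-by-month gross margin rollup over exported GL
lines.  The line layout (account, amount, date, sku tag) is hypothetical,
and it assumes the usual sign convention where credits are negative:

    from collections import defaultdict
    from decimal import Decimal

    def gross_margin_by_month(lines, sku):
        # Revenue is -income under the credit-negative convention, and
        # margin is revenue minus COGS, so both kinds of line can simply
        # be subtracted from the running total.
        margin = defaultdict(Decimal)
        for line in lines:
            if line.get("tags", {}).get("sku") != sku:
                continue
            month = line["date"][:7]  # "YYYY-MM"
            if line["account"].startswith(("Income:", "Expenses:COGS")):
                margin[month] -= Decimal(line["amount"])
        return dict(sorted(margin.items()))

None of that works unless every line captured the sku link in the first
place.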

I guess what I am getting at is that I am not sure how far the accounting
side can be separated from everything else.


>
> > 2.  Pervasiveness of data. Financial transactions touch everything
> > and they tend to be connected to everything. Managing that
> > connectedness is a significant challenge for this sort of loosely coupled
> > approach. An API would need to be able to handle that.
>
> I'm not convinced of that.  In plenty of places, these things would be
> identifiers that would have to index somewhere else, but the double
> entry library just doesn't care about that.
>

Right.  Again, you have tags on one side, but I don't think that goes quite
far enough.  You would also have to support the other side taking the data
stored and keeping links to it on its own side.

In other words, suppose we have a boxed cereal purchase.  Not only do you
need to be able to store tags on the transactions, but the other side of
the system (the inventory tracking system) may need to be able to link
individual lines in, for FIFO inventory tracking purposes.  So I think what
is missing is that the return values from the API would need to include
what was submitted, plus ids for indexing on the other side.
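
Roughly this shape of exchange, say.  The endpoint and field names below
are made up; the point is the echo-back of server-assigned ids:

    import json
    import urllib.request

    def post_transaction(txn):
        # Submit a transaction document; the response echoes it back
        # with server-assigned ids filled in on each line.
        req = urllib.request.Request(
            "https://ledger.example/api/transactions",  # hypothetical
            data=json.dumps(txn).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    # e.g. {"id": 1042, "lines": [{"id": 5201, "account": "Inventory",
    # ...}, ...]}; the inventory system can then store line id 5201
    # against its FIFO layers.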

Once you have that bilateral communication, then I think the rest of the
problems are manageable.

>
> I realize that the annoying nature of having to talk through an API to
> get to double-entry data....
>

I don't know that you will always be able to force folks to go through the
API.  As the volume gets even mildly large (say, a few million lines), the
ability to run declarative queries becomes important, which is why I would
assume that larger-volume users will probably put the data in an RDBMS.

>
> > 3. Integration of data. Oftentimes it is important to be able to do
> > complex reporting.  Oftentimes one needs to tie line items of a
> > transaction back to some other data. For example, you might want to
> > find out "how has our spending on scalpels we ship to clinics in
> > Africa compared to our spending on insulin?"
>
> ... and this problem is a real one.  If you "know" it's a database
> underneath, why not just do a join instead of going through the API?
> So, I can imagine that keeping people "to" the API would be the toughest
> problem.
>

I am not sure that's a problem.  If you have an API rich enough for the
basic common cases (including, for example, a trial balance and a funds
report), I don't know that it matters whether people use the API for
everything.

The big problem you have with APIs and set-based data is that row-by-row
analysis is slow and, quite frankly, a pain to process.  If you have a
database underneath, you are probably going to pull the data from the API
into the db periodically for reporting purposes, for that reason.
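
A periodic pull might look something like this sketch; the paging scheme
and the local schema are both hypothetical:

    import sqlite3

    def sync_lines(fetch_page, db_path="reporting.db"):
        # fetch_page(cursor) -> (rows, next_cursor); rows are dicts as
        # returned by a (hypothetical) paged export endpoint on the API.
        con = sqlite3.connect(db_path)
        con.execute("""CREATE TABLE IF NOT EXISTS gl_lines
                       (id INTEGER PRIMARY KEY, txn_id INTEGER,
                        account TEXT, amount NUMERIC, date TEXT)""")
        cursor = None
        while True:
            rows, cursor = fetch_page(cursor)
            con.executemany(
                "INSERT OR REPLACE INTO gl_lines VALUES (?, ?, ?, ?, ?)",
                [(r["id"], r["txn_id"], r["account"], r["amount"],
                  r["date"]) for r in rows])
            if cursor is None:
                break
        con.commit()
        con.close()

After that, the declarative questions are plain SQL, e.g.
SELECT substr(date, 1, 7), sum(amount) FROM gl_lines
WHERE account LIKE 'Income:%' GROUP BY 1;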

>
>
> > Having said all of this, if you had an API that submitted a JSON
> > document and received one back with line identifiers, etc. populated,
> > then it should be manageable enough. There are naturally limits to
> > this kind of modularity due to the complexity of the data being stored
> > and retrieved on both sides.
>
> See, I think the issue is *volume*, not complexity.  You can do these
> complex queries, get a big-honking-JSON back, and then parse through it.
> Queries that would take seconds now take minutes.  That's the real issue
> I see with the idea.  I don't have an idea of how to solve that.


I have seen things like this be an issue, but where you see them, you are
going to see them first in areas where they are not so easily avoided.  At
any rate, you do raise a good point.

I am going to think about this and about the times I have seen volume
cause problems.  To be fair, it is rarely just volume.  One often wants to
mark invoices as locked for payment so they cannot be double-paid, for
example, and when paying a few thousand invoices at a time, that locking
can introduce performance issues.
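
The shape of that locking step, roughly (the table and column names are
made up):

    def lock_for_payment(con, invoice_ids, batch_id):
        # Claim the whole batch in one atomic UPDATE so that no invoice
        # can be picked up by two payment runs; a rowcount lower than
        # len(invoice_ids) means another run got there first.
        placeholders = ", ".join("?" for _ in invoice_ids)
        cur = con.execute(
            "UPDATE invoices SET payment_batch = ? "
            "WHERE id IN (" + placeholders + ") AND payment_batch IS NULL",
            [batch_id, *invoice_ids])
        con.commit()
        return cur.rowcount

Doing that row by row over an API, for a few thousand invoices at once, is
exactly where the pain shows up.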

At any rate, from my experience I would expect the bottlenecks to be in
trying to manage state over a stateless exchange, if that is supported, and
in figuring out how to interact with the user when a lot of data is
presented for review (bulk payment of invoices is a key case there).

Speaking of bulk/batch workflows, there is one thing I have not seen in
your API so far: separation of duties, i.e. the four-eyes principle.  How
would you expect that to be handled?
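
To be clear about what I mean, the rule is something like the following
sketch, where a transaction entered by one user stays pending until a
different user approves it.  The states and field names are just
illustrative, not a proposal for your API:

    def approve(txn, approver):
        # Four-eyes rule: the approver must differ from the submitter,
        # and only pending transactions may be posted.
        if txn["state"] != "pending":
            raise ValueError("only pending transactions can be approved")
        if approver == txn["submitted_by"]:
            raise PermissionError("submitter cannot approve own entry")
        txn["state"] = "posted"
        txn["approved_by"] = approver
        return txn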

-- 
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor
lock-in.
http://www.efficito.com/learn_more.shtml