Automating the cumbersome Tax Reconciliation process at large scale

By Sai Prasad Krishnamurthy — In Accounting Basics, Accounting Process, BigData, Process Automation, Tax Automation, Tax Strategy, Tax Technology Startup, Tech Startup — May 19, 2022



We at TaxReco focus on automating the cumbersome Tax Reconciliation process at large scale through simple, unified and standard workflows, and on improving the productivity of our customers.

Tax Reconciliation is a complex technical problem to solve, especially at large scale.

At a high level, much of the complexity lies in the following areas.

  1. Volume – The number of financial transactions (line items) on our customers’ ledgers can vary from a few tens of thousands to tens of millions. Our technology platform has to adapt to that scale.
  2. Variety – Reconciliation (TDS Reconciliation, for example) is typically the process of matching the financial transactions in one data set against those in another, subject to various checks and balances. For TDS Reconciliation, you would typically have 26AS (the statement available on the Government portal) in a standard text format and the sales ledgers, which may be in customer-specific formats. Our technology should be flexible enough to support multiple types of sales ledgers, and customers should be able to describe the schema on the fly.
  3. Velocity – As a SaaS solution, the platform should be able to handle data arriving at different velocities. For example, “Customer A” may ingest data into the platform at 100 transactions per second, while “Customer B” ingests at 5 transactions per second. The performance a customer can expect depends mainly on the velocity at which their data must be processed and on the technical resources that this requires.
  4. Polymorphic Data Representations – The tax reconciliation platform has many components built for specific needs, and each component needs its data stored in a format optimised for its purpose. For example, the input data for reconciliation is best stored as a “table” in an RDBMS (customers’ input data is primarily CSV files and Excel sheets), whereas the output data is best stored in a NoSQL store, for horizontal scaling and for analytics-style queries such as aggregations.
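
To make the “Variety” point concrete, here is a minimal Python sketch. It assumes a customer describes their ledger schema as a simple mapping from canonical field names to their own column headers. The field names (`tan`, `amount`, `tds`), the column headers, the sample rows and the one-rupee tolerance are all hypothetical, chosen for illustration; they are not TaxReco's actual schema or matching rules.

```python
import csv
import io
from decimal import Decimal

# Hypothetical customer-supplied schema: canonical field -> column header in their ledger.
CUSTOMER_SCHEMA = {"tan": "Deductor TAN", "amount": "Invoice Amt", "tds": "TDS Deducted"}


def normalise(rows, schema):
    """Rename customer-specific columns to canonical names and parse amounts."""
    out = []
    for row in rows:
        out.append({
            "tan": row[schema["tan"]].strip().upper(),
            "amount": Decimal(row[schema["amount"]]),
            "tds": Decimal(row[schema["tds"]]),
        })
    return out


def reconcile(ledger, statement, tolerance=Decimal("1.00")):
    """Greedy one-to-one match on TAN, with TDS amounts within a tolerance."""
    unmatched_statement = list(statement)
    matched, unmatched_ledger = [], []
    for entry in ledger:
        hit = next(
            (s for s in unmatched_statement
             if s["tan"] == entry["tan"] and abs(s["tds"] - entry["tds"]) <= tolerance),
            None,
        )
        if hit is None:
            unmatched_ledger.append(entry)
        else:
            unmatched_statement.remove(hit)
            matched.append((entry, hit))
    return matched, unmatched_ledger, unmatched_statement


ledger_csv = """Deductor TAN,Invoice Amt,TDS Deducted
abcd12345e,10000.00,1000.00
pqrs67890f,5000.00,500.00
"""
ledger = normalise(csv.DictReader(io.StringIO(ledger_csv)), CUSTOMER_SCHEMA)
# Stand-in for parsed 26AS entries, already in canonical form.
statement = [{"tan": "ABCD12345E", "amount": Decimal("10000.00"), "tds": Decimal("1000.50")}]

matched, unmatched_ledger, unmatched_statement = reconcile(ledger, statement)
```

Supporting a new ledger layout in this sketch means supplying a different mapping, not new parsing code. A real reconciliation would apply many more checks than a single tolerance on the TDS amount.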

With all of the above challenges in mind, and to stay future-proof, we modelled our platform as a “big data platform”. In simple terms, if a platform can scale horizontally as the load increases, and performs better as it scales, it is fit to handle big data. The TaxReco 2.0 platform is precisely that.

Conceptually, the pipeline of various components of the platform looks like this.

[Figure: TaxReco 26AS Reconciliation]
Every block represents a logical component of the platform that can scale horizontally. A technology such as Kafka is very handy for stitching the components together into a pipeline that scales well.

We also take advantage of Kafka's partitioning strategy so that all the transactions belonging to one customer are processed on one node. Note that 26AS and the sales ledgers may contain the transactions of many customers that an entity has done business with in a financial year.
Such a partitioning strategy improves performance considerably, because different customers' transactions are processed in parallel, and it makes stateful transformations possible wherever they are needed.
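
The per-customer partitioning idea can be sketched as follows. This is a plain-Python simulation, not Kafka client code. It uses `zlib.crc32` as a stand-in hash (Kafka's Java client hashes keys differently), and the partition count, customer identifiers and record fields are hypothetical. What it illustrates is that keying every record by customer sends all of that customer's transactions to the same partition, so whichever worker consumes that partition can keep per-customer state locally.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count


def partition_for(customer_key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministically map a customer key to a partition (stand-in hash)."""
    return zlib.crc32(customer_key.encode("utf-8")) % num_partitions


records = [
    {"customer": "CUST-A", "tds": 100},
    {"customer": "CUST-B", "tds": 40},
    {"customer": "CUST-A", "tds": 250},
    {"customer": "CUST-C", "tds": 75},
    {"customer": "CUST-B", "tds": 60},
]

# Route each record to its partition, as a keyed producer would.
partitions = defaultdict(list)
for record in records:
    partitions[partition_for(record["customer"])].append(record)

# Each partition is consumed independently; running totals are local to it.
totals_by_partition = {}
for partition_id, batch in partitions.items():
    running = defaultdict(int)
    for record in batch:
        running[record["customer"]] += record["tds"]
    totals_by_partition[partition_id] = dict(running)
```

Because a customer never spans two partitions, the per-partition totals can be computed in parallel and never need to be merged across workers for a single customer.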

By choosing the right database for each purpose, we can optimise read and write performance and extract information from the data without much fuss.
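
As a rough illustration of matching the store to the purpose, the sketch below keeps tabular input in a relational table and produces document-shaped output. An in-memory SQLite database stands in for the RDBMS, and a list of JSON documents stands in for a NoSQL store. The table name, columns and sample rows are hypothetical, and the post does not say which databases the platform actually uses.

```python
import json
import sqlite3

# Tabular input: an in-memory SQLite table stands in for the RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ledger (customer TEXT, invoice_no TEXT, tds REAL)")
conn.executemany(
    "INSERT INTO ledger VALUES (?, ?, ?)",
    [("CUST-A", "INV-1", 100.0), ("CUST-A", "INV-2", 250.0), ("CUST-B", "INV-3", 40.0)],
)

# Document-shaped output: one self-contained JSON document per customer,
# the kind of record a document store could distribute by customer key.
documents = []
for customer, count, total in conn.execute(
    "SELECT customer, COUNT(*), SUM(tds) FROM ledger GROUP BY customer ORDER BY customer"
):
    documents.append({"customer": customer, "line_items": count, "total_tds": total})
conn.close()

serialised = [json.dumps(doc, sort_keys=True) for doc in documents]
```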

Finally, the components that process the data in the pipeline are modelled as one or more microservices, which communicate through Kafka using an appropriate partitioning strategy. These microservices are developed using Micronaut (a lightweight microservice toolkit); they are not memory hungry and have extremely fast cold-start times. We have deployed these services on Kubernetes so that we can add compute elasticity on demand.

Overall, our learning has been that modelling the reconciliation platform as a “big data suitable” platform enables us to adapt to increasing workloads faster and absorb more requirements efficiently, because the components are highly decoupled and communicate via a scalable event bus (Kafka in our case). This decoupling also helps us bring the right technology to the right component without disruption. In simple terms, if every part of a platform can scale horizontally as the load increases and perform better, then the most complex parts of the problem are solved.
