Go SDK: OCR Invoices & Receipts in Seconds

August 10, 2021

    When it comes to real-time data extraction from invoices and receipts of any kind, Veryfi has you covered. Just upload a photo or file through the Veryfi Go SDK, and you can extract its data in real time with only a handful of lines of code.

    Let’s take a look at what Veryfi offers you.

    The problem

    Let’s say we face the challenge of capturing and extracting data from thousands (or even millions) of backlogged receipts. We don’t want to do it manually: copying and pasting from file to file takes a lot of time and is error-prone. On top of that, receipts come in many different layouts and styles, so finding the meaningful data across a huge pile of files is tedious. Ideally, we could process the receipts and watch the results come back in real time, within seconds rather than hours. So what is the best way to solve this?

    The solution

    Assume our backlogged receipts are kept in a single folder on our computer. We want to build a program that scans that directory and processes the receipts one by one. We also want to keep the directory in sync: documents we have already processed should be skipped, and any new document added to the folder should be processed as well. Instead of building a data capture platform from scratch, we’ll let the Veryfi API and the Veryfi Go SDK do the heavy lifting of data extraction.

    What is Veryfi

    Veryfi is a leader in real-time document capture, data extraction, and data transformation of unstructured data in the form of receipts, invoices, purchase orders, checks, W2s, and other business documents into structured data at scale.

    What is Veryfi Go SDK

    Veryfi Go SDK is the Go module for communicating with the Veryfi API, making it easy for developers like you and me to have our data captured using a few lines of code.

    Let’s get started

    First, we can either clone the project using:

    git clone https://github.com/veryfi/veryfi-go.git

    or using the go get command like so:

    go get github.com/veryfi/veryfi-go

    Once the SDK is installed, register on the Veryfi website to get your API credentials. We need the client ID, username, and API key to initialize our API client:

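    package main

    import (
        "log"

        "github.com/veryfi/veryfi-go/veryfi"
    )

    func main() {
        // Initialize a Veryfi client for the v7 API.
        client, err := veryfi.NewClientV7(&veryfi.Options{
            ClientID: "YOUR_CLIENT_ID",
            Username: "YOUR_USERNAME",
            APIKey:   "YOUR_API_KEY",
        })
        if err != nil {
            log.Fatal(err)
        }
        // Go rejects unused variables; client is used in the steps below.
        _ = client
    }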

    Since we’re treating this solution as a proof of concept, we’ll keep it as simple as possible to demonstrate the core ideas. Instead of connecting a database to keep track of the files that have been processed, we’ll use an in-memory key-value store: a map[string]string in Go that maps each file path to its processing status. Because the map lives in memory, it is lost when the program exits, so every file will be processed again after a restart.

    m := map[string]string{}

    Specify the directory that contains all the receipts:

    receiptsDir := "YOUR_RECEIPTS_DIRECTORY"

    List all the files/documents in the directory:

    files, err := os.ReadDir(receiptsDir)
    if err != nil {
       log.Fatal(err)
    }

    Next, we loop over the files. If we’ve seen a document before, we skip it; otherwise, we process it with the client.ProcessDocumentUpload method. If the data is captured successfully, we set the document’s status to "ok"; otherwise, we record the error message. Each API call returns all the extracted data in roughly 3–5 seconds for a single-page document, and each additional page of a PDF may add another 1–2 seconds. Note that we also use the scheme.DocumentSharedOptions option to record the file name for future reference.

    for _, f := range files {
        // Skip subdirectories; only files can be uploaded.
        if f.IsDir() {
            continue
        }
        path := filepath.Join(receiptsDir, f.Name())
        // Skip documents that were already attempted on an earlier pass.
        if _, ok := m[path]; ok {
            continue
        }
        resp, err := client.ProcessDocumentUpload(scheme.DocumentUploadOptions{
            FilePath: path,
            DocumentSharedOptions: scheme.DocumentSharedOptions{
                FileName: f.Name(), // stored with the document for future reference
            },
        })
        if err != nil {
            // Record the error so the summary shows why this file failed.
            log.Printf("\tError: %v\n", err)
            m[path] = err.Error()
            continue
        }
        m[path] = "ok"
        log.Printf("\tDate: %v *** Vendor: %v *** Total: %v\n", resp.Date, resp.Vendor.Name, resp.Total)
    }

    Lastly, to check for new documents continuously, a simple and straightforward approach is to wrap everything in an outer for loop that pauses for 10 seconds between passes. Putting everything together, here is our final implementation with some additional comments and logging:

    package main

    import (
        "log"
        "os"
        "path/filepath"
        "time"

        "github.com/veryfi/veryfi-go/veryfi"
        "github.com/veryfi/veryfi-go/veryfi/scheme"
    )

    func main() {
        // Initialize a Veryfi client for the v7 API.
        client, err := veryfi.NewClientV7(&veryfi.Options{
            ClientID: "YOUR_CLIENT_ID",
            Username: "YOUR_USERNAME",
            APIKey:   "YOUR_API_KEY",
        })
        if err != nil {
            log.Fatal(err)
        }

        // m maps each file path to its status: "ok" or the upload error.
        m := map[string]string{}

        // Specify the directory that contains all the receipts.
        receiptsDir := "YOUR_RECEIPTS_DIRECTORY"

        for {
            log.Println("Syncing...")
            files, err := os.ReadDir(receiptsDir)
            if err != nil {
                log.Fatal(err)
            }

            for _, f := range files {
                // Skip subdirectories and files attempted on an earlier pass.
                if f.IsDir() {
                    continue
                }
                path := filepath.Join(receiptsDir, f.Name())
                if _, ok := m[path]; ok {
                    continue
                }
                resp, err := client.ProcessDocumentUpload(scheme.DocumentUploadOptions{
                    FilePath: path,
                    DocumentSharedOptions: scheme.DocumentSharedOptions{
                        FileName: f.Name(),
                    },
                })
                if err != nil {
                    log.Printf("\tError: %v\n", err)
                    m[path] = err.Error()
                    continue
                }
                m[path] = "ok"
                log.Printf("\tDate: %v *** Vendor: %v *** Total: %v\n", resp.Date, resp.Vendor.Name, resp.Total)
            }

            // Print the status of every file seen so far.
            log.Println("Summary:")
            for k, v := range m {
                log.Printf("\tFile: %v *** Status: %v\n", k, v)
            }

            // Wait before scanning the directory again.
            time.Sleep(10 * time.Second)
        }
    }

    Successful output looks something like the following. For readability, we print only the date, vendor name, and total amount of each receipt, even though Veryfi captures many more fields.

    2021/07/06 15:52:34 Syncing...
    2021/07/06 15:52:38     Date: 2021-07-06 10:31:00 *** Vendor: Rad Power Bikes *** Total: 1093.91
    2021/07/06 15:52:38 Summary:
    2021/07/06 15:52:38     File: /Users/hoanhan/Downloads/receipts/Order #481622 confirmed.pdf *** Status: ok
    2021/07/06 15:52:48 Syncing...
    2021/07/06 15:52:48 Summary:
    2021/07/06 15:52:48     File: /Users/hoanhan/Downloads/receipts/Order #481622 confirmed.pdf *** Status: ok
    2021/07/06 15:52:58 Syncing...
    2021/07/06 15:53:12     Date: 2021-07-06 21:47:00 *** Vendor: Overstock *** Total: 370.72
    2021/07/06 15:53:16     Date: 2021-07-06 10:58:00 *** Vendor: Amazon *** Total: 108.94
    2021/07/06 15:53:21     Date: 2021-07-06 21:11:00 *** Vendor: Etsy *** Total: 85.42
    2021/07/06 15:53:21 Summary:
    2021/07/06 15:53:21     File: /Users/hoanhan/Downloads/receipts/Order #481622 confirmed.pdf *** Status: ok
    2021/07/06 15:53:21     File: /Users/hoanhan/Downloads/receipts/Thank You for Your Overstock Order (#346121471)!.pdf *** Status: ok
    2021/07/06 15:53:21     File: /Users/hoanhan/Downloads/receipts/Your Amazon.com order #111-8323577-3736234.pdf *** Status: ok
    2021/07/06 15:53:21     File: /Users/hoanhan/Downloads/receipts/Your Etsy Purchase from Treeheartfurniture (2089392782).pdf *** Status: ok

    On hub.veryfi.com, you can also see all the documents that are captured.

    Watch Go SDK Video

    Here’s the video link for the tutorial if you want to see everything in action. We also share some tips on navigating hub.veryfi.com to see more detailed results, as well as other integrations.

    Feedback and Contributing

    Any feedback, positive or negative, keeps Veryfi growing and improving! While working with the SDK, you may encounter bugs or issues. If you do, please open an issue on the Veryfi Go SDK GitHub repository.

    Feedback

    GitHub issues: if you want to leave public feedback, please open an issue in the Veryfi Go SDK GitHub repository. Doing so may also help other users experiencing the same problem and grow the conversation. We evaluate issues on our end too, so we can address them in future releases.

    Contact us: If you want to speak with our team privately to ask questions, give feedback, or make a feature request, please email us at support@veryfi.com.

    Contributing

    You’re always welcome to open pull requests with fixes or new features for the Veryfi Go SDK. Please ensure your contributions are made under the MIT license. Our team reviews every pull request before it’s merged, so unit tests are gladly accepted.

    Special Thanks!

    Thank you, Hoanh @ Veryfi, for the stellar Go write-up.