Data transformation: How Veryfi Users Can Run Regex for Parsing Serious Amounts of Data Fast

March 11, 2021

    Data transformation. Data extraction. Regex patterns and target strings. Did we lose you?

    Whether or not these terms are familiar, they are essential to what we do at Veryfi and to how Veryfi can simplify your work life and give you back valuable time. And don’t worry, we will make it easy to understand why these terms and their functions are so important. We’ll start with regex, or regular expression, a character sequence used to define a search pattern.

    How to create a regex that works for you

    You can now run regex in Veryfi’s data transformation services to customize the results of real-time AI data extraction. Our platform comprises three parts that work together like a Swiss watch:

    1. Document capture collects invoices, receipts, and bills at the point of engagement. Two examples: email and camera. Veryfi’s email engine consumes POS digital receipts at the point of issue. Veryfi Lens uses the mobile app camera to consume paper documents at the point of engagement.
    2. Data extraction analyzes the data in these unstructured documents and produces structured data in standard JSON format. This JSON (also known as Level3 data) can be used to automate expense management, bill pay, market research into consumer spending behavior, and bookkeeping such as construction job costing.
    3. Data transformation is where we take it up a notch and let you run custom conditions over the extracted data to get more value out of it. This is the long tail of data extraction.

    Check out the instructional video we made to answer your questions on how to create custom fields with regex! The rest of this blog focuses on data transformation and the new regular expression builder.

    Data transformation is your new best friend

    Veryfi API Portal Menu

    Access Veryfi data transformation within the Veryfi API portal by opening up the data transformation dropdown (as shown at left) and selecting “Rules”.

    Make sure you have a Veryfi account before doing this. Sign up here with your email address to get started (it takes less than 1 minute). Create a free account to test drive the experience.

    What is regex?

    A regular expression (also referred to as regex or regexp) is a sequence of characters that specifies a search pattern. In formal language theory, such patterns describe regular languages. In short, a regex pattern is matched against a target string.

    We know this sounds complicated, but you encounter regex often. Its uses are numerous: websites rely on regexes to validate email addresses on signup and login forms, check password strength, and validate addresses and phone numbers.
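
    To make “a pattern matches a target string” concrete, here is a minimal Python sketch of the email check mentioned above. The pattern is deliberately simplified and the function name is only illustrative; production email validation is stricter, and none of this is Veryfi-specific.

```python
import re

# Deliberately simplified pattern: some non-space, non-@ characters,
# an @ sign, a domain part, a dot, and a final part.
# This is an illustration, not a complete email validator.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def looks_like_email(target: str) -> bool:
    """Return True if the whole target string matches the pattern."""
    return EMAIL_PATTERN.fullmatch(target) is not None


print(looks_like_email("jane@example.com"))  # True
print(looks_like_email("not-an-email"))      # False
```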

    Developers can easily craft a regex, but what about the rest of us? It’s easy to find online reference guides and cheat sheets, so this post will focus on executing regexes in Veryfi.

    You can validate the regex you write using a tool like regex101.

    Regex in Rules

    Let’s start by adding a new rule under data transformation. Go to the Rules page in the Veryfi portal.

    From the Rules page, press the blue button called “+ Add a Rule”. A modal opens to add a rule. From the “Condition” dropdown select “Document > OCR Text Contains” and add the regex under “Filter”. The regex will now run over the OCR text that Veryfi returns in the data extraction response.

    The rule runs over the OCR text included in the JSON response for each document’s data extraction. When viewing the JSON for any document you extract, you will find a key named “ocr_text”; its value is the text your regex is executed against.

    ...
    "invoice_number": "",
    "line_items": [
        { ... }
    ],
    "ocr_text": "Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World.",
    "payment_display_name": "Visa",
    "payment_terms": "",
    ...
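
    If you want to try a pattern locally before adding it to a rule, the sketch below mimics the “OCR Text Contains” condition in plain Python. It assumes you have the JSON response as a string; the trimmed response body and the helper name ocr_text_contains are hypothetical, and how Veryfi evaluates the condition internally is not shown here.

```python
import json
import re

# Trimmed, hypothetical response body shaped like the fragment above.
response_body = """
{
  "invoice_number": "",
  "ocr_text": "Hello World. Hello World. Hello World.",
  "payment_display_name": "Visa",
  "payment_terms": ""
}
"""


def ocr_text_contains(document: dict, pattern: str) -> bool:
    """Return True if the regex matches anywhere in the document's ocr_text."""
    ocr_text = document.get("ocr_text", "")
    return re.search(pattern, ocr_text) is not None


document = json.loads(response_body)
print(ocr_text_contains(document, r"Hello\s+World"))  # True
print(ocr_text_contains(document, r"\d{6}"))          # False
```

    Using re.search rather than re.fullmatch reflects the “contains” wording: the pattern only has to match somewhere in the text.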

    What can I do with this?

    The sky’s the limit.

    Here is one use case:

    • Market research is one interesting area. It aims to understand consumer spending behavior by analyzing SKU line items on receipts for consumer packaged goods (CPG), such as soap and toothpaste. These items are also called fast-moving consumer goods (FMCG) because of how quickly they sell. Market research firms also use regexes to pick out shoppers’ loyalty IDs or the EAN-128 barcode on Walmart receipts. Regexes make it easy to customize and control these long-tail opportunities.
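
    As an illustration of the loyalty ID case, here is a small Python sketch. The receipt text, the “Loyalty ID” label, and the digit grouping are all made up for the example; real receipts differ by retailer, so the pattern would need adjusting.

```python
import re
from typing import Optional

# Hypothetical receipt text; real layouts vary by retailer.
ocr_text = "SUPERMART\nLoyalty ID: 4417 2290 31\nSOAP BAR 2.49\nTOOTHPASTE 3.99\n"

# The label, an optional colon, then digit groups separated by single spaces.
LOYALTY_PATTERN = re.compile(
    r"Loyalty\s+ID:?\s*(?P<loyalty_id>\d+(?: \d+)*)",
    re.IGNORECASE,
)


def find_loyalty_id(text: str) -> Optional[str]:
    """Return the loyalty ID with spaces removed, or None if absent."""
    match = LOYALTY_PATTERN.search(text)
    if match is None:
        return None
    return match.group("loyalty_id").replace(" ", "")


print(find_loyalty_id(ocr_text))  # 4417229031
```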

    How this looks in practice

    A customer request shows the power and convenience of regexes. The customer asked Veryfi support how to extract a business-specific value (686125) from receipts and put it into the Notes field. We advised the customer to add a rule with the displayed regex in the Filter field and then, under Action(s), to select the field to transform (Notes) and set its value to {match}.

    Receipt where Regex needs to be applied
    Setup Veryfi Rules using Regex

    With the regex, Veryfi’s OCR API extracts the value they wanted and puts it into the Notes field. It’s intuitive and simple.
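
    The same idea can be sketched locally in Python. Everything here is an assumption made for illustration: the receipt text, the “Store Ref” label, the pattern, and the reading of {match} as “the captured text if the pattern has a capture group, otherwise the whole match”. The customer’s actual regex and Veryfi’s rule engine are not reproduced.

```python
import re


def apply_rule(document: dict, filter_regex: str, target_field: str,
               action_template: str = "{match}") -> dict:
    """Return a copy of document with target_field set when the regex matches."""
    found = re.search(filter_regex, document.get("ocr_text", ""))
    if found is None:
        return dict(document)  # no match: leave the document unchanged
    # Assumption: {match} is the first capture group if there is one,
    # otherwise the entire matched text.
    matched_text = found.group(1) if found.groups() else found.group(0)
    updated = dict(document)
    updated[target_field] = action_template.replace("{match}", matched_text)
    return updated


# Hypothetical receipt containing the business-specific value.
receipt = {"ocr_text": "CORNER CAFE\nStore Ref 686125\nTOTAL 12.50", "notes": ""}
result = apply_rule(receipt, r"Store Ref\s+(\d{6})", "notes")
print(result["notes"])  # 686125
```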

    Next steps

    So, are you ready to start using regexes to transform your work life? This new data transformation feature is already live and waiting for you, so give it a spin! If you need a free OCR API account, sign up here.

    If you create a nice regex recipe over Veryfi’s data extraction, please share it with us so we can see what you cooked up.

    Feedback is always welcome, so please let us know what we can do better by emailing support@veryfi.com.

    Over 150 fields can be extracted by Veryfi OCR API. See the whole list.