Now Veryfi Users Can Run Regex for Parsing Serious Amounts of Data Fast
Author: Ernest Semerda
Data transformation. Data extraction. Regex patterns and target strings. Did we lose you?
Whether or not these terms are familiar, they are essential to what we do at Veryfi and to how Veryfi can simplify your work life and give you back valuable time. And don’t worry, we will make it easy to understand why these terms and their functions are so important. We’ll start with regex, or regular expression, a character sequence used to define a search pattern.
You can now run regex in Veryfi’s data transformation services to customize the results of real-time AI data extraction. Our platform comprises three parts that work together like a Swiss watch:
This post focuses on data transformation and the new regex builder.
Access Veryfi data transformation within the Veryfi API portal by opening up the data transformation dropdown (as shown at left) and selecting “Rules”.
Make sure you have a Veryfi account before doing this. Sign up here with your email address to get started (it takes less than 1 minute). Create a free account to test drive the experience.
A regular expression (also referred to as regex or regexp) is a sequence of characters that specifies a search pattern. In formal language theory, such patterns describe regular languages. In short, a regex pattern matches a target string.
We know this sounds complicated, but you encounter regex often. It has many uses: websites rely on it to validate email addresses on signup and login forms, check password strength, and validate addresses and phone numbers.
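To make "a regex pattern matches a target string" concrete, here is a minimal Python sketch of the email check mentioned above. The pattern and the function name `looks_like_email` are illustrative assumptions, not something Veryfi ships, and the pattern is deliberately simplistic.

```python
import re

# Deliberately simple pattern for illustration only: some characters,
# an "@", some characters, a dot, some characters. Real-world email
# validation is more involved than this.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def looks_like_email(text: str) -> bool:
    """Return True if the whole target string fits the simple pattern."""
    return EMAIL_PATTERN.fullmatch(text) is not None


print(looks_like_email("jane@example.com"))  # True
print(looks_like_email("not-an-email"))      # False
```

The pattern is the regex; each string passed to the function is a target string.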
Developers can easily craft a regex, but what about the rest of us? It’s easy to find online reference guides and cheat sheets, so this post will focus on executing regexes in Veryfi.
You can validate the regex you write using a tool like regex101.
Let’s start by adding a new rule under data transformation. Go to the Rules page in the Veryfi portal.
From the Rules page, press the blue button called “+ Add a Rule”. A modal opens to add a rule. From the “Condition” dropdown select “Document > OCR Text Contains” and add the regex under “Filter”. The regex will now run over the OCR text that Veryfi returns in the data extraction response.
The OCR text this rule runs over comes from the JSON response of each document’s data extraction. When viewing the JSON for any document you extract, you will find a key named “ocr_text”; its value is the text your regex is matched against.
...
"invoice_number": "",
"line_items": [
  { ... }
],
"ocr_text": "Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World.",
"payment_display_name": "Visa",
"payment_terms": "",
...
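If you want to try a pattern locally before adding it as a rule, the sketch below mimics an "OCR Text Contains" check in Python. It is only an approximation under stated assumptions: the trimmed response, the helper name `ocr_text_contains`, and the sample patterns are hypothetical, and the regex flavor and flags Veryfi applies on its side may differ from Python's `re` module.

```python
import json
import re

# Hypothetical, trimmed-down extraction response. Only the "ocr_text"
# key matters for this check.
response_json = """
{
  "invoice_number": "",
  "ocr_text": "Hello World. Hello World. Hello World.",
  "payment_display_name": "Visa"
}
"""


def ocr_text_contains(document: dict, pattern: str) -> bool:
    """Return True if the pattern matches anywhere in the document's ocr_text."""
    ocr_text = document.get("ocr_text", "")
    return re.search(pattern, ocr_text) is not None


document = json.loads(response_json)
print(ocr_text_contains(document, r"Hello\s+World"))   # True
print(ocr_text_contains(document, r"Invoice\s+#\d+"))  # False
```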
The sky’s the limit.
A recent customer request shows the power and convenience of regexes. The customer asked Veryfi support how to extract a business-specific value (686125) from receipts and put it into the Notes field. We advised the customer to add a rule with the displayed regex in the Filter field, and then under Action(s) to select the field the data transformation applies to and what to put in it: {match}.
With the regex, Veryfi’s OCR API extracts the value they wanted and puts it into the Notes field. It’s intuitive and simple.
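The same idea can be sketched locally in Python. Everything specific here is an assumption for illustration: the sample receipt text, the pattern `\b\d{6}\b` (any standalone six-digit number), the `notes` key, and the helper `apply_notes_rule` are hypothetical and are not the customer's actual rule or Veryfi's implementation.

```python
import re

# Hypothetical OCR text for a receipt containing the business-specific value.
sample_document = {
    "ocr_text": "ACME STORE\nRef 686125\nTOTAL 42.10\nVisa",
}

# Assumed pattern: a standalone run of exactly six digits.
SIX_DIGIT_VALUE = r"\b\d{6}\b"


def apply_notes_rule(document: dict, pattern: str) -> dict:
    """Return a copy of the document with the first match stored under 'notes'."""
    match = re.search(pattern, document.get("ocr_text", ""))
    updated = dict(document)
    if match:
        # Plays the role of {match}: the text the regex matched.
        updated["notes"] = match.group(0)
    return updated


result = apply_notes_rule(sample_document, SIX_DIGIT_VALUE)
print(result["notes"])  # 686125
```

A pattern this broad would also pick up any other six-digit number on the receipt, so in practice you would likely anchor it to nearby text (such as a label that always precedes the value).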
So are you ready to start using regexes to transform your work life? This new feature of data transformation is already live and waiting for you so give it a spin! If you need a free account then sign up here.
If you create a nice regex recipe on top of Veryfi’s data extraction, please share it with us so we can see what you cooked up.
Feedback is always welcome, so please let us know what we can do better by emailing support@veryfi.com.