Who is spying on your data? Plus: how to avoid becoming a victim of social engineering
Author: Ernest Semerda
You are pumped. Your company has given you an app to simplify your tax bookkeeping, like receipt handling and expense management. The company behind the app labels itself as “better than a slice of bread”, using backoffice automation with its “smarter than an average bear scan technology”. So you start loading the app with photos of your receipts, expenses incurred on a recent business trip. You snap them one after another. Then you notice something odd: a “Processing…” message. Minutes go by and you get a sinking feeling that something isn’t right. Why is the app taking so long to process those receipts? Minutes to hours later, some finally get processed while others are still queued. What is going on?
Welcome to the world of fake product marketing.
Like fake news, the expense management world is filled to the brim with wild claims of real-time AI OCR. Very few companies actually use machine automation.
This post might rock your socks a bit, but bear with me. The intentions here are good. We want to raise the bar by highlighting how and why existing players process unstructured documents the way they do. Onwards and upwards.
This post will help you:
1. understand what is going on in the bookkeeping software space, and
2. keep your financial data safe by not exposing it to social engineering.
2 types of bookkeeping backoffice automation
Bookkeeping backoffice automation is what happens when you submit a document into a workflow. Traditionally, there are 2 ways this works: (1) using humans (graphic below in b&w) and (2) using machines (graphic below in color).
(1) Human powered
This is the most common form of bookkeeping backoffice automation, and two things characterize it:
(a) it often requires a human bookkeeper to operate the software, since that software is not built for teams to operate, and (b) it relies on a farm of mechanical turks who do data entry for pennies.
As a business owner, you now have a human cost, a software cost and a privacy issue.
Often, the mechanical turks are managed through Amazon Mechanical Turk or through a 3rd party like Cloud Factory in SF, which manages teams of humans in Africa & Nepal to process the dirty laundry for Xero, Expensify, et al.
We do NOT endorse this model. It’s slavery, and it encourages people to accept pennies to do the dirty work of the rich. Setting up a sweatshop backoffice is easy, but it is prone to data-privacy issues, as Expensify learned last year; here, here and here. Who wants the home address on their Uber receipts exposed online? Or their medical records, or even their financial history, which is so often used in social engineering to steal one’s identity? No one.
So is there a better solution? Of course. Opt for 100% machine powered bookkeeping backoffice automation.
(2) Machine powered
This option replaces humans with machines end-to-end. This means all processing happens in real time, 24×7. Machines do not sleep. Machines are faster than humans, and machines can be trained (using machine learning models) to perform flawlessly.
This is a hard problem to solve, but it’s being solved by a handful of companies, including us at Veryfi.
The concept of automation isn’t revolutionary, but the use of machine learning is. To make this work, a company has to blend several technologies, from mobile apps to GPU-heavy infrastructure running proprietary machine vision and machine learning algorithms alongside OCR (optical character recognition), to turn an unstructured document (image, voice, email et al.) into structured data.
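To make “unstructured document into structured data” a little more concrete, here is a minimal sketch of only the very last step, assuming OCR has already turned the photo into plain text. The `Receipt` class, the `parse_receipt_text` function and the regex rules are hypothetical illustrations, not Veryfi’s method; hand-written rules like these are exactly what the machine learning described above replaces, because real receipts vary far more than a few patterns can cover.

```python
import re
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass
class Receipt:
    """Structured fields extracted from raw receipt text."""
    vendor: Optional[str]
    purchase_date: Optional[str]   # ISO 8601, e.g. "2018-05-14"
    total: Optional[Decimal]


# Hypothetical rules for illustration only.
DATE_RE = re.compile(r"\b(\d{2})/(\d{2})/(\d{4})\b")  # assumes US MM/DD/YYYY
TOTAL_RE = re.compile(r"^\s*total\b\D*(\d+\.\d{2})", re.IGNORECASE | re.MULTILINE)


def parse_receipt_text(text: str) -> Receipt:
    """Turn OCR output (unstructured text) into a structured Receipt."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    vendor = lines[0] if lines else None  # assume the vendor is printed first

    purchase_date = None
    m = DATE_RE.search(text)
    if m:
        month, day, year = (int(g) for g in m.groups())
        try:
            purchase_date = date(year, month, day).isoformat()
        except ValueError:  # e.g. a DD/MM date that is not a valid MM/DD date
            purchase_date = None

    total = None
    t = TOTAL_RE.search(text)  # "Subtotal" lines don't match because of ^total
    if t:
        total = Decimal(t.group(1))

    return Receipt(vendor=vendor, purchase_date=purchase_date, total=total)


# Made-up sample text, standing in for OCR output.
sample = """Caltrain
05/14/2018 08:12
Subtotal 9.75
TOTAL $9.75
"""
print(parse_receipt_text(sample))
# Receipt(vendor='Caltrain', purchase_date='2018-05-14', total=Decimal('9.75'))
```

Amounts are kept as `Decimal` rather than `float` so that money values don’t pick up binary rounding errors.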
Human powered backoffice is a disaster waiting to happen!
Why is this important?
Data Leaks. We are hearing more and more about them: the recent Facebook & Cambridge Analytica scandal, or Expensify in 2017.
It’s 2018, and best-practice standards are not hard to implement. Everything from hashing passwords to offering 2-factor authentication to secure your account should be standard. At Veryfi we have implemented multiple ways of securing your account. Use these as a baseline to check the provider you are using.
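Both practices named above fit in a few lines of Python’s standard library. This is a minimal sketch under stated assumptions, not Veryfi’s implementation: passwords are salted and stretched with PBKDF2-HMAC-SHA256 (the 600,000-iteration default is an assumption you should tune for your hardware), and 2-factor codes follow the TOTP scheme of RFC 6238 with HMAC-SHA1, 30-second steps and 6 digits, the settings most authenticator apps use. The function names are hypothetical.

```python
import hashlib
import hmac
import os
import time
from typing import Optional

# --- Password hashing: salted, deliberately slow key derivation ---------------

def hash_password(password: str, iterations: int = 600_000) -> str:
    """Return a self-describing record: algorithm$iterations$salt$digest."""
    salt = os.urandom(16)  # unique random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, record: str) -> bool:
    """Re-derive the digest with the stored salt and compare in constant time."""
    algorithm, iterations, salt_hex, digest_hex = record.split("$")
    if algorithm != "pbkdf2_sha256":
        return False
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))


# --- 2-factor authentication: TOTP (RFC 6238) ---------------------------------

def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password from a shared secret.

    The secret is raw bytes here; authenticator apps usually exchange it base32-encoded.
    """
    counter = int((time.time() if at is None else at) // step)
    mac = hmac.new(secret, counter.to_bytes(8, "big"), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation, as defined in RFC 4226
    code = int.from_bytes(mac[offset:offset + 4], "big") & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_totp(secret: bytes, code: str, at: Optional[float] = None,
                step: int = 30, window: int = 1) -> bool:
    """Accept codes from the current step and +/- `window` steps to allow clock drift."""
    now = time.time() if at is None else at
    return any(
        hmac.compare_digest(totp(secret, now + i * step, step, len(code)), code)
        for i in range(-window, window + 1)
    )


record = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", record))  # True
print(totp(b"12345678901234567890", at=59, digits=8))  # 94287082 (RFC 6238 test vector)
```

Only the string returned by `hash_password` is stored, so the plain password never touches the database. A real deployment would also rate-limit login and code attempts.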
“ITRC reported that Commercial businesses accounted for more than 50 percent of data breach targets, and more than 157 million compromised records in 2017.” ~ BNA
Social Engineering. This is a common method of stealing your identity: a bad actor gathers enough data about you to impersonate you and convince a bank rep on the phone to change your banking password, giving them access to your account.
Avoid sharing too much personal data with unknown human eyes. You just never know who’s looking at your data or what their intentions are. A human-powered bookkeeping backoffice is a liability waiting to happen. Avoid it at all costs.
How to identify the type of backoffice your provider uses
Signs of a human-powered backoffice:
HUMAN INTERVENTION. If you need to speak to a human before using their software, that’s your first giveaway. Some use the term “human augmentation software”, but this really means a human is using a spreadsheet to enter your personal data into their database.
SLOW receipt PROCESSING. This matters for expense management software. Receipt processing is the act of turning unstructured data into structured form. At the last QBConnect, I got a chance to test-drive the Receipt Bank expense app. I scanned my Caltrain receipt. Then waited. And waited… and waited. The lad showing me the tech said, “Please come back in 90 minutes when the receipt has finished processing.” Here’s the irony: self-driving cars process far more data than any receipt company, and they work in real time. Human labor is always the bottleneck, and Receipt Bank’s offshore “Data Extraction Team” is the bottleneck here.
LIMITED AVAILABILITY. 9-5 M-F availability is another sign of traditional human hours. Humans need to rest. Machines do not.
BORING. Nothing WOW about the app. No magic. Any app using AI (artificial intelligence) / ML (machine learning) should have something magical about it. In Gmail, it’s the suggested replies based on message content.
“Any sufficiently advanced technology is indistinguishable from magic.” ~ Arthur C. Clarke
Signs of a machine-powered backoffice:
0 HUMAN INTERVENTION. Self-explanatory. Well-programmed machines do not need human intervention. They work, occasionally need maintenance, and that’s about it.
FAST receipt PROCESSING. Veryfi processes receipts, that is, extracts the data from them, in ~3 seconds. So you can truly throw away those physical receipts and trust the app to have stored, extracted and categorized your expense. If receipt processing takes 15-30 seconds, then there is a human looking at your PII (Personally Identifiable Information).
AVAILABLE 24×7. Machines don’t sleep. Therefore, a backoffice powered by 100% machines is always available for real-time processing and updates.
WOW, IT LEARNS. Veryfi wows with real-time, accurate data extraction, even of handwritten tips, and with vendor identification from logos. Where it gets something wrong, it learns from user feedback in the mobile app. Learning from feedback like this is what ML, and its subfield Deep Learning (DL), is built for, and it is what truly automates a process.
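To show the shape of “it learns from user feedback”, here is a deliberately simple, hypothetical sketch: it counts the category users assign to each vendor and lets the most frequent correction override a base model’s guess. Real systems retrain statistical or deep learning models on corrections like these; this frequency counter is a swapped-in stand-in that only illustrates the feedback loop, not Veryfi’s method. `FeedbackCategorizer`, `base_model` and `min_votes` are illustrative names.

```python
from collections import Counter, defaultdict
from typing import Callable, DefaultDict


class FeedbackCategorizer:
    """Hypothetical wrapper: user corrections override a base model's guess."""

    def __init__(self, base_model: Callable[[str], str], min_votes: int = 1) -> None:
        self.base_model = base_model  # stand-in for a trained ML model
        self.min_votes = min_votes    # corrections needed before overriding the model
        self.corrections: DefaultDict[str, Counter] = defaultdict(Counter)

    @staticmethod
    def _key(vendor: str) -> str:
        # Normalize case and whitespace so "  CALTRAIN " and "Caltrain" match.
        return " ".join(vendor.lower().split())

    def predict(self, vendor: str) -> str:
        votes = self.corrections.get(self._key(vendor))
        if votes:
            category, count = votes.most_common(1)[0]
            if count >= self.min_votes:
                return category
        return self.base_model(vendor)

    def record_feedback(self, vendor: str, correct_category: str) -> None:
        # Each user correction counts as one "vote" for the right category.
        self.corrections[self._key(vendor)][correct_category] += 1


categorizer = FeedbackCategorizer(lambda vendor: "Uncategorized")
print(categorizer.predict("Caltrain"))     # Uncategorized
categorizer.record_feedback("Caltrain", "Travel")
print(categorizer.predict("  CALTRAIN "))  # Travel
```

Raising `min_votes` trades responsiveness for robustness: a single accidental tap no longer flips the category for everyone.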
I’d like to leave you with a small challenge. If you can find another product that performs better than Veryfi at receipt data extraction, please email us at firstname.lastname@example.org, and once confirmed, we will offer you a free lifetime subscription to the Veryfi Ecosystem of apps. Ready? Go!
~ Ernest, Veryfi Cofounder
PII (Personally Identifiable Information): Personal information, described in United States legal fields as either personally identifiable information (PII), or sensitive personal information (SPI), as used in information security and privacy laws, is information that can be used on its own or with other information to identify, contact, or locate a single person, or to identify an individual in context. Ref Wiki
OCR (Optical character recognition): Optical character recognition (also optical character reader, OCR) is the mechanical or electronic conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene-photo (for example the text on signs and billboards in a landscape photo) or from subtitle text superimposed on an image (for example from a television broadcast). Ref Wiki
AI (Artificial Intelligence): Artificial intelligence (AI), sometimes called machine intelligence, is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and other animals. Ref Wiki
ML (Machine Learning): Machine learning is a subset of artificial intelligence in the field of computer science that often uses statistical techniques to give computers the ability to “learn” (i.e., progressively improve performance on a specific task) with data, without being explicitly programmed. Ref Wiki
DL (Deep Learning): Deep learning (also known as deep structured learning or hierarchical learning) is part of a broader family of machine learning methods based on learning data representations, as opposed to task-specific algorithms. Ref Wiki
Digitization: Digitization, less commonly digitalization, is the process of converting information into a digital (i.e. computer-readable) format, in which the information is organized into bits. Ref Wiki
Automation: Automation is the technology by which a process or procedure is performed without human assistance. Automation or automatic control is the use of various control systems for operating equipment such as machinery, processes in factories, boilers and heat treating ovens, switching on telephone networks, steering and stabilization of ships, aircraft and other applications and vehicles with minimal or reduced human intervention. Ref Wiki
Mechanical Turk: A human doing digital labor work for an online crowdsourcing marketplace platform.
AI vs ML vs DL
This great chart shows that AI is nothing new. ML and DL are where the magic happens today. This is where you will see innovation, true automation and some WOW. Take note next time someone loosely throws the word AI around.