Who is spying on your data? Plus: how to avoid becoming a victim of social engineering

August 6, 2018

    You are pumped. Your company has given you an app to simplify your tax bookkeeping, like receipt handling and expense management. The company behind the app labels itself as “better than a slice of bread”, using backoffice automation with their “smarter than an average bear scan technology”. So you start loading the app with photos of your receipts. These are expenses incurred on a recent business trip. One after the next, you snap them with the app. Then you spot something weird: a “Processing…” message. Minutes go by and you get a sinking feeling that something isn’t right. Why is the app taking so long to process those receipts? Minutes to hours later, some finally get processed while others are still queued. What is going on?

    Welcome to the world of fake product marketing.

    Like fake news, the expense management world is filled to the brim with wild claims of real-time AI OCR. Very few companies actually have machine automation.

    This post might rock your socks a bit, but bear with me. The intentions here are good. We want to raise the bar by highlighting why and how existing players process unstructured documents. Onwards and upwards.

    This post is going to help you:

    1. understand what is going on in the bookkeeping software space, and
    2. keep your financial data safe by not exposing it to social engineering.

    2 types of bookkeeping backoffice automation

    Bookkeeping backoffice automation is the stuff that happens when you submit a document into a workflow. Traditionally, there are 2 ways this works: (1) using humans (graphic below in b&w) and (2) using machines (graphic below in color).

    Backoffice automation - human vs machine powered

    (1) Human powered

    This is the most common form of bookkeeping backoffice automation.

    (a) It often requires a human bookkeeper to operate the software, since that software is not built for teams to operate, and (b) software like that also relies on a farm of mechanical turks who do data entry for pennies.

    As a business owner you now have a human cost, software cost & a privacy issue.

    Often, the mechanical turks are managed through Amazon Mechanical Turk or through a 3rd party like Cloud Factory in SF, which manages teams of humans in Africa & Nepal to process the dirty laundry for Xero, Expensify, et al.

    We do NOT endorse this model. It’s slavery and encourages people to accept pennies to do the dirty work of the richer. Setting up a sweatshop backoffice is easy and prone to data-privacy issues, as Expensify learnt last year; here, here and here. Who wants their Uber home address exposed online? Or their medical records, or even their financial history, so often used in social engineering to steal one’s identity? No one.

    So is there a better solution? Of course. Opt for 100% machine powered bookkeeping backoffice automation.

    (2) Machine powered

    This option replaces humans with machines end-to-end. This means all processing is done in real-time 24×7. Machines do not sleep. Machines are faster than humans, and machines can be trained (using machine learning models) to perform flawlessly.

    This is a hard problem to solve, but it’s being solved by a handful of companies, including us at Veryfi.

    The concept of automation isn’t revolutionary, but the use of machine learning is. To make this work, a company has to use a blend of technologies, from mobile to GPU-heavy infrastructure running proprietary machine vision and machine learning algorithms with OCR (optical character recognition), to turn an unstructured document (image, voice, email et al) into structured data.
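    To make “unstructured document into structured data” a little more concrete, here is a minimal sketch of the last step only: taking text that an OCR engine has already read off a receipt and pulling out a few fields. It is an illustration under assumptions, not how Veryfi or any other provider does it. The `Receipt` class, the `parse_receipt` function, the sample receipt and the simple regex rules are all hypothetical; a real pipeline would rely on trained models rather than hand-written patterns.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical OCR output for a made-up receipt (illustration only).
SAMPLE_OCR_TEXT = """
Example Cafe
2018-08-06
Coffee        3.50
Sandwich      4.25
Subtotal: $7.75
Total: $7.75
"""


@dataclass
class Receipt:
    """Structured form of a receipt; the field names are assumptions."""
    vendor: Optional[str]
    date: Optional[str]
    total: Optional[float]


def parse_receipt(ocr_text: str) -> Receipt:
    """Turn raw OCR text into structured fields using naive rules."""
    lines = [line.strip() for line in ocr_text.splitlines() if line.strip()]

    # Assumption: the first non-empty line is the vendor name.
    vendor = lines[0] if lines else None

    # Assumption: dates appear in ISO form (YYYY-MM-DD).
    date_match = re.search(r"\b(\d{4}-\d{2}-\d{2})\b", ocr_text)

    # Match a line that starts with "Total" (so "Subtotal" is skipped).
    total_match = re.search(
        r"^\s*total\s*:?\s*\$?\s*(\d+(?:\.\d{2})?)",
        ocr_text,
        re.IGNORECASE | re.MULTILINE,
    )

    return Receipt(
        vendor=vendor,
        date=date_match.group(1) if date_match else None,
        total=float(total_match.group(1)) if total_match else None,
    )


if __name__ == "__main__":
    print(parse_receipt(SAMPLE_OCR_TEXT))
```

    Hand-written rules like these break as soon as a receipt uses a different layout, which is exactly why the hard part of this problem is the machine learning and not the automation itself.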

    Human powered backoffice is a disaster waiting to happen!

    Why is this important?

    • Data-privacy. Who is looking at your data? Do you care if someone in a foreign country is learning about your financial life? If you do care then it’s time to do a quick scan of the Privacy Policy of the company providing you service.
    • Data Leaks. We are hearing more and more about this. Think of the recent Facebook & Cambridge Analytica scandal, or Expensify in 2017.
      • It’s 2018 and best practice standards are not hard to implement. From hashing passwords to allowing 2-factor authentication, securing your account should be a standard. At Veryfi we have instrumented multiple means of securing your account. Use this as a base to check against the provider you are using.
      • 2017 was the WORST year for data breaches so far! Avoid becoming a number.
      • High-profile revelations about data breaches at Equifax Inc., the Federal Trade Commission, and Uber Technologies Inc. have dominated headlines, propelling cybersecurity-related issues to the top of concerns for businesses and consumers alike.

    “ITRC reported that Commercial businesses accounted for more than 50 percent of data breach targets, and more than 157 million compromised records in 2017.” ~ BNA

    • Social Engineering. This is a common method of stealing your identity: a bad actor gathers enough data about you to impersonate you and convince a bank rep on the phone to change your banking password, and so gains access to your account.

    Avoid sharing too much personal data with unknown human eyes. You just never know who’s looking at your data or what their intentions are. A human powered bookkeeping backoffice is a liability waiting to happen. Avoid at all costs.
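    As a reference point for the “hashing passwords” best practice mentioned in the list above, here is a minimal sketch of salted password hashing using only Python’s standard library. It is an illustration under assumptions, not a description of how Veryfi or any other provider stores credentials. The function names and the iteration count are hypothetical choices, and a production system would normally use a vetted authentication library.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Assumed work factor for this sketch; tune it for your own hardware.
ITERATIONS = 200_000


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) for a password using PBKDF2-HMAC-SHA256."""
    if salt is None:
        # A fresh random salt per user keeps identical passwords
        # from producing identical digests.
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest and compare it in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


if __name__ == "__main__":
    salt, stored = hash_password("correct horse battery staple")
    print(verify_password("correct horse battery staple", salt, stored))
    print(verify_password("wrong guess", salt, stored))
```

    The point of the sketch is that only the salt and the digest are stored, never the password itself, so a leaked database does not directly hand over user passwords.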

    How to identify the type of backoffice your provider uses

    Human powered

    • HUMAN INTERVENTION. If you need to speak to a human before using their software it’s your 1st giveaway. Some coin the term human augmentation software but this really means a human is using a spreadsheet to enter your personal data into their database.
    • SLOW receipt PROCESSING. Important for expense management software. Receipt processing is the act of turning unstructured data into structured form. At the last QBConnect, I got a chance to test drive the Receipt Bank expense app. I scanned my Caltrain receipt. Then waited. Waited… waited… The lad showing me the tech said, “please come back in 90 minutes when the receipt has finished processing.” Here’s the irony: self-driving cars process a lot of data (a lot more than any receipt company) and they work in real-time. Human labor is always the bottleneck. And Receipt Bank’s offshore “Data Extraction Team” is the bottleneck here.
    • LIMITED AVAILABILITY. 9-5 M-F availability is another sign of traditional human hours. Humans need to rest. Machines do not.
    • BORING. Nothing WOW about the app. No magic. Any app using machine AI (artificial intelligence) / ML (machine learning) should have something magical about it. In Gmail it’s the auto suggest reply based on message content.

    “Any sufficiently advanced technology is indistinguishable from magic.” ~ Arthur C. Clarke

    Machine powered

    • 0 HUMAN INTERVENTION. Self-explanatory. Machines programmed well do not need human intervention. They work and sometimes need maintenance but that’s about it.
    • FAST receipt PROCESSING. Veryfi processes receipts in ~3 seconds. That is, the act of “extracting data” from a receipt. So you can truly throw away those physical receipts and trust the app to have stored, extracted & categorized your expense. If receipt processing takes 15-30 seconds, then there is a human looking at your PII (Personally Identifiable Information).
    • AVAILABLE 24×7. Machines don’t sleep. Therefore, a backoffice powered by 100% machines is always available for real-time processing and updates.
    • WOW, IT LEARNS. Veryfi wows through the real-time nature of its accurate data extractions, even from handwritten tips or vendor logo identification. Where it gets it wrong, it learns from user mobile feedback. This kind of learning draws on Deep Learning (DL), a subfield of ML, to truly automate a process.

    Veryfi Challenge

    I’d like to leave you with a small challenge. If you can find another product that performs better than Veryfi in receipt data extraction, please email us at support+veryfichallenge@veryfi.com and, once confirmed, we will offer you a lifetime free subscription to the Veryfi ecosystem of apps. Ready? Go!

    ~ Ernest
    Veryfi Cofounder

    Acronyms explained

    • PII (Personally Identifiable Information): Personal information, described in United States legal fields as either personally identifiable information (PII), or sensitive personal information (SPI), as used in information security and privacy laws, is information that can be used on its own or with other information to identify, contact, or locate a single person, or to identify an individual in context. Ref Wiki
    • OCR (Optical character recognition): Optical character recognition (also optical character reader, OCR) is the mechanical or electronic conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene-photo (for example the text on signs and billboards in a landscape photo) or from subtitle text superimposed on an image (for example from a television broadcast). Ref Wiki
    • AI (Artificial Intelligence): Artificial intelligence (AI), sometimes called machine intelligence, is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and other animals. Ref Wiki
    • ML (Machine Learning): Machine learning is a subset of artificial intelligence in the field of computer science that often uses statistical techniques to give computers the ability to “learn” (i.e., progressively improve performance on a specific task) with data, without being explicitly programmed. Ref Wiki
    • DL (Deep Learning): Deep learning (also known as deep structured learning or hierarchical learning) is part of a broader family of machine learning methods based on learning data representations, as opposed to task-specific algorithms. Ref Wiki
    • Digitization: Digitization, less commonly digitalization, is the process of converting information into a digital (i.e. computer-readable) format, in which the information is organized into bits. Ref Wiki
    • Automation: Automation is the technology by which a process or procedure is performed without human assistance. Automation or automatic control is the use of various control systems for operating equipment such as machinery, processes in factories, boilers and heat treating ovens, switching on telephone networks, steering and stabilization of ships, aircraft and other applications and vehicles with minimal or reduced human intervention. Ref Wiki
    • Mechanical Turk: A human doing digital labor work for an online crowdsourcing marketplace platform.

    AI vs ML vs DL

    This is a great chart which explains that AI is nothing new. ML & DL are where all the magic happens today. This is where you will see innovation, true automation and some WOW. Take note next time someone loosely throws the AI word around.

    AI vs ML vs DL