How Has Voice Changed as a Data Collection Technology?


I still remember it as if it were yesterday. I was a developer on the professional services team supporting the Warehouse Management iSeries deployment for our customers when I received a very interesting design spec titled Voice Picking. Excited, I met with the consultant/business analyst who was designing the system, and he walked me through this new process, in which users talk to the computer. I asked him why the heck a user would want to talk to a computer. His response was that functions executed by voice are much more efficient, and he started by listing the problems with the devices in use at the time.

Cons with the RF handheld devices

  1. Users constantly have to put the device down, finish their pick (a unit pick or a box pick), then pick the device back up and continue scanning, which slows them down.
  2. While doing that, users also have to look for a safe place to put these guns so they don’t fall and break, so even more time is wasted.
  3. Users have to press the right keys on a very small keypad, which is difficult because the device has no room for a full-size keyboard like the one on a typical laptop or desktop.
  4. These handheld scan guns are often lost, because operators put them down somewhere and forget them, and sometimes the guns even get shipped in boxes destined for customers or stores.

So I asked him: have you seen those wrist-mounted devices? Why can’t they use those? He replied that wrist-mounted devices are of course better than RF handhelds because they offer hands-free capability, but they still had disadvantages, which he listed one by one.

Cons with the Wrist Mounted Gladiator devices

  1. The numbers are a pain to press because the keys and the keyboard are small, so users have to multi-tap keys to enter a number or a letter. He had seen many times how much users struggle just to log in; believe it or not, it sometimes takes 30 minutes, and longer if the device reboots in the middle of the login process.
  2. Not everybody liked using the wearable scanners because they were heavy.
  3. People on the second shift did not like using them because they were soaked with sweat from the first shift and smelled really bad.
  4. It took a long time to train pickers because they had to learn the multi-tap keyboard as well as the function and control keys needed to execute different functions.

In many distribution centers that switched from RF handhelds to wrist-mounted devices with ring scanners, the pickers were not happy. They didn’t like having the device strapped to their arm, for the same reasons:

  1. First, the device was heavy.
  2. Second, it often had someone else’s sweat on it.
  3. Third, it was difficult to operate because keys had to be pressed multiple times to input data.

So this new voice solution vendor had promised multiple benefits.

Voice Benefits

  1. It is hands free and eyes free. The picker doesn’t have to carry a device around, constantly putting it down and picking it back up, so both hands are free. The picker also doesn’t have to look at a screen, so the eyes are free to focus only on picking, which makes picking much faster. Picking is typically the most time-consuming process in a distribution center and therefore the bottleneck. Any warehouse manager interested in increasing throughput (the amount of product received through the inbound doors and shipped through the outbound doors) would focus on picking first: the faster the picking, the higher the throughput of the facility, so more orders can be shipped and invoiced, which means more revenue for the company. Imagine a billion-dollar retailer/distributor that picks about 100 million units a year. Saving even half a second per pick adds up to 50 million seconds, or about 13,889 hours, which at a rate of $14 per hour translates to roughly $194,446 in savings per year, which is a big number.
  2. Voice also keeps the operators going constantly. They don’t have time to chit-chat with fellow pickers, so that time is saved too. The operators also feel like they are in a sci-fi movie, talking to the computer and looking high tech and cool, which helps with user adoption.
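
To make the savings arithmetic in the first benefit concrete, here is a minimal Python sketch. The function name and its parameters are illustrative only; the inputs (100 million picks a year, half a second saved per pick, $14 per hour) come from the example above. Computed without intermediate rounding, the result is about $194,444; rounding the hours up to 13,889 before multiplying gives the $194,446 figure quoted above.

```python
def annual_pick_savings(units_per_year, seconds_saved_per_pick, hourly_rate):
    """Return (hours_saved, dollars_saved) for a given time saving per pick.

    Illustrative helper, not part of any WMS or voice product.
    """
    seconds_saved = units_per_year * seconds_saved_per_pick
    hours_saved = seconds_saved / 3600  # 3600 seconds in an hour
    return hours_saved, hours_saved * hourly_rate


hours, dollars = annual_pick_savings(
    units_per_year=100_000_000,
    seconds_saved_per_pick=0.5,
    hourly_rate=14.0,
)
# prints: 13,889 hours saved, about $194,444 per year
print(f"{hours:,.0f} hours saved, about ${dollars:,.0f} per year")
```

Changing any one input (for example a full second saved per pick) shows how quickly small per-pick savings scale at this volume.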

These two benefits make voice super-efficient. The new voice solution vendor had made a deal with a food distributor that was already our client, and the vendor was paying for the project because it was a pilot of sorts for the food distributor.

I had just implemented a paper-based picking process for that distributor. The paper pick sheets were called Work Assignments. Each picker would take a Work Assignment, go to the locations listed on the sheet, pick the product, write down the catch weight for items such as cheese whose weight is not always the same, and then hand the sheet and the product to a person who would verify the Work Assignment. The problem with this process was that the pickers were picking very fast, and with the speed came accuracy issues. There were often mis-picks, and items were reported as shorts even though the product was available in reserve, because the system did not force pickers to go back and check one last time.

That is the difference when picking through a mobile device or a voice system: every pick has to be reported to the WMS (Warehouse Management System) right away and validated, so errors are caught right then and there and the mis-pick and shortage issues go away. No wonder they agreed, because voice gave them the best of both worlds: speed and accuracy.

Speed Vs Accuracy with Different Data Collection Technologies


This was all great, and it felt like voice was the cure-all for every productivity pain in the warehouse. I still remember giving a presentation on voice technology at our user conference, Momentum, in 2007. A VP who presented jointly with me said their CEO wanted to make everything voice because the ROI was so much better.

After hearing all the great news about voice, many DC managers looked at what it actually took to implement a voice solution and found it pretty expensive, for the reasons below:

  1. It needed extra integration software, mostly custom, to build interfaces to the WMS (Warehouse Management System).
  2. It also needed extra servers in between to communicate with the voice terminals.
  3. The typical implementation timeline was anywhere from 3 to 6 months.
  4. Of course, there was all the time and effort that had to be spent on making the project happen.
  5. On top of all this, the voice terminals were very expensive too (they are still very expensive).

Ease Vs Cost to Implement for Different Data Collection Technologies


So overall, voice was very expensive, and it took a while to implement.

After finding this out, I got a chance to talk to some of the DC managers who had already implemented voice and asked how they were faring. Here is the feedback, starting with what was working well:


  1. Voice is great and productivity improved. Although they were promised improvements of up to 60%, the average productivity improvement was around 20%.
  2. It supported multiple languages.
  3. It is a real-time picking process: picks were immediately reported to and validated by the WMS (Warehouse Management System), so it was highly accurate.


They also reported the following drawbacks:

  1. The voice technology implemented in the 2000s was template based, which meant users had to record their voice templates when they started using voice for the first time. The voice picking tool compared the file generated from the user’s spoken command against the recorded template. Hence it did not work effectively in the following scenarios:
     a. When the user had a slight cold. The user had to re-record the voice templates so the system would recognize the voice with the cold.
     b. When the user got frustrated and his or her tone changed. Again the user had to re-record the voice templates so the system would recognize the frustrated voice.
     c. When there was background noise in the distribution center.
  2. Some voice systems required a check digit to be added to every warehouse location, because the voice picking systems did not have scanning capability. The user had to read the check digit aloud so the system could validate that the picker was at the correct location. Adding check digits is an additional task, which means expenses for the labor to complete it and for the labels that have to be mounted at each location.
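
The check-digit step can be sketched as a simple lookup. This is only an illustration under stated assumptions: the location codes, the digits, and the function name are made up for the example, and real voice systems may assign and validate check digits differently.

```python
# Hypothetical table: location code -> check digits printed on the location label.
CHECK_DIGITS = {
    "A-01-03": "47",
    "A-01-04": "82",
    "B-12-01": "15",
}


def confirm_location(assigned_location, spoken_digits):
    """Return True when the spoken digits match the label at the assigned location."""
    expected = CHECK_DIGITS.get(assigned_location)
    if expected is None:
        # Unknown location: nothing to validate against.
        return False
    # Assumption: recognized speech may arrive with spaces between digits ("4 7"),
    # so keep only the digit characters before comparing.
    normalized = "".join(ch for ch in spoken_digits if ch.isdigit())
    return normalized == expected


print(confirm_location("A-01-03", "4 7"))  # picker is at the right slot
print(confirm_location("A-01-03", "82"))   # picker read the label of the next slot
```

The sketch also shows where the extra cost comes from: every location needs an entry in the table and a matching printed label.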


User Friendly Voice Technology leveraging Consumer Devices

Consumer mobile devices have changed the game big time for enterprises. There is no longer a strong reason to buy purpose-built devices, except for their powerful scanning technology. Consumer devices have the following advantages over purpose-built voice technology:

  1. They use technology we are all familiar with, so they are easy to use and need less training time.
  2. Consumers are getting accustomed to voice technology. With devices like Alexa entering our homes, voice looks like the way we will interact with machines in the future.
  3. Consumer devices also offer modern capabilities such as fast charging and are much lighter, so they can offer a much better user experience.
  4. Consumer devices work with consumer-style headsets with microphones that are not only low cost but also user friendly. Some are made specifically for sports, which makes them a good fit for warehouse users, and because they cost less, it is not expensive to replace them often.

Much more powerful Speech Recognition Engines

  1. The latest Android and Apple devices have built-in voice engines that are very sophisticated, with artificial intelligence and machine learning capabilities built in, and on top of that they are free. These engines can be invoked easily because APIs are provided for use within any app.
  2. Google’s voice engine is regarded as the best now; Google has managed to beat even Apple, the best mobile device maker. This is attributed to Google’s switch to transcription using Long Short-Term Memory (LSTM) recurrent neural networks in Google Voice.
  3. Consumer devices are getting much more computing power, so they can process voice commands quickly and accurately.
  4. These are native voice recognition engines that don’t need voice templates, so no more time is spent recording them. Voice recognition works even when the user has a cold or is frustrated.

Specific Modes to not speak in picking areas with a lot of background noise

With many WMSs (Warehouse Management Systems) becoming highly configurable and flexible, they can easily be configured to work with voice solutions in a mode that doesn’t require any speaking. Instead of speaking to the voice system, the user provides inputs through the scanner by scanning the relevant barcodes, which works perfectly in a noisy environment. This also eliminates the need for check digits, because scanning a barcode is faster than speaking check digits to the voice system. Some WMSs also have built-in voice interfaces, so there is no need to build this interface from scratch.
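
As a rough sketch of what such a configurable mode could look like, the Python below picks an input mode per picking zone and confirms the location either by scan or by spoken check digits. Everything here is hypothetical: the zone names, the `ZONE_INPUT_MODE` table, and both function names are invented for illustration and do not reflect any particular WMS or voice product.

```python
from enum import Enum


class InputMode(Enum):
    VOICE = "voice"  # picker answers prompts by speaking
    SCAN = "scan"    # picker answers prompts by scanning barcodes


# Hypothetical configuration: noisy zones are set to scan-only input.
ZONE_INPUT_MODE = {
    "conveyor": InputMode.SCAN,
    "dock": InputMode.SCAN,
    "dry-goods": InputMode.VOICE,
}


def input_mode_for(zone, default=InputMode.VOICE):
    """Look up the configured input mode for a zone, falling back to voice."""
    return ZONE_INPUT_MODE.get(zone, default)


def location_confirmed(zone, assigned_location, scanned_barcode=None,
                       spoken_check_digits=None, expected_check_digits=None):
    """Confirm the picker is at the assigned location using the zone's input mode."""
    if input_mode_for(zone) is InputMode.SCAN:
        # Scan mode: the location barcode itself is the proof, no check digit needed.
        return scanned_barcode == assigned_location
    # Voice mode: compare the spoken check digits with the expected ones.
    return (spoken_check_digits is not None
            and spoken_check_digits == expected_check_digits)


print(input_mode_for("conveyor"))
print(location_confirmed("conveyor", "A-01-03", scanned_barcode="A-01-03"))
```

Keeping the mode in configuration rather than in code matches the point above: the same device and workflow can serve quiet and noisy areas without a separate integration.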

Plug & Play Implementation of Voice Solutions

Voice solutions are becoming plug and play. Consumer devices now run highly sophisticated terminal emulation software that is easy to use, with soft overlay keyboards and communication capabilities. It takes only a few minutes to connect the devices to the Wi-Fi network and configure the voice solution, which leverages the latest voice engines to convert text into speech, and then they are ready to go. These voice solutions work through the terminal emulation software, so no extra software is needed in the middle to connect the consumer devices running voice to the host computer that hosts the WMS (Warehouse Management System).

Platform Approach – A single device can be used for Scanning and Voice

The systems have also become so mature that a single device can be used for both scanning and voice. A user can perform a scanning function in one department, then go to the picking department, put on a voice headset, and start picking by voice using the same device.

Cost of the Voice Solution has come down significantly

Overall, the cost of a voice solution has come down significantly, for the reasons below:

  1. There is no longer a need to buy expensive voice-specific hardware, which could cost anywhere from $4,000 to $5,500.
  2. There is no longer a need to buy integration software to voice-enable the functions.
  3. There is no longer a need to spend such a long time implementing a voice solution.
  4. As a result, distribution centers with smaller operations, which could not afford voice solutions in the past, can now afford them as well.

So if you are using a template-based voice system that you implemented in the early 2000s, or even 5 to 7 years ago, and feel that you could save a lot more by upgrading, you are correct. You should really check out the latest voice solutions that are available. What are your thoughts? Please share them in the comments section below.

Originally published on Aug 21, 2017.

Puga Sankara is the co-founder of Smart Gladiator LLC. Smart Gladiator designs, builds, and delivers market-leading mobile technology for retailers, distributors, and 3PL service providers. SG LoadProof is a patent-pending centralized enterprise photo/video document system on the cloud for the supply chain. LoadProof is built on the premise that photos and videos are vital documents, as important as the POs, SOs, legal contracts, and fulfillment orders that reside in ERP/WMS/TMS systems. They serve as compelling, conclusive proof of critical operations executed in the supply chain within and across organizations, both when fulfilling customer orders and when meeting contractual obligations as merchandise is transferred between the parties involved. Such photo and video data should not be stored in someone’s smartphone, email inbox, or personal or work computer. It should be stored in a centralized enterprise system where it can be uploaded quickly, stored securely, made accessible to all stakeholders in an organization (CFO, sales reps, customer support, AR/AP), and retrieved and shared quickly. LoadProof is an enterprise system of record for photo and video documents, and is as important as an ERP, which is the enterprise system of record for POs, SOs, and legal contracts that have huge legal ramifications, or a WMS (Warehouse Management System), which holds indispensable shipment and fulfillment data on orders. Just as Instagram, Facebook, Snapchat, and similar platforms let individuals showcase photos and videos, LoadProof holds photos and videos for a different reason: not for show, but to serve as a compelling, indisputable system of record and proof that can be presented even in a court of law when there is a dispute between parties in the supply chain.

Puga is a supply chain technology professional with more than 25 years of experience in deploying capabilities in the logistics and supply chain domain. His prior roles involved managing complicated mission-critical programs driving revenue numbers, rolling out a multitude of capabilities involving more than a dozen systems, and managing a team of 30 to 50 personnel across multiple disciplines and departments in large corporations such as Hewlett Packard. He has deployed WMS for more than 30 distribution centers in his role as a senior manager with Manhattan Associates. He has also performed process analysis walk-throughs for more than 50 distribution centers for WMS process design and performance analysis review, optimizing processes for better productivity and visibility through the supply chain. These DCs ranged in size from 150,000 to 1.2 million square feet. Puga Sankara has an MBA from Georgia Tech. He can be reached at [email protected].
