Voice Evolves the Efficiency of the Warehouse


The first time I was confronted with the idea of voice picking, I was intrigued. This was quite a long time ago, but when I saw the design spec I sat down with the consultant who was designing the system to learn more. I asked him, “Why the heck would a user want to talk to the computer?” His response: “Because functions executed using voice are much more efficient.” Clearly, technology evolves, and that’s a good thing.

The technologies we’ve used up until now have some clear deficiencies. We started with scanning guns that were miles ahead of manual processes, but still had some limitations.

Cons with the RF handheld devices

  1. Users need to put the device down to finish their pick, and then pick it back up to continue scanning. That slows things down.
  2. Users always need to be concerned that they don’t drop and break the device, so they spend extra time looking for a safe place to set it down.
  3. Users must press keys on a small and crowded interface to use the device, which leads to errors and delays.
  4. Handheld devices are easy to lose, and even get dropped by mistake into boxes shipped to customers.

Next, wrist-mounted devices came along to address some of these concerns. The biggest boon was the hands-free capability, but there were still some downsides.

Cons with the wrist-mounted devices

  1. The smaller form factor required users to press multiple keys for certain commands. Logging in and using the device were a struggle and took a lot of time.
  2. Some users found the wearables bulky and heavy.
  3. The second shift disliked using devices that had been strapped on to other workers for hours. (Sweaty, smelly, and unsanitary.)
  4. Complex key commands also made training onerous.

Finally, voice technology emerged, promising multiple benefits.

Voice benefits

  1. Voice combines hands free and eyes free. The picker doesn’t have to carry a device or look at a screen, which speeds everything up. This provides a clear benefit to the picking process by increasing throughput (i.e., the amount of product received through the inbound doors and shipped through the outbound doors). There’s a clear connection between orders shipped, orders invoiced, and company revenue. A billion-dollar retailer/distributor typically picks about 100 million units a year, so a savings of half a second per pick results in 50 million seconds saved. That is equivalent to about 13,889 hours, which at a rate of $14 per hour translates to $194,446 in savings per year.
  2. Voice keeps operators working constantly, so they don’t have time to chitchat with their fellow pickers, which offers an unexpected time savings. In addition, the cool factor of the technology enhances user adoption.
  3. Paper-based systems are fast, but error prone. The voice system, because it is closely intertwined with the warehouse management system (WMS), catches errors immediately. In the end, it provides the best combination of speed and accuracy.
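
As a rough check on the arithmetic in the first benefit above, here is a minimal Python sketch. The function name and its parameters are illustrative assumptions, not part of any WMS or voice product; the inputs are simply the figures quoted in this article.

```python
def annual_pick_savings(units_picked, seconds_saved_per_pick, hourly_rate):
    """Return (seconds, hours, dollars) saved per year.

    Hypothetical helper for illustration only; the inputs are estimates.
    """
    seconds_saved = units_picked * seconds_saved_per_pick
    hours_saved = seconds_saved / 3600  # 3,600 seconds in an hour
    dollars_saved = hours_saved * hourly_rate
    return seconds_saved, hours_saved, dollars_saved


# Figures from the article: 100 million picks, 0.5 s saved per pick, $14/hour.
seconds, hours, dollars = annual_pick_savings(100_000_000, 0.5, 14.0)
print(f"{seconds:,.0f} seconds = {hours:,.0f} hours = ${dollars:,.0f} per year")
```

Without rounding the hours to a whole number first, the dollar total comes out a couple of dollars lower than the figure quoted above. Either way, the order of magnitude is what matters, and the same function can be rerun with a site's own pick volume and labor rate.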

Early in this evolution toward voice there was excitement. In fact, some wanted to make everything voice-enabled. But voice is not a panacea for all the productivity pains in the warehouse. For one thing, voice solutions could be quite expensive, because:

  1. Interfacing with the warehouse management system (WMS) required extra integration software, which was mostly custom built.
  2. Extra servers were needed to communicate with the voice terminals.
  3. The typical implementation timeline was three to six months.
  4. The voice terminals were, and still are, very expensive.

Ease vs. Cost

Overall, voice was very expensive and took a while to implement.

Later, I got a chance to talk to some of the distribution center managers who had already implemented voice. This is what they told me.

  1. Voice improves productivity. They were promised up to 60%, but the average productivity improvement was somewhere around 20%.
  2. It supports multiple languages.
  3. Voice creates a real-time, highly accurate picking process (i.e., picks are immediately reported to and validated by the WMS).


There were limitations, too:

  • Early voice technology followed a voice template approach. Users had to train the system when they started using it for the first time. This made it less accurate in some key situations, including:
    • When a user had a cold or some other condition that changed the voice.
    • When frustration or other emotions changed the voice.
    • When there was a lot of background noise.
  • Voice systems used a check digit for each warehouse location, which the user had to read aloud to confirm the location. This added a separate step and required extra labels.
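
To make the check digit step concrete, here is a minimal Python sketch of how such a confirmation might work. Everything in it is a hypothetical illustration: the location codes, the digit assignments, and the `confirm_location` function are assumptions, not the behavior of any particular voice or WMS product.

```python
# Hypothetical check digits, as they might be printed on each location label.
CHECK_DIGITS = {
    "A-01-03": "47",
    "A-01-04": "82",
    "B-12-01": "15",
}


def confirm_location(directed_location, spoken_digits):
    """Return True if the digits the picker read aloud match the location
    the system directed them to."""
    expected = CHECK_DIGITS.get(directed_location)
    if expected is None:
        return False  # unknown location: never confirm
    # Keep only digit characters so "4 7" and "47" are treated the same.
    heard = "".join(ch for ch in spoken_digits if ch.isdigit())
    return heard == expected


# The picker is sent to A-01-03 and reads the label aloud.
print(confirm_location("A-01-03", "4 7"))  # digits match the label
print(confirm_location("A-01-03", "82"))   # picker is at the wrong slot
```

The sketch also shows where the extra cost comes from: every location needs its own label, and every pick needs one more spoken exchange before the quantity is confirmed.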

How has voice technology improved?

User-friendly voice technology leverages consumer devices

Consumer mobile devices, such as cell phones, have changed the game for enterprises. Purpose-built devices are only required when powerful scanning technology is needed. For most other applications, consumer technology offers some benefits:

  1. Familiar technology requires less training time.
  2. The addition of voice capabilities (such as Siri and Alexa) to many consumer devices increases the comfort level of operators.
  3. Technology advances, such as fast charging, offer better user experiences.
  4. Compatible consumer-style headsets with microphones designed for sports are great for warehouse users and are less expensive.

More powerful speech recognition engines

The latest Android and Apple devices have sophisticated voice engines with built-in artificial intelligence and machine learning capabilities. This puts these capabilities into the hands of users at no cost. These voice engines can be easily leveraged through the available application programming interfaces (APIs). Google’s voice engine has taken technological leadership since Google switched its transcription to long short-term memory (LSTM) recurrent neural networks.

Meanwhile, computing power on consumer devices is on the rise, which makes voice commands both quick and accurate. Native voice recognition engines don’t require the recording of voice templates, which saves time for users. This also addresses the issue of voice changes due to emotion or illness.

Specific modes address background noise issues

As many WMSs become highly configurable and flexible, they can easily be configured to work with voice solutions in a mode that doesn’t require any speaking. Instead, inputs are provided by scanning the relevant barcodes, which works perfectly in a noisy environment. This also eliminates the need for check digits, because the scanner reads the barcode directly, which is faster and more accurate. In addition, many WMSs have built-in voice interfaces, which means that solutions don’t need to be built from scratch.

Plug & Play implementation of voice solutions

Voice solutions are becoming plug and play, because consumer devices run highly sophisticated terminal emulators that are easy to use, with soft overlay keyboards and communication capabilities. A quick connection to a Wi-Fi network and easily configurable text-to-speech capabilities get things up and running quickly.

Further, devices have gotten sophisticated enough to be used for both voice and scanning. Workers can move readily from one task or department to another using the same device at all times.

Costs coming down

Overall, the cost of the voice solution has come down significantly for several reasons:

  1. Off-the-shelf hardware can be used rather than expensive voice-specific hardware, which could cost anywhere from $4,000 to $5,500.
  2. There is no need to buy integration software to voice-enable the functions.
  3. Implementation times shrink considerably when using standard technology.
  4. With standard and affordable equipment, voice capabilities are within reach of even smaller operations.

Today, organizations that have never used a voice system should look at getting on board this trend. Further, if you have an older system from fifteen or even five years ago, you can save a lot by upgrading.  Let us know your thoughts on this in the comments section below.

Originally published at Smartgladiator.com on Feb 28, 2018.

Puga Sankara is the co-founder of Smart Gladiator LLC. Smart Gladiator designs, builds, and delivers market-leading mobile technology for retailers, distributors, and 3PL service providers. SG LoadProof is a patent-pending centralized enterprise photo/video documentation system in the cloud for the supply chain. It is built on the premise that photos and videos are vital documents, as important as the POs, SOs, legal contracts, and fulfillment orders that reside in ERP/WMS/TMS systems. They serve as compelling, conclusive proof of critical operations executed within and across organizations when fulfilling customer orders and meeting contractual obligations as merchandise is transferred between the parties in the supply chain. This photo and video data should not be stored in someone’s smartphone, email inbox, or personal or work computer. It should be stored in a centralized enterprise system where it can be uploaded quickly, stored securely, made accessible to all stakeholders in an organization (CFO, sales reps, customer support, AR/AP), and retrieved and shared quickly. LoadProof is an enterprise system of record for photo and video documents, as important as an ERP, which is the system of record for POs, SOs, and legal contracts that have huge legal ramifications, and as important as a WMS (warehouse management system), which holds indispensable shipment and fulfillment data on orders. Just as Instagram, Facebook, and Snapchat have evolved into social media platforms that let individuals showcase their looks, clothes, cosmetics, and coolness, LoadProof holds photos and videos for a different reason: not to show off, but to serve as a compelling, conclusive, and indisputable system of record and proof that can be presented even in a court of law when there is a dispute between parties in the supply chain.

Puga is a supply chain technology professional with more than 25 years of experience in deploying capabilities in the logistics and supply chain domain. His prior roles involved managing complicated mission-critical programs driving revenue numbers, rolling out a multitude of capabilities involving more than a dozen systems, and managing a team of 30 to 50 personnel across multiple disciplines and departments in large corporations such as Hewlett Packard. He has deployed WMS for more than 30 distribution centers in his role as a senior manager with Manhattan Associates. He has also performed process analysis walk-throughs for more than 50 distribution centers for WMS process design and performance analysis review, optimizing processes for better productivity and visibility through the supply chain. These DCs ranged in size from 150,000 to 1.2 million square feet. Puga Sankara has an MBA from Georgia Tech. He can be reached at [email protected] or visit the company at www.smartgladiator.com. Also follow him at www.pugasankara.com.