Just a few years ago, smart home assistants
were the latest “must-have” technology. Even though they had limited
functionality, their novelty, coupled with a relatively low price point (and
strategic price reductions during the holidays), made them near-irresistible to
consumers. Suddenly, tech companies had access to more personal information
than they knew what to do with. In fact, as of last year, there were smart
speakers in more than half a billion homes across the
world.
More recently, though, these devices have
begun to fall out of favor with the public. Almost every week, there’s a new
data privacy scandal, and as a result, people are beginning to care more about
who has access to their personal information. For many, the idea of having an
always-on, always-listening assistant isn’t quite so appealing anymore.
Tech companies have lost the public’s trust,
but smart assistants still have plenty of potential, especially given how
useful they could become with future advances in machine learning. Moving
forward, the onus is on smart speaker manufacturers to prove that their
products deserve a second chance, and the best way to do this is with a
long-overdue examination of their approach to user privacy.
What kind of information do smart assistants store?
The exact data that is collected varies based
on the device’s manufacturer and recording capabilities (a unit with a built-in
camera, for instance, might record video clips or report on ambient light
levels at different times of day). However, the main problem, at least for now,
is audio. Specifically, how much is recorded, why it’s recorded, and what is
done with the recording once it reaches the company servers.
Voice-activated assistants are technically
always listening, but they won’t process requests until they hear a specific
trigger word. These triggers are deliberately chosen to be uncommon sequences of syllables (for example, “OK, Google,” or “Alexa”) so that the speaker can more easily distinguish them from day-to-day conversation.
The problem is that algorithms are not
infallible. As a result, smart assistants can mishear the trigger word and
begin recording audio without the owner’s knowledge. If the algorithm isn’t
sure whether a trigger word was said, it sends the audio clip to the company’s
verification staff, who make the final decision. In theory, this kind of
supervision helps reduce the number of false positives, but in practice it means that, at any time, people could be listening to what occurred in your home just moments before.
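To make this concrete, here is a rough sketch of how that routing decision might work. The threshold values, function names, and the idea of a human-review queue are illustrative assumptions, not any vendor’s actual pipeline:

```python
# Hypothetical sketch of trigger-word routing; thresholds and names are
# assumptions for illustration only.

ACCEPT_THRESHOLD = 0.90   # confident enough to start processing the request
REVIEW_THRESHOLD = 0.50   # ambiguous clips get escalated to human reviewers

def route_audio(clip: bytes, trigger_confidence: float) -> str:
    """Decide what happens to an audio clip based on wake-word confidence."""
    if trigger_confidence >= ACCEPT_THRESHOLD:
        return "process"        # treated as a genuine request
    if trigger_confidence >= REVIEW_THRESHOLD:
        return "human_review"   # a person listens and makes the final call
    return "discard"            # clearly not the trigger word

# A clip the model is only 62% sure about gets sent to staff.
print(route_audio(b"...", 0.62))  # -> "human_review"
```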
Your data’s journey doesn’t end there, though.
The recording is then transcribed (and possibly annotated by staff) before both
versions are stored on the company’s servers (where they remain indefinitely, even if you
delete conversations stored locally on your device). If you made a request, the
actions your assistant took in response may also be recorded.
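Based on the description above, a stored interaction might look something like the record below. Every field name here is an assumption for illustration; real schemas vary by manufacturer:

```python
from dataclasses import dataclass, field

# Hypothetical shape of a stored interaction; all field names are assumptions.
@dataclass
class InteractionRecord:
    audio_clip: bytes                                       # the raw recording
    transcript: str                                         # machine transcription
    annotations: list[str] = field(default_factory=list)    # staff notes, if any
    actions_taken: list[str] = field(default_factory=list)  # e.g. "played playlist"
    deleted_locally: bool = False   # local deletion need not purge the server copy

record = InteractionRecord(
    audio_clip=b"...",
    transcript="play some jazz",
    actions_taken=["started jazz playlist"],
)
```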
Is it safer to individually manage IoT devices than to use a smart assistant?
Hypothetically, controlling smart light bulbs,
sockets, and thermostats using their respective apps prevents your smart
assistant from collecting data about your other devices. That said, there’s a
good chance that these devices are also collecting some form of information,
and lesser-known brands are unlikely to make detailed privacy information publicly
available, so it’s difficult to know for sure which option is safer.
Let’s assume that your smart light bulb manufacturer can see basic information about your usage habits in real-time: what color the light is, its current color intensity, and whether the light is on or not. According to researchers from the University of Texas at San Antonio, this is enough to identify what song you’re listening to (assuming you have the kind of bulbs that change color in response to music). This experiment was fairly limited in scope, but if a multi-billion-dollar company with millions of hours of voice data decided to replicate it, it’s entirely possible that it could listen in on your conversations without even needing to use a microphone.
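The core idea is simple enough to sketch: compare an observed brightness-over-time series against precomputed “light fingerprints” for known songs and pick the closest match. The toy fingerprints and the plain correlation metric below are simplifying assumptions; the actual research used more sophisticated features:

```python
import statistics

def correlation(a: list[float], b: list[float]) -> float:
    """Pearson correlation between two equal-length series."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a) ** 0.5
    var_b = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (var_a * var_b)

def identify_song(observed: list[float],
                  fingerprints: dict[str, list[float]]) -> str:
    """Return the song whose light fingerprint best matches what was seen."""
    return max(fingerprints,
               key=lambda song: correlation(observed, fingerprints[song]))

# Toy fingerprints: brightness sampled at five moments during each song.
fingerprints = {
    "song_a": [0.1, 0.9, 0.2, 0.8, 0.1],
    "song_b": [0.5, 0.5, 0.6, 0.4, 0.5],
}
print(identify_song([0.2, 0.8, 0.3, 0.7, 0.2], fingerprints))  # -> "song_a"
```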
Even relatively simple devices like smart
sockets collect information. One of the most widely recognized models records 45 days’ worth of data, including your daily usage levels, Wi-Fi connection strength, how much your electricity costs per kWh, and how long the device has been on standby. This is enough information to make a reasonable guess at whether you are employed, how many people live with you, and (if you have multiple smart sockets) the rough layout of your home. This specific model also includes a micro-USB port that
the manufacturer suggests could be used for additional sensors in the future.
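As a rough illustration of how little it takes, here is a toy heuristic for guessing whether anyone is home during working hours from a day of hourly usage readings. The rule itself is a made-up assumption, not a real manufacturer’s algorithm:

```python
# Hypothetical occupancy guess from one day of hourly socket usage (kWh).
def likely_home_during_workday(hourly_kwh: list[float]) -> bool:
    """True if 9:00-17:00 usage suggests someone is home on a weekday."""
    workday = hourly_kwh[9:17]               # readings for 9am through 4pm
    rest = hourly_kwh[:9] + hourly_kwh[17:]  # everything else
    return sum(workday) / len(workday) > 0.5 * (sum(rest) / len(rest))

# Low usage all day, high in the evening: probably out at work.
day = [0.1] * 9 + [0.05] * 8 + [0.8] * 7
print(likely_home_during_workday(day))  # -> False
```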
Tech giants must reconsider their approach to privacy
If the public is to continue to embrace smart assistants, they must feel that their privacy is respected. Organizations are beginning to take small steps in this direction, for instance by adding camera covers to their devices. However, token gestures like these miss the point: people aren’t worried about what their device might see; they’re concerned that it is watching at all.
The tech itself isn’t the problem: after all,
voice-activated assistants can be extremely useful, particularly for people
with limited mobility or vision. Instead, the issue is that the companies
creating these devices have a “collect everything” mindset that simply isn’t
compatible with user privacy. By adopting the three-step privacy-first approach
laid out below, tech giants can assuage user concerns in a meaningful way.
This, in turn, ensures that existing customers will continue to use the service
and may even encourage some to upgrade when newer models are released.
1. Dispense with the secrecy
People are so used to problematic privacy
policies that they just accept them automatically. After all, the alternative
is not using a device that you’ve already paid for. Going forward, smart
assistants should only collect data that is strictly required for a specific
task, and this information should be deleted (both locally and from the
manufacturer’s records) once it has served its purpose. Users should also be
able to opt out of certain features if they don’t agree to provide the required permissions, much as you can with apps on a cell phone.
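In code, such permission gating might look like the sketch below; the feature and permission names are hypothetical:

```python
# Hypothetical mapping of features to the permissions they require.
FEATURE_PERMISSIONS = {
    "directions":     {"location"},
    "transit_times":  {"location", "travel_schedule"},
    "voice_shopping": {"purchase_history"},
}

def available_features(granted: set[str]) -> list[str]:
    """Features still usable with only the permissions the user granted."""
    return [f for f, needed in FEATURE_PERMISSIONS.items() if needed <= granted]

# A user who shares location but opts out of everything else keeps directions.
print(available_features({"location"}))  # -> ['directions']
```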
How would this work in practice? Let’s say you
wanted to get directions to a particular restaurant. Your assistant would need
to know both your current location and your destination. It may also ask for
your intended travel time in order to find public transport information and so on, but this isn’t critical data and should only be collected if the user volunteers it. Once the user arrives, their destination data is no longer needed and can safely be discarded (unless the user specifically adds the location to an address book).
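A minimal sketch of that flow, assuming a hypothetical trip-session object, might look like this: the destination exists only for the duration of the trip and is wiped on arrival unless the user explicitly saves it:

```python
from typing import Optional

class TripSession:
    """Hypothetical navigation session that forgets its data on arrival."""

    def __init__(self, origin: str, destination: str):
        self.origin = origin
        self.destination = destination   # needed only while navigating

    def arrive(self, address_book: dict, save_as: Optional[str] = None):
        if save_as:   # kept only at the user's explicit request
            address_book[save_as] = self.destination
        self.origin = self.destination = None   # otherwise discarded

address_book = {}
trip = TripSession("home", "123 Example St")
trip.arrive(address_book)              # no save requested -> nothing retained
print(trip.destination, address_book)  # -> None {}
```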
In the example above, the user manages to get
to their destination without any problems, despite providing minimal
information. In contrast, if you were to try this with present-day devices,
your smart assistant could find out not only where you’re going, but what kind
of establishment it is, how often you visit, and how popular it is with others
in your area. This functionality isn’t used to improve your experience; it’s
used to build an advertising profile and as such, can be dispensed with no
noticeable impact on the user.
Simply put, users should not be punished for being privacy-conscious. If people are to truly consent to give their personal information away, they first have to understand what this means. Companies should no longer obfuscate their data collection policies in dozens of pages of legal terminology; customers should be able to see exactly what kind of information is recorded. More importantly, this information should be in an easily understandable format, free from half-truths about metadata or anonymized datasets. Users should also be informed why each piece of data is recorded, and how they can delete it, if they so choose.
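What might such a plain-language disclosure look like? One illustrative format (the entries below are invented examples, not any company’s real policy) is simply a list of what is collected, why, and where to delete it:

```python
# Invented example entries; a real disclosure would list every item collected.
DATA_DISCLOSURE = [
    {"what": "voice recordings",
     "why": "to interpret your spoken requests",
     "delete": "Settings > Privacy > Delete recordings"},
    {"what": "request transcripts",
     "why": "to improve speech recognition accuracy",
     "delete": "Settings > Privacy > Delete transcripts"},
]

for item in DATA_DISCLOSURE:
    print(f"We collect {item['what']} {item['why']}. Remove it at: {item['delete']}.")
```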
2. Make personal data easier to access
These days, it can take weeks to find out what data a particular company holds on you. Certain websites make this process easier than others, but even after finding the appropriate option in a maze of submenus, you can still expect to wait at least a few hours for your records.
Why shouldn’t viewing your personal information be as simple as possible? Users should be able to decide for themselves whether a company deserves access to their data or not. In the future, it needs to be far easier to view your data, but there’s no need to stop there.
For instance, at the moment, most smart
assistants do not allow users to delete voice recordings automatically or view
detailed statistics such as how often their audio clips were referred for
manual verification. These aren’t especially advanced features, but they would
go a long way towards regaining public trust and building a reputation as a
transparent, privacy-conscious tech company.
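Both features could be as simple as the sketch below: a user-chosen retention window after which clips are purged automatically, plus a visible count of how many stored clips were escalated to human reviewers. All names and fields here are assumptions:

```python
from datetime import datetime, timedelta

class PrivacySettings:
    """Hypothetical per-user privacy controls."""

    def __init__(self, retention_days: int = 30):
        self.retention = timedelta(days=retention_days)
        self.clips = []   # each entry: {"recorded": datetime, "reviewed": bool}

    def purge_expired(self, now: datetime) -> None:
        """Automatically drop clips older than the retention window."""
        self.clips = [c for c in self.clips
                      if now - c["recorded"] < self.retention]

    def review_rate(self) -> float:
        """Fraction of stored clips sent for manual verification."""
        return sum(c["reviewed"] for c in self.clips) / max(len(self.clips), 1)

settings = PrivacySettings(retention_days=7)
settings.clips = [
    {"recorded": datetime(2024, 1, 1), "reviewed": True},   # too old, purged
    {"recorded": datetime(2024, 1, 9), "reviewed": False},  # kept
]
settings.purge_expired(now=datetime(2024, 1, 10))
print(len(settings.clips), settings.review_rate())  # -> 1 0.0
```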
3. Reduce third-party intervention to a minimum
The best-known smart assistants have been
operating for years. Collectively, they have access to a wider range of voice
data than anyone else, ever, with all kinds of languages and accents
represented. If, after millions of dollars of investment and years of
fine-tuning the algorithm, staff still frequently have to intervene manually just to check whether a trigger word was used, companies may have to accept that the current methods simply aren’t working.
There are plenty of other ways to reduce the
number of false positives without making users worry about who is listening.
For instance, by limiting the hours in which your assistant is operational, you
reduce the chance of background noise being misidentified as a trigger word.
Many smart speakers already allow users to create customized routines such as
playing music at a certain time each day; this tech could almost certainly be
extended to schedule “quiet times” (such as the early hours of the morning),
where the assistant simply won’t respond to verbal commands.
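A sketch of such a quiet-hours check, with an invented 11pm-to-7am window, could be as simple as:

```python
from datetime import time

QUIET_START, QUIET_END = time(23, 0), time(7, 0)   # invented 11pm-7am window

def in_quiet_hours(now: time) -> bool:
    """Handle windows that wrap past midnight."""
    if QUIET_START <= QUIET_END:
        return QUIET_START <= now < QUIET_END
    return now >= QUIET_START or now < QUIET_END

def should_respond(trigger_heard: bool, now: time) -> bool:
    """Ignore the trigger word entirely during scheduled quiet time."""
    return trigger_heard and not in_quiet_hours(now)

print(should_respond(True, time(3, 30)))  # -> False: 3:30am is quiet time
print(should_respond(True, time(9, 0)))   # -> True
```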
Some services have recently adopted individual
voice recognition, where the assistant will only respond to specific people.
However, these systems train the assistant on only a handful of phrases (likely to maximize convenience for the user). This could lead to situations where the assistant accepts commands from people who simply have a similar accent or tone to an
authorized user. In contrast, a larger number of training phrases should help
the assistant home in on what really makes a person’s voice unique, thereby
reducing the range of accepted inputs and in turn, the number of false
positives.
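The intuition can be sketched with toy numbers: model each enrollment phrase as a feature vector, average them into a voice profile, and accept a command only if it falls close enough to that profile. More phrases place the profile more accurately and justify a tighter acceptance radius. The vectors and threshold below are toy values, not a real acoustic model:

```python
def centroid(samples: list) -> list:
    """Average the enrolled feature vectors into a single voice profile."""
    return [sum(dim) / len(samples) for dim in zip(*samples)]

def distance(a: list, b: list) -> float:
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def accepts(enrolled: list, candidate: list, threshold: float) -> bool:
    """Accept a command only if the voice is close to the owner's profile."""
    return distance(centroid(enrolled), candidate) <= threshold

# Three enrollment phrases cluster tightly around the owner's true voice.
owner_samples = [[1.0, 0.9], [1.1, 1.0], [0.9, 1.1]]
similar_stranger = [1.6, 0.2]   # similar tone, but outside the tight radius
print(accepts(owner_samples, similar_stranger, threshold=0.5))  # -> False
```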
Finally, assistants in households with an
abnormally high number of false positives could offer a feature that records
background noise at several different times of the day. This could then be used
to adapt the speech-recognition model for particular homes so that it’s less
likely to be triggered by environmental factors such as the traffic outside,
planes overhead, or inaudible frequencies.
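One way to use those samples, sketched here with a made-up scaling rule, is to raise the confidence bar for the trigger word in proportion to the home’s measured noise floor:

```python
import statistics

BASE_THRESHOLD = 0.90   # assumed default wake-word confidence requirement

def adapted_threshold(ambient_samples_db: list[float]) -> float:
    """Require higher confidence in homes with louder background noise."""
    noise_floor = statistics.fmean(ambient_samples_db)
    penalty = max(0.0, (noise_floor - 40) / 100)   # quiet homes (~40 dB) unchanged
    return min(BASE_THRESHOLD + penalty, 0.99)

# Samples taken near a busy road at morning, noon, and night:
print(adapted_threshold([62.0, 58.0, 55.0]))  # -> 0.99 (capped)
```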
Current data-collection practices cannot last
The only real issue people have with smart
home assistants is that they feel they’re being watched. The problem is that
they’re correct. Mass data collection has been the norm for years, but the tide
is beginning to turn, with users adopting privacy-centric tools and services in
record numbers. Smart assistants have huge potential, but ultimately, if their
creators fail to adapt to this new, user-first attitude, they risk being left
behind.