The revolution of voice assistants

23/05/2018

Phoning to book a table or to make a doctor's appointment may soon be a thing of the past: thanks to technology developed by Google, a personal voice assistant will do it for you. Welcome to the present of artificial intelligence!

On Tuesday, May 8, the technology giant brought together developers from around the world for its annual conference, Google I/O 2018, which explores the future of technology. This year the focus was on artificial intelligence. Among the AI announcements, one of the most prominent was Google Duplex, an artificial intelligence able to hold real conversations with people. How does it work and how could it revolutionize the industry?

From the Shoreline Amphitheatre in Mountain View (California), CEO Sundar Pichai made his bet on AI and research clear, unveiling "Google Duplex". This new technology will extend what Google's assistant can do, and the assistant will sound more natural thanks to six new voices, including that of John Legend. Isn't it great?

A personal secretary in the palm of your hand

Among the possibilities offered by this AI, the most notable is asking the voice assistant to book a table at a restaurant or make an appointment at the hairdresser. Google is clearly taking the lead in the voice and artificial intelligence services industry. This new technology is expected to be tested this summer on the company's own smart speakers.

How could it improve our quality of life? The idea behind Google Duplex is to build a realistic AI that speaks like us, reacts as we do and even makes us feel we are talking to a real person. In the video of the conference, the AI does not sound like a robot, or like the voices we are used to hearing from Siri, Alexa or Cortana. It seems that the future of voice assistants has arrived!

According to Nick Fox (VP of design for Google Assistant): "We don't want to force others to implement these changes, but that is how an assistant should sound."

How will we tell an AI from a human in a conversation if they sound the same?

This question may have crossed your mind, and it raises some ethical issues. The developers and designers who build these AIs "have an obligation to reveal to anyone who interacts with them that they are talking to a machine," said Paul Saffo (of Stanford University). On social networks, many users have expressed concern about the use of these robots: "These machines could be used for political purposes and to give voting instructions," says Kay Firth-Butterfield on Twitter.

During Google I/O there was a demo of a conversation between the AI and an employee at a hair salon. The Google assistant sounded amazingly realistic and even murmured "mm-hmm" while the employee was checking her schedule. The voice was so natural that the employee did not even realize she was talking to a machine. According to Google, this system can be very useful for customers. Why? Undoubtedly one of the main advantages is the time it saves, both for users and for small businesses that do not have an online reservation system. The goal is to help you manage and carry out tasks.

What is inside Duplex technology?

At the heart of Google Duplex is a recurrent neural network that has been trained on anonymized phone conversation data. During a call it has to handle several tasks at once: managing pauses and interruptions, giving detailed information and staying in sync with the other speaker. The AI also adapts its answers depending on how important it judges each piece of information to be. The result? Awesome.

Despite the complexity of understanding human language and drawing conclusions from it, Google Duplex is able to understand the nuances of a conversation. It combines natural language understanding, deep learning and text-to-speech:

Natural Language Understanding (NLU) is also offered by IBM as a service for advanced text analysis. It extracts a lot of data from content (keywords, concepts, relationships, etc.) and detects sentiment and emotion. It is possible to find out whether the sentiment of an article is positive or negative and to obtain information about the emotion with which the author is writing. In addition, it can determine in which part of the article the writer is expressing anger, sadness, fear or joy.
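To make the idea of sentiment and emotion detection more concrete, here is a minimal sketch in Python. It is not IBM's service or anything Google has published about Duplex: the word lists (`SENTIMENT`, `EMOTION`) and the `analyze` function are hypothetical, and a real NLU system learns these associations from data instead of looking words up in a hand-written table.

```python
import re

# Hypothetical toy lexicons, invented for this illustration only.
SENTIMENT = {"great": 1, "natural": 1, "useful": 1,
             "awful": -1, "worried": -1, "robotic": -1}
EMOTION = {"furious": "anger", "worried": "fear", "afraid": "fear",
           "sad": "sadness", "delighted": "joy", "great": "joy"}


def analyze(text):
    """Return an overall sentiment label and the emotions found per sentence."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    score = 0
    per_sentence = []
    for sentence in sentences:
        words = re.findall(r"[a-z']+", sentence.lower())
        # Sum the polarity of every known word in the sentence.
        score += sum(SENTIMENT.get(w, 0) for w in words)
        # Collect the distinct emotions signalled by known words.
        emotions = sorted({EMOTION[w] for w in words if w in EMOTION})
        per_sentence.append((sentence, emotions))
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return label, per_sentence


label, details = analyze("The demo was great and useful. Some users are worried.")
print(label)
for sentence, emotions in details:
    print(sentence, emotions)
```

Even this toy version shows the two outputs described above: one overall sentiment for the whole text, and an emotion attached to each part of it.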

Deep learning is a family of machine learning methods based on learning representations of data. Nowadays, it helps computers reach superhuman performance in tasks such as image recognition. This type of learning also allows scientists to use their resources more effectively, in some cases analyzing in a month what used to take 10 years. The devices we use every day can convert even complex languages from voice to text and describe images in words. In 2015, Google's DeepMind created AlphaGo, a program trained in part by playing against itself that went on to beat professional human players of the board game Go.

Text-to-Speech (TTS) technology is a voice engine that reads text aloud on any device. For example, if you travel somewhere using Google Maps, TTS talks to you to tell you where to go. It works on all types of digital devices (computers, tablets, smartphones). The voice is computer-generated and, combined with scanning and optical character recognition (OCR), it can read out not only plain text but also the text in images, in real time. This technology is also used to help children develop their reading skills.

It seems that the future is here!
