The biggest reason for Google's acquisition of DeepMind

In 2014, Google spent more than $500 million to acquire DeepMind, a small company based in London. Shortly before, in December 2013, DeepMind had presented a paper at the NIPS conference on playing video games with deep reinforcement learning. The follow-up work, "Human-level control through deep reinforcement learning", was published in Nature in February 2015 and made the cover. Later, the same combination of deep learning and reinforcement learning was applied to Go, which gave us AlphaGo.

Looking back, Deep Q-Learning, the work that launched DeepMind, looks like a very simple piece of software: an automated player designed specifically for Atari video games. Yet it was seen as a first step toward general-purpose intelligence: the Nature paper shows that the same algorithm can be applied to 49 different Atari games and reaches or exceeds human-level performance on many of them. This is the deep Q-learner.

Take Super Mario as an example. The input data is the video of the game, and the labels are the directions in which Mario moves. The training data is continuous: the game world keeps producing new video frames, and we want to know how to act in that world.

The best approach seems to be trial and error: keep trying and making mistakes until we understand the best way to interact with the game world.

Reinforcement learning is designed for exactly this kind of problem. Whenever Mario does something that helps win the game, a positive label appears, but only after a delay. Rather than calling these signals labels, the more accurate name is "rewards".

We represent the entire game as a sequence of states, actions, and rewards. The probability of each state depends only on the previous state and the action performed in it. This is called the Markov property, named after the Russian mathematician Andrey Markov, and a decision process with this property is called a Markov decision process.
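To make the idea of a sequence of states, actions, and rewards concrete, here is a minimal Python sketch. It is not Mario and not DeepMind's code: the five-position corridor, the names `step` and `play_episode`, and the reward of 1.0 at the goal are all assumptions made up for illustration. The point to notice is that `step` looks only at the current state and action, which is the Markov property in miniature.

```python
# Hypothetical toy world (an assumption, not from the article):
# positions 0..4 on a line, and reaching position 4 wins the game.
ACTIONS = ("left", "right")
GOAL = 4


def step(state, action):
    """Return (next_state, reward, done).

    The result depends only on the current state and action,
    never on how we got here: this is the Markov property.
    """
    move = 1 if action == "right" else -1
    next_state = min(max(state + move, 0), GOAL)  # walls at both ends
    done = next_state == GOAL
    reward = 1.0 if done else 0.0  # the reward arrives only at the end
    return next_state, reward, done


def play_episode(policy, start=0, max_steps=50):
    """Play one game and record it as (state, action, reward) triples."""
    trajectory = []
    state = start
    for _ in range(max_steps):
        action = policy(state)
        next_state, reward, done = step(state, action)
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory
```

Calling `play_episode(lambda s: "right")` yields four triples, and only the last one carries a nonzero reward, which mirrors the delayed rewards described earlier.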

If the series of rewards that follows a given point is summarized as a function, the value of that function represents the best possible score at the end of the game after performing a given action in a given state. In other words, it measures the quality of an action in that state. This is the Q function. The Q stands for quality (not Quidditch).
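One common way to collapse "the series of rewards after a point" into a single number is the discounted return, in which rewards that arrive later count for a little less. The sketch below assumes a discount factor called `gamma`; neither that name nor any particular value comes from the text above, and the function name `discounted_return` is likewise just illustrative.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum the rewards r0 + gamma*r1 + gamma^2*r2 + ...

    `rewards` lists the rewards received from some point onward.
    `gamma` (an assumed discount factor between 0 and 1) controls
    how much less a delayed reward is worth.
    """
    total = 0.0
    # Work backwards so each step is: reward now + gamma * (what follows).
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total
```

Under this view, the Q value of an action in a state is the best return that can still be reached after taking that action.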

When Mario decides which of the possible actions to perform, he selects the one with the highest Q value. Calculating the Q values is the learning process.
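The loop of "pick the action with the highest Q value, then improve the Q values from what happened" can be sketched with plain tabular Q-learning. This is a deliberately small stand-in, not the deep Q-learner itself, which replaces the table with a neural network that reads video frames. The corridor world, the function names, and the settings `alpha`, `gamma`, and `epsilon` are all assumptions chosen for illustration.

```python
import random

# Hypothetical toy world (an assumption): positions 0..4, goal at 4.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
GOAL = N_STATES - 1


def step(state, action):
    """Return (next_state, reward, done) for one move."""
    move = 1 if action == 1 else -1
    next_state = min(max(state + move, 0), GOAL)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def best_action(q, state, rng):
    """Pick the action with the highest Q value, breaking ties at random."""
    top = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == top])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a Q table by trial and error and return it."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            # Sometimes try a random action, otherwise act greedily.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = best_action(q, state, rng)
            next_state, reward, done = step(state, action)
            # Move Q(s, a) toward: reward + gamma * best Q in the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q
```

After `q = train()`, asking `best_action(q, s, random.Random(1))` for each non-goal position should return 1 (move right), since that is the only way to reach the reward in this toy world.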
