Secret R&D, a "threat to humanity", a code name that sent the whole internet into a panic: what is OpenAI's Q*?

Babbitt

Article source: GenAI New World

Author: Miao Zheng

Image source: Generated by Unbounded AI

Let’s set aside the brawl inside OpenAI’s management and talk about the company’s latest rumor: Q*. On Nov. 22, OpenAI sent an internal letter to employees acknowledging Q* and describing the project as “an autonomous system that surpasses humans.” That does sound scary.

Although OpenAI has not officially released any news about Q*, we can still form at least a superficial understanding of it.

First of all, understand the name: Q* is officially read “Q-Star”. Yes, you read that right: even though deep learning is full of multiplication, the “*” in Q* is not a multiplication sign but an asterisk, pronounced “star”. **The capital “Q” denotes the expected reward for an action in reinforcement learning.**

In the field of artificial intelligence, almost anything involving a capital Q traces back to Q-learning. Q-learning can be regarded as reinforcement learning based on recorded experience: during training it keeps a record of the reward values observed so far, and tells the agent to choose as its next step the action whose recorded reward is highest. Note, however, that the highest recorded reward does not necessarily equal the true maximum reward available to the model; it may or may not, and the agent may even fail to reach it. In other words, Q-learning and the agent are like a team’s analyst and its coach: the coach runs the team, and the analyst assists the coach.

In reinforcement learning, the agent’s decisions are fed back to the environment in exchange for reward values. Q-learning, for its part, only records those reward values, so it does not need to model the environment at all; as long as the result is good, all is good.
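The whole of tabular Q-learning fits in a few lines: one table of recorded values and one update that nudges each (state, action) entry toward the reward just received plus the best recorded value of the next state. Here is a minimal sketch on a toy five-cell corridor (the environment, rewards, and hyperparameters are all illustrative, not anything from OpenAI):

```python
import random
from collections import defaultdict

# Toy corridor: cells 0..4, start at 0, reward 1.0 for reaching cell 4.
ALPHA, GAMMA = 0.5, 0.9   # learning rate and discount factor
ACTIONS = (-1, +1)        # step left or step right
GOAL = 4

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # Q[(state, action)] = recorded reward value
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            a = rng.choice(ACTIONS)              # explore at random (off-policy)
            s_next = min(max(s + a, 0), GOAL)
            reward = 1.0 if s_next == GOAL else 0.0
            best_next = 0.0 if s_next == GOAL else max(Q[(s_next, b)] for b in ACTIONS)
            # The core update: nudge Q toward reward + discounted best future value.
            Q[(s, a)] += ALPHA * (reward + GAMMA * best_next - Q[(s, a)])
            s = s_next
    return Q

Q = train()
# The learned greedy policy: in every cell, stepping right now looks best.
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)]
print(policy)  # [1, 1, 1, 1]
```

Note that the behavior here is purely random; Q-learning still recovers the best policy from the recorded rewards alone, which is exactly the "good results, all is good" property described above.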

Looked at this way, though, Q-learning seems inferior to the deep learning models common in AI today, especially large models. At the current scale of billions or tens of billions of parameters, plain Q-learning would not help such a model; it would only add complexity and thereby reduce robustness.

Don’t worry: the idea described above is only the basic concept of Q-learning, which was born back in 1989.

In 2013, DeepMind improved on Q-learning with an algorithm called deep Q-learning. Its most distinctive feature is experience replay: sampling from many past results and applying Q-learning to those samples, which improves the model’s stability and keeps any single result from dragging the training off course.
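The experience replay trick itself is tiny: keep a buffer of past transitions and train on random samples from it instead of only the latest step. A minimal sketch (the class, capacity, and data are illustrative, not DeepMind's implementation):

```python
import random
from collections import deque

class ReplayBuffer:
    """Store past (state, action, reward, next_state) transitions."""
    def __init__(self, capacity=10_000, seed=0):
        self.buffer = deque(maxlen=capacity)  # oldest transitions fall off
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive steps,
        # which is what stabilizes training.
        return self.rng.sample(list(self.buffer), batch_size)

buf = ReplayBuffer(capacity=100)
for t in range(500):                  # record 500 fake transitions
    buf.push(t, t % 2, 0.0, t + 1)
batch = buf.sample(4)                 # train on a random mini-batch of 4
print(len(buf.buffer), len(batch))    # 100 4
```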

To be honest, though, there is a reason this concept never became popular on its own. From a practical standpoint, deep Q-learning’s biggest contribution to the field has been giving rise to DQN.

DQN stands for Deep Q-Network, which grew out of deep Q-learning. The idea of DQN is exactly the same as that of Q-learning, except that the search for the maximum reward value is carried out by a neural network. All of a sudden, it became fashionable.

DQN generates one node at a time. It also maintains a priority queue, in which the remaining nodes and node–action pairs are stored. One node is obviously not enough; if the whole process produced only a single node, the final answer would be wildly wrong. So whenever a node and an action are popped from the queue, a new node is generated by applying that action to the nodes already produced, and so on.
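The substitution DQN makes, a trainable function in place of the lookup table, can be sketched with a tiny linear model standing in for the neural network (purely illustrative; a real DQN uses a deep network together with the experience replay described earlier):

```python
# Instead of reading Q[(s, a)] from a table, predict it from features of (s, a).
def q_value(w, features):
    """A linear stand-in for the neural network: Q = w · features."""
    return sum(wi * fi for wi, fi in zip(w, features))

def dqn_style_update(w, features, reward, best_next, alpha=0.1, gamma=0.9):
    """One gradient step pushing the prediction toward reward + gamma * best_next."""
    error = (reward + gamma * best_next) - q_value(w, features)
    return [wi + alpha * error * fi for wi, fi in zip(w, features)]

# Repeatedly fit one hypothetical transition: reward 1.0, no future value.
w = [0.0, 0.0]
for _ in range(100):
    w = dqn_style_update(w, [1.0, 0.5], reward=1.0, best_next=0.0)
print(round(q_value(w, [1.0, 0.5]), 2))  # 1.0
```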

Anyone who knows a little of the history of artificial intelligence will find this more and more familiar the longer they look: isn’t this a high-end version of Floyd’s shortest-path algorithm?

In modern computing, the classic embodiment of this principle is Floyd’s algorithm, which finds the shortest path between two points by repeatedly comparing candidate paths against the historical optimum. Memory stores the pending computations in priority order, and each time the processor finishes one, memory hands it the next.
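Floyd's algorithm (Floyd–Warshall) really does work by comparing each candidate path against the best found so far. A compact sketch on a hypothetical three-node graph:

```python
INF = float("inf")

def floyd_warshall(dist):
    """dist[i][j] = direct edge weight (INF if absent); returns all shortest paths."""
    n = len(dist)
    for k in range(n):          # allow node k as an intermediate stop
        for i in range(n):
            for j in range(n):
                # Keep whichever is shorter: the best i->j so far, or i->k->j.
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

# Edges: 0->1 costs 5, 1->2 costs 3, 0->2 costs 10 directly.
d = floyd_warshall([[0, 5, 10],
                    [INF, 0, 3],
                    [INF, INF, 0]])
print(d[0][2])  # 8: the detour through node 1 beats the direct edge
```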

DQN is essentially the same.

That’s roughly what the Q stands for. So what does the * mean?

**Judging from the analysis of many industry insiders, the * most likely refers to the A* algorithm.**

This is a heuristic algorithm. Before getting into what heuristics are, let me tell you a joke:

A asks B, “Quick, what is 1928749189571 × 1982379176?” B immediately answers, “32.” Hearing this, A wonders: multiplying two numbers that large can’t possibly give a two-digit answer. B replies, “Didn’t you say you wanted it quick?”

It sounds outrageous, but heuristics work the same way.

Its essence is estimation, and you can only pick one of efficiency and exactness: either it is very fast but sometimes wrong, or it is exact but sometimes very slow. The A* algorithm combines the two. It scores each candidate step by the cost already paid plus a heuristic estimate of the cost remaining, always expanding the most promising candidate first; whenever an estimate turns out to be off, the search corrects it with what has actually been explored, repeating until the best solution emerges.
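A toy version of A* on a hypothetical grid makes the mechanism concrete: always expand the candidate with the lowest f = g + h, where g is the cost already paid and h a heuristic estimate of the cost remaining (here, Manhattan distance, which never overestimates on a grid, so the first time the goal is popped the path is optimal):

```python
import heapq

def a_star(grid, start, goal):
    """Return the length of the shortest path on a 0/1 grid (1 = wall), or None."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # heuristic estimate
    frontier = [(h(start), 0, start)]   # priority queue of (f, g, position)
    best_g = {start: 0}                 # cheapest known cost to each cell
    while frontier:
        f, g, pos = heapq.heappop(frontier)
        if pos == goal:
            return g
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (pos[0] + dr, pos[1] + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and grid[nxt[0]][nxt[1]] == 0
                    and g + 1 < best_g.get(nxt, float("inf"))):
                best_g[nxt] = g + 1
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt))
    return None  # goal unreachable

grid = [[0, 0, 0],
        [1, 1, 0],   # a wall forces a detour
        [0, 0, 0]]
print(a_star(grid, (0, 0), (2, 0)))  # 6
```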

Although it can reach the best solution, A* belongs to the second category above: the answer is exact, but it can take a long time. That is fine in a lab environment, but placed on a personal device such an algorithm may overflow memory and cause system problems, such as blue screens.

This limitation is why the A* algorithm was historically applied to relatively simple models, the most typical example being character pathfinding in online games. In many large games, the moment a character starts finding its own way, the A* algorithm is at work.

On the whole, the current consensus in AI circles is that **the Q* mentioned in OpenAI’s internal letter is probably a combination of Q-learning and the A* algorithm**: saving compute, saving memory, and still reaching the best solution. After all, you can’t keep spending more compute and wasting more memory only to miss the best solution in the end!

And just as the foundation model behind OpenAI’s success had itself existed for a long time, even ignored for a while, until OpenAI rediscovered its potential through concrete and innovative methods, people today naturally have reason to believe that with the two long-standing algorithmic ideas of Q-learning and A*, OpenAI can repeat the old trick and create another miracle. Of course, given the recent OpenAI drama, more people are also worried about the harm such a miracle might bring to humanity.

So, returning to the algorithm itself: Q* most likely uses Q-learning to quickly estimate a near-optimal solution, then applies the A* algorithm within that small region, eliminating a great deal of meaningless computation and thus finding the best solution quickly. What OpenAI actually intends to do, however, will have to wait for a public paper (if one ever comes).
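As the article stresses, this is pure speculation, but the speculated division of labor can at least be sketched: let a learned value estimate (the kind of thing Q-learning produces) serve as the heuristic that steers an A*-style search. The graph and the value table V below are hypothetical toys, not OpenAI's method:

```python
import heapq

def guided_search(start, goal, neighbors, V):
    """A*-style search whose heuristic h(n) comes from a learned value estimate V."""
    frontier = [(V.get(start, 0.0), 0, start)]   # (estimated total, cost so far, node)
    best_g = {start: 0}
    while frontier:
        _, g, node = heapq.heappop(frontier)
        if node == goal:
            return g
        for nxt, cost in neighbors(node):
            if g + cost < best_g.get(nxt, float("inf")):
                best_g[nxt] = g + cost
                heapq.heappush(frontier, (g + cost + V.get(nxt, 0.0), g + cost, nxt))
    return None

# Toy line graph 0-1-2-3; V plays the role of a learned estimate of steps left to 3.
neighbors = lambda n: [(m, 1) for m in (n - 1, n + 1) if 0 <= m <= 3]
V = {0: 3.0, 1: 2.0, 2: 1.0, 3: 0.0}
print(guided_search(0, 3, neighbors, V))  # 3
```

The better the learned estimate, the less of the graph the search has to touch, which is the compute-and-memory saving the article describes.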

The emergence of Q* actually exposes a problem: the leading AI companies have realized that, at the current stage of AI development, **the process of solving is more meaningful than the solution itself**. Merely pursuing correct answers can no longer satisfy what people want from artificial intelligence. On OpenCompass, for example, even when average scores differ by 10 or 20 points, if you look only at comprehension accuracy there is no large gap between the best model and the worst.

Amid the speculation and panic, one claim about Q* is that it can solve very advanced math problems. Andrew Rogoyski, a director at the Institute for People-Centred AI at the University of Surrey, said: “We know that existing AI has been shown to be capable of doing math at undergraduate level, but not of handling more advanced math problems. Q* is most likely intended to solve difficult math problems.” Perhaps when Q* comes out, you can test it on the Goldbach conjecture. Mathematics is considered one of the greatest crystallizations of human wisdom, which is exactly why a mere code name like Q* has caused panic across the entire internet.

Behind Q* is also OpenAI’s mission: the exploration of artificial general intelligence (AGI), and even superintelligence. OpenAI defines AGI as an autonomous system that surpasses humans at most economically valuable work, and Q* would be a step on OpenAI’s path toward AGI.

For now, OpenAI has not commented on Q* or the leaked internal letter, and my feelings are mixed. I would be happy if Q* really is that capable and AI development takes another step forward. At the same time, I worry that the Q* hype is bigger than the reality, and that on release day the test results will turn out to be merely so-so, leaving those of us who talked it up red-faced.
