Alibaba files a patent related to large language model training

The Qichacha app shows that Alibaba (China) Co., Ltd. recently applied for a patent titled "Method, Device, and Equipment Based on Chain-of-Thought Training for Large Language Models."

According to the patent abstract, the method first obtains multiple initial sampling data, each comprising an image, auxiliary text information for the image, and a standard review result for the image. Chain-of-thought data are generated from each initial sample to form a chain-of-thought data set, and a foundation large language model is fully fine-tuned on this set to produce an intermediate large language model. The intermediate model and the initial sampling data are then used to iteratively generate multiple intermediate chain-of-thought data, and a pre-set reward function assigns a reward value to each. Finally, the intermediate model is reinforcement-trained with the Group Relative Policy Optimization (GRPO) algorithm to obtain the target large language model. The method is said to improve the interpretability and review accuracy of large language models.
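The patent itself does not publish code, but the GRPO step it names has a well-known core idea: several chain-of-thought outputs are sampled per input, each is scored by the reward function, and each reward is normalized against its own group's statistics instead of a learned value baseline. A minimal sketch of that group-relative normalization follows; the function and variable names are illustrative, not from the patent.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards):
    """Normalize each reward against its group's mean and std.

    In GRPO, outputs scored above the group average receive a
    positive advantage and below-average outputs a negative one,
    replacing the per-state value baseline used in PPO.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:  # all outputs scored identically; no learning signal
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Hypothetical example: four chain-of-thought review outputs for one
# image, scored 1.0 when the review matches the standard result.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# → [1.0, -1.0, -1.0, 1.0]
```

These advantages would then weight the policy-gradient update of the intermediate model; the patent's specific reward function for image review is not disclosed in the abstract.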
