English News | Even the Errors Match! Stanford Student Team Apologizes for Plagiarizing Chinese Large Model

China Daily Podcast - A podcast by China Daily

"'Fake it before you make it' is an ignoble product of Silicon Valley," said Christopher Manning, director of the Artificial Intelligence Laboratory at Stanford University, commenting on researchers at the university who plagiarized the work of institutions including China's Tsinghua University.

On May 29, a research team at Stanford University released a large model called Llama3-V, claiming it could match the performance of large models such as GPT-4V at a pre-training cost of only US$500. The news spread widely on social media and in the artificial intelligence academic community.

However, industry insiders soon suspected that the Stanford team had plagiarized the MiniCPM-Llama3-V 2.5 large model released by Tsinghua University and other Chinese institutions.

Both Llama3-V and the MiniCPM-Llama3-V 2.5 large model are based on the open-source Llama3 large model.
Still, the Tsinghua team conducted its own distinctive training, including using the "Tsinghua Bamboo Slips," a collection of Chinese texts written on strips of bamboo that date back to the Warring States Period (475-221 BC), to train the model to recognize ancient Chinese characters.

Tests show that the model released by the Stanford University team can also recognize the "Tsinghua Bamboo Slips."

"We are quite sure that the Stanford team has plagiarized our big model research results," Liu Zhiyuan, a tenured associate professor in the Department of Computer Science at Tsinghua University, told Xinhua.

"The data we scanned and annotated word by word from the 'Tsinghua Bamboo Slips' has never been made public, and Llama3-V has shown the same ability to identify the 'Tsinghua Bamboo Slips,' even the error examples are the same," said Liu, who is also a member of the Tsinghua big model team.

As doubts mounted, the Stanford team deleted the database and promotional articles it had posted online, Liu said, adding that "from the evidence and their reactions, the nature of plagiarism has been relatively confirmed."

Following Manning's criticism, two members of the Stanford team, Aksh Garg and Siddharth Sharma, formally apologized on social media.

"We've taken all references to Llama3-V down and we apologize once again for the inconvenience we may have caused," they said.

Amid the current AI boom, this incident has aroused widespread attention.
It shows that although the United States leads in AI technologies overall, it is far from omnipotent.

Silicon Valley, where Stanford University is located, is considered the center of innovation in the United States. While it has nurtured many advanced technologies, it has also cultivated a negative culture, including the "fake it till you make it" ethos.

For example, Elizabeth Holmes, who dropped out of Stanford University to start a business, boasted that she had a disruptive technology that could test for diseases such as cancer using blood drawn from a finger prick. She was regarded as a female Steve Jobs but was later found to have fooled everyone and was sentenced to imprisonment for fraud.

In another case, when Google's artificial intelligence model Gemini Pro was asked in Chinese who it was, it would answer that it was "Ernie Bot," a Chinese big model developed by Baidu. Industry insiders believe the reason may be that Google "referenced" data related to "Ernie Bot" when training its large model.

"China's AI research has an increasing influence," Liu said, noting that the plagiarism incident reflects that "our innovative achievements are attracting international attention."

Overall, there is still a significant gap between China's research level and the world's top level, but in some specific segments such as AI innovation, China has rapidly grown into an important promoter, he added.

plagiarism: 剽窃
disruptive technology: 颠覆性技术