Let OpenAI GPT3 Write My Data Competition Code!

Contents

GPT3 Overview
OpenAI-GPT3 API Registration
OpenAI-GPT3 API Models
OpenAI-GPT3 Use Cases
- Text completion
  - Text classification
  - Text generation
- Code completion
OpenAI-GPT3 Impressions
References

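OpenAI is an artificial intelligence research company founded in the United States whose core mission is to achieve safe artificial general intelligence (AGI). Its ChatGPT is a state-of-the-art natural language processing model that can generate human-like text in real time.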

ChatGPT is fine-tuned from a model in the GPT-3.5 series, which finished training in early 2022. The GPT-3.5 series is a family of models trained on a blend of text and code from before Q4 2021.

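ChatGPT is not open source for now and is mainly suited to dialogue tasks. To help readers build up an understanding of what GPT can do, this article instead covers GPT3, which OpenAI has made publicly available. GPT3 can be used through a free API for NLP and code generation tasks.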

GPT3 Overview

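Generative Pre-trained Transformer 3 (GPT3) is an autoregressive language model that uses deep learning to generate natural language that humans can understand. Its neural network contains 175 billion parameters, which made it one of the largest neural network models at the time of its release.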
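"Autoregressive" means the model generates text one token at a time, and each prediction is conditioned on all the tokens before it. Equivalently, the probability of a token sequence factors into next-token probabilities. In standard notation (not taken from the original article), for tokens x_1, ..., x_T:

```latex
p(x_1, \ldots, x_T) = \prod_{t=1}^{T} p\left(x_t \mid x_1, \ldots, x_{t-1}\right)
```

Generation repeatedly picks or samples x_t from this conditional distribution and appends it to the context before predicting the next token.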

GPT3 has prior knowledge across a very wide range of domains. When users ask it questions in natural language, it can answer a large share of them.

Another advantage of GPT3 over earlier models such as BERT is that most common tasks require no fine-tuning before the model is used.

Instead of fine-tuning the model for a task in advance, the user only needs to tell GPT3 what task to perform. For example:

Translate this into French: Where can I find a bookstore?
Où puis-je trouver un magasin de livres?

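GPT3 led the way in a new paradigm for using pre-trained language models, Prompt-Tuning. Rather than introducing extra parameters, it wraps the input in a template, which lets the language model achieve good results in few-shot and zero-shot settings.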
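The sketch below makes the zero-shot/few-shot distinction concrete by building prompt strings only. The helper name build_prompt and the "English:/French:" layout are illustrative assumptions, not part of the openai library.

A zero-shot prompt contains just the instruction and the input. A few-shot prompt also includes solved examples. In neither case do any model parameters change.

```python
def build_prompt(instruction, query, examples=()):
    """Assemble a plain-text prompt (hypothetical helper, not an OpenAI API).

    With no examples this is a zero-shot prompt; each (source, target) pair
    in `examples` turns it into a few-shot prompt.
    """
    lines = [instruction, ""]
    for source, target in examples:
        lines.append(f"English: {source}")
        lines.append(f"French: {target}")
        lines.append("")
    lines.append(f"English: {query}")
    lines.append("French:")  # the model is expected to continue from here
    return "\n".join(lines)


zero_shot = build_prompt("Translate this into French.", "Where can I find a bookstore?")
few_shot = build_prompt(
    "Translate this into French.",
    "Where can I find a bookstore?",
    examples=[("Good morning.", "Bonjour.")],
)
print(zero_shot)
print("---")
print(few_shot)
```

Either string would be passed unchanged as the prompt. The only difference is how much task context the model sees.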

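OpenAI-GPT3 API Registration

API Overview

The OpenAI GPT3 API has been deployed in thousands of applications, with tasks ranging from helping people learn new languages to solving complex classification problems.

GitHub Copilot helps developers write code faster
Duolingo uses GPT-3 for grammar correction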

API Pricing

Learners can also register for a free GPT3 API account at: https://openai.com/api/

Users who have not linked a credit card receive roughly $18 of free API credit. Each model is billed at a different rate:

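Ada: $0.0004 / 1K tokens
Babbage: $0.0005 / 1K tokens
Curie: $0.0020 / 1K tokens
Davinci: $0.0200 / 1K tokens

Ada is the fastest model and Davinci the most capable. A token can be thought of as a piece of a word; roughly 1,000 tokens correspond to about 750 English words. Overall, the prices are quite low.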
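Here is a back-of-the-envelope sketch of what these prices mean. The function names are hypothetical. It assumes the per-1K-token prices listed above, with prompt and completion tokens counted together.

```python
# Per-1K-token prices listed above (USD); these were the prices at the time
# of writing and may have changed since.
PRICE_PER_1K = {
    "ada": 0.0004,
    "babbage": 0.0005,
    "curie": 0.0020,
    "davinci": 0.0200,
}


def estimate_cost(model, tokens):
    """Return the approximate USD cost of processing `tokens` tokens."""
    return tokens / 1000 * PRICE_PER_1K[model]


def tokens_for_budget(model, budget_usd):
    """Return roughly how many tokens a given budget buys for a model."""
    return int(round(budget_usd / PRICE_PER_1K[model] * 1000))


# How far the ~$18 free credit goes on each model.
for name in PRICE_PER_1K:
    print(name, tokens_for_budget(name, 18.0))
```

At these rates, the free credit covers about 900K tokens on Davinci but about 45M tokens on Ada. This is why the cheaper models are worth trying first for simple tasks.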

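API Usage

The GPT3 API can be called from several programming languages; the most common choice is Python. The example below uses the legacy openai.Completion interface, which was removed in version 1.0 of the Python library, so install a pre-1.0 version:

pip install "openai<1.0"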

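import os
import openai

# Fill in your API key here, or set the OPENAI_API_KEY environment variable
openai.api_key = os.getenv("OPENAI_API_KEY", "XXX")

response = openai.Completion.create(
    model="text-davinci-003",
    prompt="Say this is a test",
    temperature=0,
    max_tokens=7,
)
print(response["choices"][0]["text"])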

Client libraries for other languages: https://platform.openai.com/docs/libraries/community-libraries

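API Capabilities

OpenAI has trained a wide range of models and provides access to them through the API. They can be applied to almost any task that involves processing language, including: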

Content generation
Summarization
Classification
Data extraction
Translation

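OpenAI-GPT3 API Models

The OpenAI API is powered by a family of models with different capabilities and price points:

GPT-3: models that can understand and generate natural language
Codex: models that can understand and generate code, including translating natural language into code
Content filter: a model that detects whether text may be sensitive or unsafe

GPT-3 models understand and generate natural language. OpenAI offers four main GPT-3 models with different capability levels, each suited to different tasks: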

Latest model	Description	Max request
text-davinci-003	Most capable GPT-3 model. Can do any task the other models can do.	4,000 tokens
text-curie-001	Very capable, but faster and lower cost than Davinci.	2,048 tokens
text-babbage-001	Capable of straightforward tasks, very fast, and lower cost.	2,048 tokens
text-ada-001	Capable of very simple tasks, usually the fastest model in the GPT-3 series, and lowest cost.	2,048 tokens
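The "Max request" limit is a budget shared by the prompt and the completion (the max_tokens parameter). The sketch below checks a request against the limits in this table. The helper names are hypothetical, and it assumes the prompt's token count is already known, for example from a tokenizer.

```python
# Maximum request sizes from the table above (prompt + completion tokens).
MAX_REQUEST_TOKENS = {
    "text-davinci-003": 4000,
    "text-curie-001": 2048,
    "text-babbage-001": 2048,
    "text-ada-001": 2048,
}

# Ordered from cheapest to most expensive, following the price list.
MODELS_BY_PRICE = ["text-ada-001", "text-babbage-001", "text-curie-001", "text-davinci-003"]


def fits_in_request(model, prompt_tokens, max_tokens):
    """Check that the prompt plus the requested completion stays within the limit."""
    return prompt_tokens + max_tokens <= MAX_REQUEST_TOKENS[model]


def cheapest_fitting_model(prompt_tokens, max_tokens):
    """Return the cheapest model whose limit fits the request, or None.

    This only checks length; whether a smaller model is capable enough
    for the task is a separate question.
    """
    for model in MODELS_BY_PRICE:
        if fits_in_request(model, prompt_tokens, max_tokens):
            return model
    return None
```

For example, a 1,800-token prompt with max_tokens=512 exceeds the 2,048-token limit of the smaller models, so only text-davinci-003 fits.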

More details on the models: https://platform.openai.com/docs/models/overview

Model papers and implementation details: https://platform.openai.com/docs/model-index-for-researchers

OpenAI-GPT3 Use Cases

Text completion

https://platform.openai.com/docs/guides/completion/introduction

Text classification

Single-text classification prompt

Decide whether a Tweet's sentiment is positive, neutral, or negative.

Tweet: I loved the new Batman movie!

Sentiment:
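In code, the template can be filled in programmatically and the completion normalized into a label. The sketch below is minimal, and build_sentiment_prompt and parse_sentiment are hypothetical helpers. The prompt string would be passed as prompt to openai.Completion.create; a small max_tokens and temperature=0 are assumed settings for stable labels.

```python
def build_sentiment_prompt(tweet):
    """Fill the single-tweet classification template shown above."""
    return (
        "Decide whether a Tweet's sentiment is positive, neutral, or negative.\n\n"
        f"Tweet: {tweet}\n\n"
        "Sentiment:"
    )


def parse_sentiment(completion):
    """Map the model's raw completion text to one of the three labels.

    Completions often carry leading whitespace, different casing, or a
    trailing period, so normalize first; anything unexpected returns None.
    """
    label = completion.strip().lower().rstrip(".")
    return label if label in {"positive", "neutral", "negative"} else None


prompt = build_sentiment_prompt("I loved the new Batman movie!")
print(prompt)
```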

Multi-text classification prompt

Classify the sentiment in these tweets:

1. "I can't stand homework"
2. "This sucks. I'm bored 😠"
3. "I can't wait for Halloween!!!"
4. "My cat is adorable ❤️❤️"
5. "I hate chocolate"

Tweet sentiment ratings:
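Batching several tweets into one prompt saves requests, but the answers then have to be split back out. The sketch below builds the numbered prompt and parses the ratings, with hypothetical helper names. It assumes the model answers with lines such as "1: Negative"; this format is not guaranteed, so unparsed entries are left as None.

```python
import re


def build_batch_prompt(tweets):
    """Number the tweets in the format of the multi-tweet template above."""
    numbered = "\n".join(f'{i}. "{t}"' for i, t in enumerate(tweets, start=1))
    return f"Classify the sentiment in these tweets:\n\n{numbered}\n\nTweet sentiment ratings:"


def parse_batch_ratings(completion, n):
    """Parse lines like '1: Negative' or '2. Positive' into a list of n labels."""
    labels = [None] * n
    for line in completion.splitlines():
        match = re.match(r"\s*(\d+)\s*[:.]\s*(\w+)", line)
        if match and 1 <= int(match.group(1)) <= n:
            labels[int(match.group(1)) - 1] = match.group(2).lower()
    return labels


prompt = build_batch_prompt(["I hate chocolate", "My cat is adorable"])
print(prompt)
```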

Text generation

Text continuation prompt

Brainstorm some ideas combining VR and fitness:

Dialogue prompt

The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly.

Human: Hello, who are you?
AI: I am an AI created by OpenAI. How can I help you today?
Human:
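The Completions endpoint keeps no memory between calls, so a chat built on this template has to resend the whole transcript on every turn. The sketch below shows one way to do that; the helper name is hypothetical. Passing stop sequences to openai.Completion.create is an assumed setting, not something the original example shows.

```python
PREAMBLE = (
    "The following is a conversation with an AI assistant. The assistant is "
    "helpful, creative, clever, and very friendly.\n\n"
)


def build_chat_prompt(history, user_message):
    """Render past (speaker, text) turns plus the new message as one prompt.

    The prompt ends with "AI:" so that the model writes the assistant's reply.
    """
    lines = [f"{speaker}: {text}" for speaker, text in history]
    lines.append(f"Human: {user_message}")
    lines.append("AI:")
    return PREAMBLE + "\n".join(lines)


history = [
    ("Human", "Hello, who are you?"),
    ("AI", "I am an AI created by OpenAI. How can I help you today?"),
]
prompt = build_chat_prompt(history, "Can you recommend a book?")
print(prompt)
# When calling the API, stop sequences such as ["Human:", "AI:"] would keep
# the model from writing the next Human turn itself (an assumed setting).
```

After each reply, appending ("Human", message) and ("AI", reply) to history carries the context forward. The cost is that the prompt grows every turn.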

Translation prompt

Translate this into French, Spanish and Japanese:

What rooms do you have available?

Code completion

https://platform.openai.com/docs/guides/code/best-practices

Writing Python code

"""
1. Create a list of first names
2. Create a list of last names
3. Combine them randomly into a list of 100 full names
"""

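The docstring is the whole prompt: Codex is expected to continue with code that carries out the three numbered steps. For reference, one plausible program that satisfies it is sketched below. It is hand-written rather than actual model output, and the names and seed are arbitrary.

```python
import random

# Hand-written sketch of what the prompt asks for (not real model output);
# the name lists are made up for illustration.
first_names = ["Alice", "Bob", "Carol", "David", "Eve", "Frank"]
last_names = ["Smith", "Johnson", "Lee", "Brown", "Garcia", "Wang"]

rng = random.Random(42)  # fixed seed so the result is reproducible
full_names = [
    f"{rng.choice(first_names)} {rng.choice(last_names)}" for _ in range(100)
]
print(full_names[:5])
```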
Writing MySQL code

"""
Table customers, columns = [CustomerId, FirstName, LastName, Company, Address, City, State, Country, PostalCode, Phone, Fax, Email, SupportRepId]
Create a MySQL query for all customers in Texas named Jane
"""
query =

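The trailing "query =" invites the model to complete a SQL string. The sketch below runs a query of that shape against a toy table using Python's built-in sqlite3 module rather than MySQL. The rows, the reduced column set, and the "TX" state code are assumptions.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL table above, with only a few of
# the listed columns and made-up rows. Whether State holds "TX" or "Texas"
# depends on the real data; "TX" is an assumption here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (CustomerId INTEGER PRIMARY KEY, "
    "FirstName TEXT, LastName TEXT, State TEXT)"
)
conn.executemany(
    "INSERT INTO customers (FirstName, LastName, State) VALUES (?, ?, ?)",
    [("Jane", "Doe", "TX"), ("Jane", "Roe", "CA"), ("John", "Smith", "TX")],
)

# The kind of query the prompt asks for; the same SQL is also valid in MySQL.
query = "SELECT * FROM customers WHERE State = 'TX' AND FirstName = 'Jane'"
rows = conn.execute(query).fetchall()
print(rows)
```

Running a generated query against a small fixture like this is a cheap way to catch wrong column names or value formats before touching a real database.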
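OpenAI-GPT3 Impressions

After getting a free API key, I tried a number of examples and used GPT3 to write some competition code. GPT3 has many capabilities and is genuinely powerful, but it also has some drawbacks and caveats:

GPT3 handles English better than Chinese: turning English descriptions into code works better than turning Chinese descriptions into code.
The quality of GPT3's output depends on:
- the prompt text
- the hints or examples given to the model
- the model version chosen
In my experience, GPT3 is weaker than ChatGPT at conversation, but it does better than ChatGPT on specific, well-defined tasks.

Below is an example I wrote: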

import openai

response = openai.Completion.create(
    model="text-davinci-003",
    prompt='''Convert this text to a python programmatic command

1. read train.csv and test.csv by pandas.
2. Is_Lead is the target variable, convert it to binary.
3. use decision tree from sklearn to train and predict on test.
''',
    max_tokens=512,
    frequency_penalty=0.0,
    presence_penalty=0.0,
)

print(response['choices'][0]['text'].strip())

Generated code:

import pandas as pd
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')
train['Is_Lead'] = train['Is_Lead'].apply(lambda x: 1 if x == 'Yes' else 0)
test['Is_Lead'] = test['Is_Lead'].apply(lambda x: 1 if x == 'Yes' else 0)

from sklearn.tree import DecisionTreeClassifier
DTmodel = DecisionTreeClassifier(random_state=2)
DTmodel.fit(train.drop('Is_Lead', axis=1), train.Is_Lead)
prediction = DTmodel.predict(test.drop('Is_Lead', axis=1))
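The generated code follows the three steps, but it makes two assumptions. It assumes test.csv also contains Is_Lead, whereas in a typical competition the test set has no target column, so test['Is_Lead'] would raise a KeyError. It also assumes every feature is numeric, but a scikit-learn decision tree cannot fit raw string columns.

A minimal adjusted sketch under those assumptions follows. Small made-up frames stand in for the CSV files, and the columns other than Is_Lead are hypothetical.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-ins for train.csv / test.csv (normally pd.read_csv).
# Note that the test frame has no Is_Lead column.
train = pd.DataFrame({
    "Age": [25, 40, 35, 50, 29, 61],
    "Channel": ["X1", "X2", "X1", "X3", "X2", "X3"],
    "Is_Lead": ["No", "Yes", "No", "Yes", "No", "Yes"],
})
test = pd.DataFrame({
    "Age": [33, 58],
    "Channel": ["X1", "X3"],
})

# Step 2 of the prompt: convert the target to 0/1 (only train has it).
y = (train["Is_Lead"] == "Yes").astype(int)
X = train.drop(columns="Is_Lead")

# One-hot encode string columns, then align test to the training columns
# so that categories missing from test become all-zero columns.
X = pd.get_dummies(X)
X_test = pd.get_dummies(test).reindex(columns=X.columns, fill_value=0)

# Step 3: train a decision tree and predict on test.
model = DecisionTreeClassifier(random_state=2)
model.fit(X, y)
prediction = model.predict(X_test)
print(prediction)
```

In practice this kind of review is still needed: the model gets the overall structure right, but data-specific details such as which columns exist and their types have to be checked against the actual files.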