[Andrew Ng - AIGC/ChatGPT Prompt Engineering Course] Chapter 4 - Summarizing

2023-05-01 10:11:10

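1 Introduction

There is far more text in the world today than anyone has time to read. Encouragingly, LLMs now perform strongly on text summarization, and a number of teams have already built this capability into their software applications.

This chapter shows how to implement text summarization programmatically by calling the API.

First, we import the OpenAI package, load the API key, and define the get_completion function.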

In [1]:

import openai
import os

OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
openai.api_key = OPENAI_API_KEY

def get_completion(prompt, model="gpt-3.5-turbo"):
    # Uses the ChatCompletion interface of openai package versions before 1.0
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0,  # lower values make the output less random
    )
    return response.choices[0].message["content"]
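
The function above relies on the `openai.ChatCompletion` interface, which belongs to releases of the `openai` package before 1.0. As a minimal sketch, assuming version 1.0 or later is installed, an equivalent helper might look like the following. The names `make_client` and `get_completion_v1`, and the explicit `client` parameter, are introduced here for illustration and are not part of the course code.

```python
def make_client():
    # Imported lazily so get_completion_v1 can also be exercised
    # with a stand-in client when the package is not installed.
    from openai import OpenAI

    # With no arguments, the client reads the key from OPENAI_API_KEY.
    return OpenAI()


def get_completion_v1(prompt, client, model="gpt-3.5-turbo"):
    """Send a single user message and return the reply text."""
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0,  # lower values make the output less random
    )
    return response.choices[0].message.content
```

With a real key in the environment, `get_completion_v1(prompt, make_client())` would stand in for `get_completion(prompt)` in the cells that follow.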

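2 Single-Text Summarization Prompt Experiments

Here we use a product review as the example. E-commerce platforms typically host huge numbers of product reviews, which capture what customers think. A tool that summarizes these long, numerous reviews would let us skim more of them quickly, understand customer preferences, and help the platform and its merchants provide better service.

Input text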

In [3]:

prod_review = """
Got this panda plush toy for my daughter's birthday, 
who loves it and takes it everywhere. It's soft and  
super cute, and its face has a friendly look. It's  
a bit small for what I paid though. I think there  
might be other options that are bigger for the  
same price. It arrived a day earlier than expected,  
so I got to play with it myself before I gave it  
to her.
"""

Input text (Chinese translation)

In [4]:

prod_review_zh = """
这个熊猫公仔是我给女儿的生日礼物,她很喜欢,去哪都带着。
公仔很软,超级可爱,面部表情也很和善。但是相比于价钱来说,
它有点小,我感觉在别的地方用同样的价钱能买到更大的。
快递比预期提前了一天到货,所以在送给女儿之前,我自己玩了会。
"""

2.1 Limiting the output length

We try limiting the summary to at most 30 words.

In [4]:

prompt = f"""
Your task is to generate a short summary of a product 
review from an ecommerce site. 

Summarize the review below, delimited by triple 
backticks, in at most 30 words. 

Review: ```{prod_review}```
"""

response = get_completion(prompt)
print(response)
Soft and cute panda plush toy loved by daughter, but a bit small for the price. Arrived early.
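
The word limit in the prompt is an instruction to the model rather than a guarantee; the last summary in section 3, for instance, runs well past its 20-word limit. A small check along the following lines could flag such cases. This is a sketch only: `count_words` and `within_limit` are hypothetical helper names, and whitespace splitting is an assumption that suits English text.

```python
def count_words(text):
    # Whitespace splitting is a rough word count for English text.
    # It does not segment Chinese, where a character count may be
    # a more useful proxy.
    return len(text.split())


def within_limit(summary, max_words):
    """Return True when the summary respects the requested word limit."""
    return count_words(summary) <= max_words


summary = (
    "Soft and cute panda plush toy loved by daughter, "
    "but a bit small for the price. Arrived early."
)
print(count_words(summary), within_limit(summary, 30))  # prints: 18 True
```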

Chinese version

In [5]:

prompt = f"""
你的任务是从电子商务网站上生成一个产品评论的简短摘要。

请对三个反引号之间的评论文本进行概括,最多30个词汇。

评论: ```{prod_review_zh}```
"""

response = get_completion(prompt)
print(response)
可爱软熊猫公仔,女儿喜欢,面部表情和善,但价钱有点小贵,快递提前一天到货。

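2.2 Focusing on a key aspect

Different parts of the business care about different aspects of a text. For a product review, for example, logistics cares most about delivery time, merchants care most about price and product quality, and the platform cares most about the overall service experience.

We can add instructions to the prompt to emphasize a particular aspect.

Focus on shipping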

In [6]:

prompt = f"""
Your task is to generate a short summary of a product 
review from an ecommerce site to give feedback to the 
Shipping department. 

Summarize the review below, delimited by triple 
backticks, in at most 30 words, and focusing on any aspects 
that mention shipping and delivery of the product. 

Review: ```{prod_review}```
"""

response = get_completion(prompt)
print(response)
The panda plush toy arrived a day earlier than expected, but the customer felt it was a bit small for the price paid.

Chinese version

In [8]:

prompt = f"""
你的任务是从电子商务网站上生成一个产品评论的简短摘要。

请对三个反引号之间的评论文本进行概括,最多30个词汇,并且聚焦在产品运输上。

评论: ```{prod_review_zh}```
"""

response = get_completion(prompt)
print(response)
快递提前到货,熊猫公仔软可爱,但有点小,价钱不太划算。

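As you can see, the output opens with "快递提前到货" (the delivery arrived early), which reflects the focus on delivery speed.

Focus on price and quality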

In [9]:

prompt = f"""
Your task is to generate a short summary of a product 
review from an ecommerce site to give feedback to the 
pricing department, responsible for determining the 
price of the product.  

Summarize the review below, delimited by triple 
backticks, in at most 30 words, and focusing on any aspects 
that are relevant to the price and perceived value. 

Review: ```{prod_review}```
"""

response = get_completion(prompt)
print(response)
print(response)
The panda plush toy is soft, cute, and loved by the recipient, but the price may be too high for its size compared to other options.

Chinese version

In [12]:

prompt = f"""
你的任务是从电子商务网站上生成一个产品评论的简短摘要。

请对三个反引号之间的评论文本进行概括,最多30个词汇,并且聚焦在产品价格和质量上。

评论: ```{prod_review_zh}```
"""

response = get_completion(prompt)
print(response)
可爱软熊猫公仔,面部表情友好,但价钱有点高,尺寸较小。快递提前一天到货。

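As you can see, the output leads with the product's quality, its high price, and its small size, which reflects the focus on price and quality.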
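
The English prompts in this section differ only in the audience they address and the aspect they emphasize, so they can be generated from one template. The following is a minimal sketch of that idea; `build_summary_prompt`, its parameters, and the `FENCE` constant are hypothetical names introduced here, and the wording mirrors the prompts above.

```python
FENCE = "`" * 3  # triple backticks used to delimit the review


def build_summary_prompt(review, max_words=30, audience=None, focus=None):
    """Assemble a summarization prompt with an optional audience and focus."""
    task = (
        "Your task is to generate a short summary of a product "
        "review from an ecommerce site"
    )
    if audience:
        task += f" to give feedback to the {audience}"
    instruction = (
        "Summarize the review below, delimited by triple "
        f"backticks, in at most {max_words} words"
    )
    if focus:
        instruction += f", and focusing on any aspects that {focus}"
    return f"{task}.\n\n{instruction}.\n\nReview: {FENCE}{review}{FENCE}\n"


shipping_prompt = build_summary_prompt(
    "It arrived a day earlier than expected.",
    audience="Shipping department",
    focus="mention shipping and delivery of the product",
)
print(shipping_prompt)
```

Passing `focus="are relevant to the price and perceived value"` would produce a price-oriented variant, and omitting both optional arguments gives the plain prompt from section 2.1.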

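2.3 Extracting key information

In section 2.2, adding a focus to the prompt made the summary lean toward a particular aspect, but the results still retain other information; the summary focused on price and quality, for example, still mentions that the delivery arrived early. Sometimes this extra information is useful, but if we want only the information about one aspect and nothing else, we can ask the LLM to extract (Extract) rather than summarize (Summarize).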

In [13]:

prompt = f"""
Your task is to extract relevant information from  
a product review from an ecommerce site to give 
feedback to the Shipping department. 

From the review below, delimited by triple backticks, 
extract the information relevant to shipping and  
delivery. Limit to 30 words. 

Review: ```{prod_review}```
"""

response = get_completion(prompt)
print(response)
"The product arrived a day earlier than expected."

Chinese version

In [19]:

prompt = f"""
你的任务是从电子商务网站上的产品评论中提取相关信息。

请从以下三个反引号之间的评论文本中提取产品运输相关的信息,最多30个词汇。

评论: ```{prod_review_zh}```
"""

response = get_completion(prompt)
print(response)
快递比预期提前了一天到货。
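
The English extraction output in this section comes back wrapped in double quotes, while the Chinese one does not. If the extracted text is to be stored or compared, it may help to normalize it first. The helper below is a sketch under that assumption; `strip_wrapping_quotes` is a hypothetical name, not part of the course code.

```python
def strip_wrapping_quotes(text):
    """Remove one pair of matching quotes that wraps the whole text."""
    text = text.strip()
    pairs = [('"', '"'), ("'", "'"), ("“", "”")]
    for left, right in pairs:
        if len(text) >= 2 and text.startswith(left) and text.endswith(right):
            return text[1:-1].strip()
    # Text without wrapping quotes is returned unchanged.
    return text


print(strip_wrapping_quotes('"The product arrived a day earlier than expected."'))
```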

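3 Multi-Text Summarization Prompt Experiments

In a real workflow we usually have many reviews. The example below calls the summarization prompt in a for loop and prints each result in turn. In production, with millions or tens of millions of reviews, a sequential for loop is not practical; approaches such as consolidating reviews or distributing the work may be needed to improve throughput.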

In [5]:

review_1 = prod_review 

# review for a standing lamp
review_2 = """
Needed a nice lamp for my bedroom, and this one 
had additional storage and not too high of a price 
point. Got it fast - arrived in 2 days. The string 
to the lamp broke during the transit and the company 
happily sent over a new one. Came within a few days 
as well. It was easy to put together. Then I had a 
missing part, so I contacted their support and they 
very quickly got me the missing piece! Seems to me 
to be a great company that cares about their customers 
and products. 
"""

# review for an electric toothbrush
review_3 = """
My dental hygienist recommended an electric toothbrush, 
which is why I got this. The battery life seems to be 
pretty impressive so far. After initial charging and 
leaving the charger plugged in for the first week to 
condition the battery, I've unplugged the charger and 
been using it for twice daily brushing for the last 
3 weeks all on the same charge. But the toothbrush head 
is too small. I’ve seen baby toothbrushes bigger than 
this one. I wish the head was bigger with different 
length bristles to get between teeth better because 
this one doesn’t.  Overall if you can get this one 
around the $50 mark, it's a good deal. The manufactuer's 
replacements heads are pretty expensive, but you can 
get generic ones that're more reasonably priced. This 
toothbrush makes me feel like I've been to the dentist 
every day. My teeth feel sparkly clean! 
"""

# review for a blender
review_4 = """
So, they still had the 17 piece system on seasonal 
sale for around $49 in the month of November, about 
half off, but for some reason (call it price gouging) 
around the second week of December the prices all went 
up to about anywhere from between $70-$89 for the same 
system. And the 11 piece system went up around $10 or 
so in price also from the earlier sale price of $29. 
So it looks okay, but if you look at the base, the part 
where the blade locks into place doesn’t look as good 
as in previous editions from a few years ago, but I 
plan to be very gentle with it (example, I crush 
very hard items like beans, ice, rice, etc. in the  
blender first then pulverize them in the serving size 
I want in the blender then switch to the whipping 
blade for a finer flour, and use the cross cutting blade 
first when making smoothies, then use the flat blade 
if I need them finer/less pulpy). Special tip when making 
smoothies, finely cut and freeze the fruits and 
vegetables (if using spinach-lightly stew soften the  
spinach then freeze until ready for use-and if making 
sorbet, use a small to medium sized food processor)  
that you plan to use that way you can avoid adding so 
much ice if at all-when making your smoothie. 
After about a year, the motor was making a funny noise. 
I called customer service but the warranty expired 
already, so I had to buy another one. FYI: The overall 
quality has gone done in these types of products, so 
they are kind of counting on brand recognition and 
consumer loyalty to maintain sales. Got it in about 
two days.
"""

reviews = [review_1, review_2, review_3, review_4]

In [10]:

for i, review in enumerate(reviews):
    prompt = f"""
    Your task is to generate a short summary of a product  
    review from an ecommerce site. 

    Summarize the review below, delimited by triple 
    backticks in at most 20 words. 

    Review: ```{review}```
    """
    response = get_completion(prompt)
    print(i, response, "\n")
0 Soft and cute panda plush toy loved by daughter, but a bit small for the price. Arrived early. 

1 Affordable lamp with storage, fast shipping, and excellent customer service. Easy to assemble and missing parts were quickly replaced. 

2 Good battery life, small toothbrush head, but effective cleaning. Good deal if bought around $50. 

3 The product was on sale for $49 in November, but the price increased to $70-$89 in December. The base doesn't look as good as previous editions, but the reviewer plans to be gentle with it. A special tip for making smoothies is to freeze the fruits and vegetables beforehand. The motor made a funny noise after a year, and the warranty had expired. Overall quality has decreased. 
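
The opening of this section notes that a sequential for loop does not scale to very large numbers of reviews. Because each call mostly waits on the network, one modest step up is to issue several requests concurrently from a thread pool. The following is a sketch only: `summarize_all`, its parameters, and `fake_complete` are hypothetical names, the completion function is passed in so the example runs without network access, and real API calls would also be subject to rate limits that this sketch ignores.

```python
from concurrent.futures import ThreadPoolExecutor

FENCE = "`" * 3  # triple backticks used to delimit the review


def summarize_all(reviews, complete, max_words=20, max_workers=4):
    """Summarize reviews concurrently; results keep the input order."""

    def summarize(review):
        prompt = (
            "Your task is to generate a short summary of a product "
            "review from an ecommerce site.\n\n"
            "Summarize the review below, delimited by triple "
            f"backticks in at most {max_words} words.\n\n"
            f"Review: {FENCE}{review}{FENCE}\n"
        )
        return complete(prompt)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(summarize, reviews))


# Stand-in for get_completion so the sketch runs offline.
def fake_complete(prompt):
    return f"summary of a {len(prompt)}-character prompt"


for i, summary in enumerate(summarize_all(["review a", "review bb"], fake_complete)):
    print(i, summary, "\n")
```

In the notebook, `summarize_all(reviews, get_completion)` would replace the loop above while keeping the summaries in the same order as the input list.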
