Calling the Baidu API for Sentiment Analysis of Text (Public Opinion Analysis) - 五八三

@[TOC]

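# 1. Preparation

1. Register a Baidu account, log in to Baidu AI Cloud, open the overview page, choose Natural Language Processing, and create an application (read the creation options carefully as you fill them in).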


Once the application is created, it is assigned an AppID, an API Key, and a Secret Key.


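2. Calling the Baidu API requires authorization: use the API Key and Secret Key you obtained to request an Access Token.

The authorization URL (replace API Key and Secret Key with the values of your own application):

https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id=API Key&client_secret=Secret Key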


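Visit this URL to obtain the Access Token. Note that the Access Token has a validity period; once it has expired, API calls will fail.

expires_in: the validity period of the Access Token, in seconds (typically 2592000, which is 30 days)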


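Summary: to call the API you need to (a) log in, (b) create an application, (c) obtain the API Key and Secret Key, and (d) visit the authorization URL to obtain the Access Token.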
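Step (d) can also be done from a script instead of a browser. The following is a minimal sketch, assuming the endpoint returns a JSON body with the `access_token` and `expires_in` fields described above. The function names (`build_token_url`, `parse_token_response`, `fetch_access_token`) are made up for illustration, and the network call is kept in its own function so the other two can be tried offline.

```python
import json
import time
import urllib.request
from urllib.parse import urlencode

TOKEN_ENDPOINT = "https://aip.baidubce.com/oauth/2.0/token"


def build_token_url(api_key, secret_key):
    """Build the authorization URL from the application's API Key and Secret Key."""
    params = {
        "grant_type": "client_credentials",
        "client_id": api_key,
        "client_secret": secret_key,
    }
    return TOKEN_ENDPOINT + "?" + urlencode(params)


def parse_token_response(body, now=None):
    """Return (access_token, expiry_timestamp) from the JSON body of the token endpoint."""
    payload = json.loads(body)
    if "access_token" not in payload:
        raise ValueError("no access_token in response: {}".format(body))
    now = time.time() if now is None else now
    # expires_in is a duration in seconds, so add it to the time of issue
    return payload["access_token"], now + payload["expires_in"]


def fetch_access_token(api_key, secret_key, timeout=10):
    """Request a fresh token over the network (needs valid keys)."""
    url = build_token_url(api_key, secret_key)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_token_response(resp.read().decode("utf-8"))
```

Storing the expiry timestamp next to the token makes it possible to request a new token shortly before the old one runs out, rather than discovering the expiry through a failed API call.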

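# 2. Testing the API call (a simple example)

Test the Baidu API from Python (Python is downloaded directly from the official website; the editor is IDLE, which ships with it).

The simplest possible call: store the Access Token and use it to call the Baidu API.

Sentiment analysis:

HTTP method: POST. Request URL: https://aip.baidubce.com/rpc/2.0/nlp/v1/sentiment_classify

Parameter: access_token (the access_token obtained with the API Key and Secret Key)

The simplest example, which can be used as is (editor: IDLE):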

import json

import requests

def get_emotion(data):
    # Access Token and request URL for the Baidu sentiment analysis API (replace the token with your own)
    token = '24.bcc989b57db903cc1189346275b7a372.2592000.1604971755.282335-22803254'
    url = 'https://aip.baidubce.com/rpc/2.0/nlp/v1/sentiment_classify?charset=UTF-8&access_token={}'.format(token)
    new_each = json.dumps({'text': data})  # request body: the text (a str) serialized as JSON
    res = requests.post(url, data=new_each)  # send the text to the sentiment analysis API
    res_text = res.text  # raw response, kept as a string
    print("content: ", res_text)
    if 'items' in res_text:  # a successful response contains an items entry
        json_data = json.loads(res_text)
        negative = json_data['items'][0]['negative_prob']  # negative probability
        positive = json_data['items'][0]['positive_prob']  # positive probability
        print("positive:", positive)
        print("negative:", negative)
        if positive > negative:  # more positive than negative: return 2
            return 2
        elif positive == negative:  # equally positive and negative: return 1
            return 1
        else:
            return 0  # more negative than positive: return 0
    else:
        return 1  # no items in the response (for example an error): treat as neutral

def main():
    txt1 = "有些时候，宇宙似乎是有意使一些事情变得如此有趣。科学家们发现了一个“π行星”，它的大小与我们的地球相仿，距离我们大约185光年"
    print("txt1 result:", get_emotion(txt1))

if __name__ == "__main__":
    main()

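Running the script prints the raw JSON response, the positive and negative probabilities, and the final label (2 for positive, 1 for neutral, 0 for negative).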
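To check the decision rule without spending API quota, the parsing step can be separated from the HTTP call. The sketch below applies the same rule as `get_emotion` to a response string. The function name `classify_response` and the sample strings are made up for illustration; they only contain the fields the code above reads (`items`, `positive_prob`, `negative_prob`), and the shape of the error response is an assumption.

```python
import json


def classify_response(body):
    """Map a raw JSON response to 2 (positive), 1 (neutral), or 0 (negative)."""
    payload = json.loads(body)
    items = payload.get("items")
    if not items:
        # No items (assumed to be an error response): fall back to neutral
        return 1
    positive = items[0]["positive_prob"]
    negative = items[0]["negative_prob"]
    if positive > negative:
        return 2
    if positive == negative:
        return 1
    return 0


# Hand-written sample responses, not real API output
sample_ok = '{"items": [{"positive_prob": 0.9, "negative_prob": 0.1}]}'
sample_error = '{"error_code": 110, "error_msg": "Access token invalid"}'
print(classify_response(sample_ok))
print(classify_response(sample_error))
```

Parsing the JSON once and looking up `items` as a key is a little stricter than searching the raw string for the substring `items`, which would also match if the analyzed text itself happened to contain that word.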

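# 3. Advanced API test (analyzing the sentiment of text scraped from a web page)

The Baidu sentiment analysis API accepts at most 2048 bytes of text per request. If the article is shorter than 2048 bytes it is sent directly; if it exceeds the limit, the text has to be split into segments.

Given a URL, the script extracts the text of the page and runs sentiment analysis on it.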

import json

import requests
from bs4 import BeautifulSoup

def cut_text(text, length):
    """Split text into consecutive segments of at most length characters each."""
    # Slicing keeps every character, including newlines, which the regex '.' would skip
    return [text[i:i + length] for i in range(0, len(text), length)]

def get_emotion(data):
    # Access Token and request URL for the Baidu sentiment analysis API (replace the token with your own)
    token = '24.bcc989b57db903cc1189346275b7a372.2592000.1604971755.282335-22803254'
    url = 'https://aip.baidubce.com/rpc/2.0/nlp/v1/sentiment_classify?charset=UTF-8&access_token={}'.format(token)
    if len(data.encode()) < 2048:
        new_each = json.dumps({'text': data})  # request body: the text (a str) serialized as JSON
        res = requests.post(url, data=new_each)  # send the text to the sentiment analysis API
        res_text = res.text  # raw response, kept as a string
        print("content: ", res_text)
        if 'items' in res_text:  # a successful response contains an items entry
            json_data = json.loads(res_text)
            negative = json_data['items'][0]['negative_prob']  # negative probability
            positive = json_data['items'][0]['positive_prob']  # positive probability
            print("positive:", positive)
            print("negative:", negative)
            if positive > negative:  # more positive than negative: return 2
                return 2
            elif positive == negative:  # equally positive and negative: return 1
                return 1
            else:
                return 0  # more negative than positive: return 0
        else:
            return 1  # no items in the response (for example an error): treat as neutral
    else:
        print("Splitting the article")
        # The limit is in bytes but cut_text counts characters. A UTF-8 character takes
        # at most 4 bytes, so 500 characters never exceed 2000 bytes.
        data = cut_text(data, 500)
        sum_positive = 0.0  # running total of positive probabilities
        sum_negative = 0.0  # running total of negative probabilities
        for each in data:  # score every segment
            new_each = json.dumps({'text': each})
            res = requests.post(url, data=new_each)  # send the segment to the API
            res_text = res.text  # raw response, kept as a string
            if 'items' in res_text:  # skip segments for which the API returned no result
                json_data = json.loads(res_text)
                positive = json_data['items'][0]['positive_prob']  # positive probability
                negative = json_data['items'][0]['negative_prob']  # negative probability
                sum_positive = sum_positive + positive
                sum_negative = sum_negative + negative
        print(sum_positive)
        print(sum_negative)
        # Compare the totals only after all segments have been scored
        if sum_positive > sum_negative:  # positive: return 2
            return 2
        elif sum_positive == sum_negative:  # neutral: return 1
            return 1
        else:
            return 0  # negative: return 0
def get_html(url):
    # Browser-like User-Agent so the request looks like it comes from a browser
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) '
                      'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'
    }
    response = requests.get(url, headers=headers)  # request the page
    html = response.text  # page source
    soup = BeautifulSoup(html, 'lxml')  # parse the page with the lxml parser
    paragraphs = soup.select('p')  # all <p> elements
    text = ""
    for p in paragraphs:
        text = text + p.text  # concatenate the text of every paragraph
    return text
             
def main():
    txt1 = get_html("https://baijiahao.baidu.com/s?id=1680186652532987655&wfr=spider&for=pc")
    print(txt1)
    print("txt1 result:", get_emotion(txt1))

if __name__ == "__main__":
    main()
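Because the limit is stated in bytes while Python strings are measured in characters, one option is to split on the encoded size instead of the character count. The following is a minimal sketch of that idea, assuming UTF-8 encoding (the request URL above passes `charset=UTF-8`). The function name `split_by_bytes` is made up for illustration and is not part of the original script.

```python
def split_by_bytes(text, max_bytes):
    """Split text into segments whose UTF-8 encoding is at most max_bytes bytes.

    Characters are never cut in half. A single character larger than
    max_bytes would still be emitted as its own (oversized) segment.
    """
    segments = []
    current = []
    size = 0
    for ch in text:
        ch_size = len(ch.encode("utf-8"))
        if current and size + ch_size > max_bytes:
            # The next character would not fit: close the current segment
            segments.append("".join(current))
            current = []
            size = 0
        current.append(ch)
        size += ch_size
    if current:
        segments.append("".join(current))
    return segments


# 1000 Chinese characters are 3000 bytes in UTF-8, so they need two segments
parts = split_by_bytes("中" * 1000, 2047)
print([len(p.encode("utf-8")) for p in parts])
```

Packing each segment close to the limit keeps the number of API requests low for text that is mostly ASCII, while Chinese text (3 bytes per character in UTF-8) still stays under 2048 bytes per request.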


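# 4. Connecting to a database for create, read, update, and delete operations

Database connection. My use case: URLs are stored in a database, and the text behind each URL is classified as positive or negative. The steps:

Connect to the database, query it for the URL, fetch the page text from that URL, classify it as positive or negative, then write the result back to the database. (The code already works; it only adds a few features on top of the code above, so you can try it yourself. If you need the source, leave a comment or send a private message.)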
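The post does not say which database is used, so the following is only a sketch of the steps above under stated assumptions: it uses Python's built-in `sqlite3` with an in-memory database, and the table `pages` with its columns `url` and `sentiment` is hypothetical. The page fetcher and the classifier are passed in as functions, so `get_html` and `get_emotion` from the previous section could be plugged in; the demo uses stand-ins to run without network access.

```python
import sqlite3


def classify_pending(conn, fetch_text, classify):
    """Classify every stored URL that has no result yet and save the label."""
    rows = conn.execute(
        "SELECT id, url FROM pages WHERE sentiment IS NULL"
    ).fetchall()
    for page_id, url in rows:
        label = classify(fetch_text(url))
        conn.execute(
            "UPDATE pages SET sentiment = ? WHERE id = ?", (label, page_id)
        )
    conn.commit()
    return len(rows)


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pages (id INTEGER PRIMARY KEY, url TEXT NOT NULL, sentiment INTEGER)"
    )
    conn.executemany(
        "INSERT INTO pages (url) VALUES (?)",
        [("https://example.com/a",), ("https://example.com/b",)],
    )
    # Stand-ins for get_html and get_emotion so the demo needs no network
    fake_pages = {
        "https://example.com/a": "a good day",
        "https://example.com/b": "a bad day",
    }
    updated = classify_pending(
        conn, fake_pages.get, lambda text: 2 if "good" in text else 0
    )
    rows = conn.execute("SELECT url, sentiment FROM pages ORDER BY id").fetchall()
    conn.close()
    return updated, rows


print(demo())
```

Selecting only rows whose `sentiment` is still NULL means the function can be run repeatedly without classifying the same URL twice.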

# 5. Calling the Python script from Java

There are many ways to call a Python script from Java, which you can look up yourself; here I use Runtime.getRuntime().exec().

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class testPython {
    public static void main(String[] args) {
        Process process;
        try {
            // Backslashes in the Windows path must be escaped inside a Java string literal
            process = Runtime.getRuntime().exec("python D:\\Users\\2.py");
            BufferedReader in = new BufferedReader(new InputStreamReader(process.getInputStream()));
            String line;
            // Read and print every line the script writes to standard output
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
            in.close();
            process.waitFor();  // wait for the script to finish
        } catch (IOException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
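On the Python side, the script is easier to consume from Java if it prints exactly one machine-readable line. The following is a hedged sketch of such an entry point, not the author's `2.py`: the function name `run`, the JSON keys `label` and `error`, and the placeholder classifier are all made up for illustration. In the real script the classifier would be `get_emotion`.

```python
import json
import sys


def run(argv, classify):
    """Classify the text passed as the first argument and return one line of JSON."""
    if len(argv) < 2:
        return json.dumps({"error": "usage: script.py TEXT"})
    return json.dumps({"label": classify(argv[1])})


if __name__ == "__main__":
    # Placeholder classifier that always answers neutral (1)
    print(run(sys.argv, lambda text: 1))
```

`json.dumps` escapes non-ASCII characters by default, so the line that Java reads from the process output does not depend on the console encoding.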

