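pandas-datareader usage notes
Introduction to pandas-datareader
The pandas ecosystem offers, through the pandas-datareader package, an API for pulling financial data from finance websites. It is one more way to obtain stock data for quantitative trading.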
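The DataReader function
According to the documentation, the first argument is the ticker symbol; Apple's, for example, is "AAPL". For Chinese markets the symbol is written as "stock code" + "exchange suffix": append ".SS" for Shanghai-listed stocks and ".SZ" for Shenzhen-listed stocks. The second argument names the data source. DataReader has supported several finance sites, such as "Yahoo! Finance" and "Google Finance"; Yahoo is used here. The third and fourth arguments are the start and end dates of the requested period. The data is returned as a DataFrame.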
    import pandas as pd
    from pandas_datareader import data

    start_date = "2018-04-01"
    end_date = "2021-04-01"

    stock = data.DataReader("000001.SS", "yahoo", start_date, end_date)
    print(stock.head(5))
    print(stock.tail(5), "\n")
    print(stock.index)
    print(stock.columns)
    print(stock.shape)
Output:
                       High          Low         Open        Close  Volume    Adj Close
    Date
    2018-04-02  3192.340088  3159.986084  3169.779053  3163.178955  177700  3163.178955
    2018-04-03  3144.332031  3119.132080  3130.012939  3136.633057  152200  3136.633057
    2018-04-04  3163.340088  3128.866943  3147.049072  3131.111084  147000  3131.111084
    2018-04-09  3146.093018  3110.302979  3125.441895  3138.293945  139600  3138.293945
    2018-04-10  3190.648926  3139.081055  3144.257080  3190.322021  168200  3190.322021
                       High          Low         Open        Close  Volume    Adj Close
    Date
    2021-03-26  3423.222900  3373.316895  3373.316895  3418.326904  274600  3418.326904
    2021-03-29  3449.833984  3409.886963  3429.632080  3435.300049  284800  3435.300049
    2021-03-30  3457.629883  3423.320068  3432.530029  3456.679932  285400  3456.679932
    2021-03-31  3452.209961  3420.830078  3452.209961  3441.909912  283000  3441.909912
    2021-04-01  3470.030029  3438.830078  3444.810059  3466.330078  275200  3466.330078

    DatetimeIndex(['2018-04-02', '2018-04-03', '2018-04-04', '2018-04-09',
                   '2018-04-10', '2018-04-11', '2018-04-12', '2018-04-13',
                   '2018-04-16', '2018-04-17',
                   ...
                   '2021-03-19', '2021-03-22', '2021-03-23', '2021-03-24',
                   '2021-03-25', '2021-03-26', '2021-03-29', '2021-03-30',
                   '2021-03-31', '2021-04-01'],
                  dtype='datetime64[ns]', name='Date', length=728, freq=None)
    Index(['High', 'Low', 'Open', 'Close', 'Volume', 'Adj Close'], dtype='object')
    (728, 6)
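The ticker convention described above (stock code plus ".SS" or ".SZ") can be wrapped in a small helper. This is only a sketch: the function name `to_yahoo_symbol` and the exchange labels "SH" and "SZ" are made up for illustration and are not part of pandas-datareader.

```python
# Hypothetical helper, not part of pandas-datareader: maps an exchange label
# to the Yahoo-style suffix described in the text.
YAHOO_SUFFIX = {"SH": ".SS", "SZ": ".SZ"}  # "SH"/"SZ" labels are assumptions


def to_yahoo_symbol(code: str, exchange: str) -> str:
    """Return the stock code with its exchange suffix appended."""
    try:
        return code + YAHOO_SUFFIX[exchange.upper()]
    except KeyError:
        raise ValueError(f"unknown exchange: {exchange!r}") from None


print(to_yahoo_symbol("000001", "SH"))  # 000001.SS
print(to_yahoo_symbol("000002", "sz"))  # 000002.SZ
```

The returned string can then be passed as the first argument of DataReader.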
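Data analysis
1. Print the first 5 and the last 5 rows of the DataFrame
2. Print the index and the column names of the DataFrame. The index is a time series; the columns are open, high, low, close, adjusted close and volume
print(stock.index)
print(stock.columns)
3. Print the shape of the DataFrame
print(stock.shape)
4. Summary statistics for each column of the DataFrame, such as minimum, maximum, mean and standard deviation
print(stock.describe())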
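Steps 1 to 4 can be tried without a network connection on a hand-built DataFrame. The sketch below uses synthetic prices that are made up for illustration (they are not market data); the variable names `demo` and `summary` are likewise assumptions.

```python
import pandas as pd

# Synthetic prices, made up for illustration only (not real market data).
dates = pd.to_datetime(["2021-01-04", "2021-01-05", "2021-01-06", "2021-01-07"])
demo = pd.DataFrame(
    {
        "Open": [10.0, 10.5, 11.0, 10.8],
        "High": [10.6, 11.2, 11.1, 11.5],
        "Low": [9.9, 10.4, 10.6, 10.7],
        "Close": [10.5, 11.0, 10.8, 11.4],
        "Volume": [1000, 1200, 900, 1500],
    },
    index=pd.DatetimeIndex(dates, name="Date"),
)

print(demo.head(2))   # first rows
print(demo.tail(2))   # last rows
print(demo.index)     # DatetimeIndex named "Date"
print(demo.columns)   # column names
print(demo.shape)     # (rows, columns)

# describe() returns one row per statistic and one column per numeric field.
summary = demo.describe()
print(summary.loc[["min", "max", "mean", "std"]])
```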
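5. Add a percentage-change column to the DataFrame, where percentage change = (today's Close - previous day's Close) / previous day's Close * 100%
(1) Add a column Change that stores the day's price change relative to the previous close, that is, today's Close minus the previous day's Close. April 1 has no previous-day data, so its value is missing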
    change = stock.Close.diff()
    stock['Change'] = change
    print(stock.head(5))

    '''
                       High          Low         Open        Close  Volume    Adj Close     Change
    Date
    2020-04-01  2773.364014  2731.079102  2743.541016  2734.521973  217300  2734.521973        NaN
    2020-04-02  2780.637939  2719.904053  2720.228027  2780.637939  217900  2780.637939  46.115967
    2020-04-03  2780.586914  2754.072998  2773.575928  2763.987061  200800  2763.987061 -16.650879
    2020-04-07  2823.277100  2801.839111  2806.968018  2820.762939  270200  2820.762939  56.775879
    2020-04-08  2823.214111  2800.295898  2805.916992  2815.368896  243500  2815.368896  -5.394043
    '''
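(2) Replace the missing value (NaN) in place with the mean of the changes.
change.fillna(change.mean(), inplace=True)
(3) There are two ways to compute the percentage change. pct_change() takes each item from the second one onward, subtracts the preceding item, and divides the difference by that preceding item, which yields the percentage-change series.
stock['pct_change'] = stock['Change'] / stock['Close'].shift(1)
stock['pct_change1'] = stock.Close.pct_change()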
                       High          Low         Open        Close  Volume    Adj Close     Change  pct_change  pct_change1
    Date
    2020-04-01  2773.364014  2731.079102  2743.541016  2734.521973  217300  2734.521973        NaN         NaN          NaN
    2020-04-02  2780.637939  2719.904053  2720.228027  2780.637939  217900  2780.637939  46.115967    0.016864     0.016864
    2020-04-03  2780.586914  2754.072998  2773.575928  2763.987061  200800  2763.987061 -16.650879   -0.005988    -0.005988
    2020-04-07  2823.277100  2801.839111  2806.968018  2820.762939  270200  2820.762939  56.775879    0.020541     0.020541
    2020-04-08  2823.214111  2800.295898  2805.916992  2815.368896  243500  2815.368896  -5.394043   -0.001912    -0.001912
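That the two methods agree can be checked offline on a short series. The closing prices below are synthetic and chosen so the daily changes are +2%, -2% and +5%; the variable names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic closing prices (made up): +2%, -2%, +5% day over day.
close = pd.Series([100.0, 102.0, 99.96, 104.958])

# Method 1: difference to the previous close, divided by the previous close.
change = close.diff()
manual = change / close.shift(1)

# Method 2: the built-in pct_change().
builtin = close.pct_change()

# The first entry is NaN in both (no previous day); the rest agree.
print(np.isnan(manual.iloc[0]), np.isnan(builtin.iloc[0]))
print(np.allclose(manual.iloc[1:], builtin.iloc[1:]))
```

Both series are fractions, so multiply by 100 to express them as percentages.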
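7. Add a series of price-gap values to the DataFrame. The gaps defined here are breakaway gaps within an upward or a downward move: on an up day, today's low being above yesterday's close is a gap up; on a down day, yesterday's close being above today's high is a gap down. The code iterates over every trading day and records the gap size for the days that meet either condition.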
    import pandas as pd
    from pandas_datareader import data
    import numpy as np

    start_date = "2020-04-01"
    end_date = "2021-04-01"

    stock = data.DataReader("000001.SS", "yahoo", start_date, end_date)
    change = stock.Close.diff()
    change.fillna(change.mean(), inplace=True)
    stock["Change"] = change
    stock["pct_change"] = stock["Change"] / stock["Close"].shift(1)
    stock["pct_change1"] = stock.Close.pct_change()

    # Collect the gap values in a Series instead of DataFrame.append,
    # which was removed in pandas 2.0. Days without a gap stay NaN.
    jump_power = pd.Series(np.nan, index=stock.index)
    for kl_index in np.arange(1, stock.shape[0]):
        today = stock.iloc[kl_index]
        pre_close = stock.iloc[kl_index - 1]["Close"]
        if today["pct_change"] > 0 and (today["Low"] - pre_close) > 0:
            # gap up: today's low is above yesterday's close
            jump_power.iloc[kl_index] = today["Low"] - pre_close
        elif today["pct_change"] < 0 and (today["High"] - pre_close) < 0:
            # gap down: today's high is below yesterday's close
            jump_power.iloc[kl_index] = today["High"] - pre_close
    stock["jump_power"] = jump_power
    print(stock.loc["2020-04-01":"2021-04-01"])
Output:
                       High          Low         Open        Close  Volume    Adj Close     Change  pct_change  pct_change1  jump_power
    Date
    2020-04-01  2773.364014  2731.079102  2743.541016  2734.521973  217300  2734.521973   3.011556         NaN          NaN         NaN
    2020-04-02  2780.637939  2719.904053  2720.228027  2780.637939  217900  2780.637939  46.115967    0.016864     0.016864         NaN
    2020-04-03  2780.586914  2754.072998  2773.575928  2763.987061  200800  2763.987061 -16.650879   -0.005988    -0.005988   -0.051025
    2020-04-07  2823.277100  2801.839111  2806.968018  2820.762939  270200  2820.762939  56.775879    0.020541     0.020541   37.852051
    2020-04-08  2823.214111  2800.295898  2805.916992  2815.368896  243500  2815.368896  -5.394043   -0.001912    -0.001912         NaN
    ...                 ...          ...          ...          ...     ...          ...        ...         ...          ...         ...
    2021-03-26  3423.222900  3373.316895  3373.316895  3418.326904  274600  3418.326904  54.736816    0.016273     0.016273    9.726807
    2021-03-29  3449.833984  3409.886963  3429.632080  3435.300049  284800  3435.300049  16.973145    0.004965     0.004965         NaN
    2021-03-30  3457.629883  3423.320068  3432.530029  3456.679932  285400  3456.679932  21.379883    0.006224     0.006224         NaN
    2021-03-31  3452.209961  3420.830078  3452.209961  3441.909912  283000  3441.909912 -14.770020   -0.004273    -0.004273   -4.469971
    2021-04-01  3470.030029  3438.830078  3444.810059  3466.330078  275200  3466.330078  24.420166    0.007095     0.007095         NaN

    [244 rows x 10 columns]
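The same gap rule can also be written without a per-row pass. The following is a vectorized sketch under the same definition (up day with low above the previous close, down day with high below the previous close), run on synthetic prices that are made up for illustration. It is an alternative formulation, not the code that produced the output above.

```python
import numpy as np
import pandas as pd

# Synthetic prices (made up) covering a gap up, two gap downs and a no-gap day.
demo = pd.DataFrame(
    {
        "High": [10.5, 11.5, 11.2, 10.4, 10.9],
        "Low": [9.8, 10.8, 10.6, 9.9, 9.9],
        "Close": [10.2, 11.3, 10.9, 10.0, 10.6],
    }
)

pre_close = demo["Close"].shift(1)   # previous day's close
pct = demo["Close"].pct_change()     # sign tells up day from down day

# Comparisons against NaN are False, so the first row never counts as a gap.
gap_up = (pct > 0) & (demo["Low"] > pre_close)
gap_down = (pct < 0) & (demo["High"] < pre_close)

demo["jump_power"] = np.where(
    gap_up,
    demo["Low"] - pre_close,
    np.where(gap_down, demo["High"] - pre_close, np.nan),
)
print(demo)
```

Positive values mark gaps up, negative values mark gaps down, and NaN marks days without a gap.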
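8. Display the DataFrame values with two decimal places
fmt = lambda x: '%.2f' % x
stock = stock.applymap(fmt)
print(stock.loc["2017-04-26":"2017-06-15"])  # prints all columns by default; the date range must lie within the downloaded data, otherwise the result is empty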
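Formatting every cell with '%.2f' turns the values into strings, so later arithmetic on the frame no longer works. A sketch of two alternatives that keep the values numeric follows: `round(2)` changes the stored values, while `pd.option_context` changes only how they are printed. The two rows are taken from the Close and Change columns in the output above; the names `demo` and `rounded` are illustrative.

```python
import pandas as pd

# Two rows taken from the Close and Change columns shown earlier.
demo = pd.DataFrame(
    {"Close": [2734.521973, 2780.637939], "Change": [3.011556, 46.115967]}
)

# Option 1: round the stored values; the columns stay float64.
rounded = demo.round(2)
print(rounded)
print(rounded.dtypes)

# Option 2: leave the data untouched and change only the display format.
with pd.option_context("display.float_format", "{:.2f}".format):
    print(demo)
```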
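Visualizing stock price data
Matplotlib is a very convenient plotting library for Python. The plot here uses the Adj Close column, that is, the adjusted closing price.
As shown below, two lines are enough to plot the stock price as a time series.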
    import pandas as pd
    from pandas_datareader import data
    import numpy as np
    import matplotlib.pyplot as plt

    start_date = "2020-04-01"
    end_date = "2021-04-01"

    stock = data.DataReader("000001.SS", "yahoo", start_date, end_date)

    stock['Adj Close'].plot(legend=True, figsize=(10, 4))
    plt.show()
Hands-on example: fetching Yahoo Finance data with Python for analysis and visualization
Saving in CSV format
    import numpy as np
    import pandas as pd
    import pandas_datareader.data as web
    import datetime

    df_csvsave = web.DataReader("000001.SS", "yahoo", datetime.datetime(2019, 1, 1), datetime.date.today())
    print(df_csvsave)
    df_csvsave.to_csv(r'C:\Users\15461\Desktop\table.csv', columns=df_csvsave.columns, index=True)
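To read a saved file back with the dates restored as the index, `pd.read_csv` can be given `index_col` and `parse_dates`. The sketch below does a round trip through a temporary directory with synthetic data, so it does not depend on the Desktop path above; the file name and values are assumptions.

```python
import os
import tempfile

import pandas as pd

# Synthetic data (made up) with a date index named "Date", like the
# DataFrames returned earlier.
demo = pd.DataFrame(
    {"Close": [10.5, 11.0]},
    index=pd.DatetimeIndex(["2021-01-04", "2021-01-05"], name="Date"),
)

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "table.csv")
    demo.to_csv(csv_path, index=True)  # index=True writes the Date column
    # index_col picks the Date column as the index; parse_dates converts it
    # from text back to timestamps.
    restored = pd.read_csv(csv_path, index_col="Date", parse_dates=True)

print(restored)
print(type(restored.index).__name__)
```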
Author: hhgw
License: Copyright (c) 2023 CC-BY-NC-4.0 LICENSE