Pandas Time Series Data Manipulation PDF

Document Details

Uploaded by SecureBliss1365

Chulalongkorn University

Stefan Jansen

Tags

pandas python time series data analysis

Summary

This document is a guide to manipulating time series data with the pandas library in Python. It covers creating dates and times with Timestamp, Period, and DatetimeIndex; selecting data with partial date strings; changing frequency with .asfreq(); and computing lags, differences, and returns with .shift(), .diff(), and .pct_change(). Code examples are shown together with their output. It is aimed at intermediate to advanced learners in data analysis.

Full Transcript

import pandas as pd  # assumed imported going forward
import numpy as np  # needed for np.random below
from datetime import datetime  # to manually create dates

time_stamp = pd.Timestamp(datetime(2017, 1, 1))
pd.Timestamp('2017-01-01') == time_stamp  # understands dates as strings
    True
time_stamp  # type: pandas.tslib.Timestamp
    Timestamp('2017-01-01 00:00:00')
time_stamp.year
    2017
time_stamp.day_name()  # day of week
    'Sunday'

period = pd.Period('2017-01')
period  # default: month-end
    Period('2017-01', 'M')
period.asfreq('D')  # convert to daily
    Period('2017-01-31', 'D')
period.to_timestamp().to_period('M')  # pd.Period() to pd.Timestamp() and back
    Period('2017-01', 'M')
period + 2
    Period('2017-03', 'M')
pd.Timestamp('2017-01-31', 'M') + 1
    Timestamp('2017-02-28 00:00:00', freq='M')
# Note: this last call only runs on older pandas. Later releases removed the
# freq argument of Timestamp and integer addition to a Timestamp;
# pd.Timestamp('2017-01-31') + pd.offsets.MonthEnd(1) gives the same date.

# Sequence of timestamps: pd.date_range takes start, end, periods, freq
index = pd.date_range(start='2017-1-1', periods=12, freq='M')  # pandas 2.2+ prefers freq='ME'
index
    DatetimeIndex(['2017-01-31', '2017-02-28', '2017-03-31', ...,
                   '2017-09-30', '2017-10-31', '2017-11-30', '2017-12-31'],
                  dtype='datetime64[ns]', freq='M')
index[0]
    Timestamp('2017-01-31 00:00:00', freq='M')
index.to_period()
    PeriodIndex(['2017-01', '2017-02', '2017-03', '2017-04', ...,
                 '2017-11', '2017-12'],
                dtype='period[M]', freq='M')

# A date_range result is a pd.DatetimeIndex with dtype datetime64
pd.DataFrame({'data': index}).info()
    RangeIndex: 12 entries, 0 to 11
    Data columns (total 1 columns):
    data    12 non-null datetime64[ns]
    dtypes: datetime64[ns](1)
data = np.random.random(size=(12, 2))  # random floats in [0, 1)
pd.DataFrame(data=data, index=index).info()
    DatetimeIndex: 12 entries, 2017-01-31 to 2017-12-31
    Freq: M
    Data columns (total 2 columns):
    0    12 non-null float64
    1    12 non-null float64
    dtypes: float64(2)

google = pd.read_csv('google.csv')  # import pandas as pd; pd.to_datetime() converts types
google.info()
    RangeIndex: 504 entries, 0 to 503
    Data columns (total 2 columns):
    date     504 non-null object
    price    504 non-null float64
    dtypes: float64(1), object(1)
google.head()
             date   price
    0  2015-01-02  524.81
    1  2015-01-05  513.87
    2  2015-01-06  501.96
    3  2015-01-07  501.10
    4  2015-01-08  502.68
google.date = pd.to_datetime(google.date)
google.info()
    RangeIndex: 504 entries, 0 to 503
    Data columns (total 2 columns):
    date     504 non-null datetime64[ns]
    price    504 non-null float64
    dtypes: datetime64[ns](1), float64(1)

# .set_index() with inplace=True modifies google directly
google.set_index('date', inplace=True)
google.info()
    DatetimeIndex: 504 entries, 2015-01-02 to 2016-12-30
    Data columns (total 1 columns):
    price    504 non-null float64
    dtypes: float64(1)

import matplotlib.pyplot as plt  # needed for plt below
google.price.plot(title='Google Stock Price')
plt.tight_layout(); plt.show()

google['2015'].info()  # pass string for part of date: all dates within this year
    DatetimeIndex: 252 entries, 2015-01-02 to 2015-12-31
    Data columns (total 1 columns):
    price    252 non-null float64
    dtypes: float64(1)
# Note: pandas 2.0+ requires google.loc['2015'] for this row selection.
google['2015-3':'2016-2'].info()  # slice includes last month (inclusive end, unlike Python slices)
    DatetimeIndex: 252 entries, 2015-03-02 to 2016-02-29
    Data columns (total 1 columns):
    price    252 non-null float64
    dtypes: float64(1)
    memory usage: 3.9 KB
google.loc['2016-6-1', 'price']  # use full date with .loc[]
    734.15

# Upsampling with .asfreq('D'): more dates than prices
google.asfreq('D').info()  # set calendar day frequency
    DatetimeIndex: 729 entries, 2015-01-02 to 2016-12-30
    Freq: D
    Data columns (total 1 columns):
    price    504 non-null float64
    dtypes: float64(1)
google.asfreq('D').head()
                 price
    date
    2015-01-02  524.81
    2015-01-03     NaN
    2015-01-04     NaN
    2015-01-05  513.87
    2015-01-06  501.96

google = google.asfreq('B')  # change to business day frequency
google.info()
    DatetimeIndex: 521 entries, 2015-01-02 to 2016-12-30
    Freq: B
    Data columns (total 1 columns):
    price    504 non-null float64
    dtypes: float64(1)
google[google.price.isnull()]  # select missing 'price' values: business days with no stock price
                price
    date
    2015-01-19    NaN
    2015-02-16    NaN
    ...
    2016-11-24    NaN
    2016-12-26    NaN

# pd.read_csv() can parse a column as dates and use it as the index.
# parse_dates takes a list of columns; index_col names the index column.
google = pd.read_csv('google.csv', parse_dates=['date'], index_col='date')
google.info()
    DatetimeIndex: 504 entries, 2015-01-02 to 2016-12-30
    Data columns (total 1 columns):
    price    504 non-null float64
    dtypes: float64(1)
google.head()
                 price
    date
    2015-01-02  524.81
    2015-01-05  513.87
    2015-01-06  501.96
    2015-01-07  501.10
    2015-01-08  502.68

# .shift() moves values forward in time, so the first value is missing
google['shifted'] = google.price.shift()  # default: periods=1
google.head(3)
                 price  shifted
    date
    2015-01-02  524.81      NaN
    2015-01-05  513.87   524.81
    2015-01-06  501.96   513.87

# .shift(periods=-1) moves values back in time, so the last value is missing
google['lagged'] = google.price.shift(periods=-1)
google[['price', 'lagged', 'shifted']].tail(3)
                 price  lagged  shifted
    date
    2016-12-28  785.05  782.79   791.55
    2016-12-29  782.79  771.82   785.05
    2016-12-30  771.82     NaN   782.79

# Rate of change (financial return): x_t / x_{t-1}
google['change'] = google.price.div(google.shifted)
google[['price', 'shifted', 'change']].head(3)
                 price  shifted    change
    Date
    2017-01-03  786.14      NaN       NaN
    2017-01-04  786.90   786.14  1.000967
    2017-01-05  794.02   786.90  1.009048
# (Output as shown on the slide; its dates fall outside the 2015-2016 range loaded above.)

# Percentage return: (x_t / x_{t-1} - 1) * 100
google['return'] = google.change.sub(1).mul(100)
google[['price', 'shifted', 'change', 'return']].head(3)
                 price  shifted  change  return
    date
    2015-01-02  524.81      NaN     NaN     NaN
    2015-01-05  513.87   524.81    0.98   -2.08
    2015-01-06  501.96   513.87    0.98   -2.32

# Difference between periods: x_t - x_{t-1}
google['diff'] = google.price.diff()
google[['price', 'diff']].head(3)
                 price    diff
    date
    2015-01-02  524.81     NaN
    2015-01-05  513.87  -10.94
    2015-01-06  501.96  -11.91

# Percent change per period; matches 'return' above
google['pct_change'] = google.price.pct_change().mul(100)
google[['price', 'return', 'pct_change']].head(3)
                 price  return  pct_change
    date
    2015-01-02  524.81     NaN         NaN
    2015-01-05  513.87   -2.08       -2.08
    2015-01-06  501.96   -2.32       -2.32

# Return over 3 periods: the first 3 values are missing
google['return_3d'] = google.price.pct_change(periods=3).mul(100)
google[['price', 'return_3d']].head()
                 price  return_3d
    date
    2015-01-02  524.81        NaN
    2015-01-05  513.87        NaN
    2015-01-06  501.96        NaN
    2015-01-07  501.10  -4.517825
    2015-01-08  502.68  -2.177594
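The Timestamp, Period, and date_range calls at the start of the transcript can be checked with a short sketch like the one below. It is written for recent pandas releases, so it makes two substitutions that are not in the slides: pd.offsets.MonthEnd(1) replaces adding an integer to a Timestamp, and an offset object replaces the 'M' frequency string in pd.date_range.

```python
from datetime import datetime

import pandas as pd

# A Timestamp built from a datetime equals one parsed from a string.
time_stamp = pd.Timestamp(datetime(2017, 1, 1))
same = pd.Timestamp('2017-01-01') == time_stamp
weekday = time_stamp.day_name()

# A Period covers a span of time; 'YYYY-MM' is read as monthly frequency.
period = pd.Period('2017-01')
last_day = period.asfreq('D')                       # last day of the month
round_trip = period.to_timestamp().to_period('M')   # Period -> Timestamp -> Period
two_later = period + 2                              # arithmetic in units of the frequency

# Substitution for the slide's "Timestamp + 1": step to the next month end.
next_month_end = pd.Timestamp('2017-01-31') + pd.offsets.MonthEnd(1)

# Twelve month-end dates, then the same index expressed as monthly periods.
index = pd.date_range(start='2017-1-1', periods=12, freq=pd.offsets.MonthEnd())
periods = index.to_period('M')

print(weekday, last_day, two_later, next_month_end.date())
print(index[0].date(), index[-1].date(), periods[0])
```

Passing an offset object instead of a frequency string sidesteps the change of the month-end alias between pandas versions.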
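The two ways of loading dates shown in the transcript (converting after the fact with pd.to_datetime() and .set_index(), or parsing while reading with parse_dates and index_col) should give the same result. The sketch below illustrates this with an in-memory CSV, since google.csv itself is not available here; the three rows are copied from the head() output above and the variable names are illustrative.

```python
import io

import pandas as pd

# Stand-in for google.csv: three rows taken from the transcript's head() output.
csv_text = "date,price\n2015-01-02,524.81\n2015-01-05,513.87\n2015-01-06,501.96\n"

# Route 1: read the dates as plain strings, then convert and set the index.
raw = pd.read_csv(io.StringIO(csv_text))
raw['date'] = pd.to_datetime(raw['date'])
indexed = raw.set_index('date')

# Route 2: let read_csv parse the column and use it as the index.
direct = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'], index_col='date')

print(type(indexed.index).__name__)
print(direct)
```

This version assigns the result of set_index() to a new name rather than using inplace=True, which keeps the raw frame available for comparison.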
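Partial-string selection, as used on the google frame above, can be tried on synthetic data. The sketch below assumes a business-day index over 2015 and 2016 with placeholder values (not real prices) and uses .loc throughout, which works on both older and current pandas.

```python
import numpy as np
import pandas as pd

# Business-day index with placeholder values; these are not real prices.
idx = pd.bdate_range('2015-01-01', '2016-12-30')
prices = pd.DataFrame({'price': np.arange(len(idx), dtype=float)}, index=idx)

# A year string selects every row within that year.
year_2015 = prices.loc['2015']

# A slice of partial strings is inclusive: all of February 2016 is kept.
window = prices.loc['2015-03':'2016-02']

# A full date plus a column label returns a single value.
one_day = prices.loc['2016-06-01', 'price']

print(year_2015.index[0].date(), year_2015.index[-1].date())
print(window.index[0].date(), window.index[-1].date())
print(one_day)
```

The end of the slice lands on 2016-02-29, the last business day of that month, which shows that the end label is included in full.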
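The effect of .asfreq('D') and .asfreq('B') described above can be seen on a tiny frame. The sketch below uses three hypothetical prices around 2015-01-19, one of the business days the transcript lists as having no price; the values themselves are made up for illustration.

```python
import pandas as pd

# Hypothetical prices: Friday 2015-01-16, then Tuesday and Wednesday.
# Monday 2015-01-19 is deliberately absent.
dates = pd.to_datetime(['2015-01-16', '2015-01-20', '2015-01-21'])
prices = pd.DataFrame({'price': [100.0, 101.5, 99.0]}, index=dates)

# Calendar-day frequency adds the weekend and the missing Monday as NaN rows.
daily = prices.asfreq('D')

# Business-day frequency adds only the missing Monday.
business = prices.asfreq('B')

# Boolean filter for rows where the price is missing.
missing = business[business['price'].isnull()]

print(daily)
print(missing)
```

Here .asfreq() only reindexes to the new frequency; it does not fill the new rows unless a fill method is requested.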
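The shift, change, return, diff, and pct_change columns from the transcript can be reproduced on the five prices shown in its head() output. This is a minimal sketch on those five rows only, so the 'lagged' values at the end differ from the tail() output of the full dataset.

```python
import pandas as pd

# First five rows of the google data, as printed in the transcript.
dates = pd.to_datetime(['2015-01-02', '2015-01-05', '2015-01-06',
                        '2015-01-07', '2015-01-08'])
google = pd.DataFrame({'price': [524.81, 513.87, 501.96, 501.10, 502.68]},
                      index=dates)

google['shifted'] = google['price'].shift()            # x_{t-1}; first row is NaN
google['lagged'] = google['price'].shift(periods=-1)   # x_{t+1}; last row is NaN
google['change'] = google['price'].div(google['shifted'])   # x_t / x_{t-1}
google['return'] = google['change'].sub(1).mul(100)         # percent return
google['diff'] = google['price'].diff()                     # x_t - x_{t-1}
google['pct_change'] = google['price'].pct_change().mul(100)
google['return_3d'] = google['price'].pct_change(periods=3).mul(100)

print(google.round(2))
```

The 'return' and 'pct_change' columns agree row by row, which is why .pct_change() can replace the manual shift, divide, and subtract steps.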
