HighFrequencyDataTools {fCalendar}    R Documentation
A collection and description of functions for the
management of high-frequency financial market time
series, especially FX series collected from a
Reuters data feed. The collection includes functions
for the management of dates and times formatted as
ISO-8601 'CCYYMMDDhhmm' strings, functions for
filtering and outlier detection of high-frequency FX
data records as collected from a Reuters data feed,
and functions to calculate log-prices and
log-returns, extract subsamples, interpolate
in time, build business time scales, and
de-seasonalize and de-volatilize high-frequency
financial market data.
'CCYYMMDDhhmm' Dates and Times functions are:
xjulian | Julian minute counts for 'CCYYMMDDhhmm' formats, |
xdate | 'CCYYMMDDhhmm' from Julian minute counts, |
xday.of.week | day of week from 'CCYYMMDDhhmm' dates/times, |
xleap.year | Decides whether a 'CCYYMMDDhhmm' date falls in a leap year. |
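As a language-agnostic illustration of what these date/time utilities compute, here is a minimal Python sketch. The `_sketch` function names are hypothetical (they are not part of fBasics); the sketch assumes a 'CCYYMMDDhhmm' integer stamp, with a bare 'CCYYMMDD' date read as midnight, matching the conventions described in this page.

```python
from datetime import datetime

def _parse(v):
    """Parse an ISO-8601 'CCYYMMDDhhmm' number; bare 'CCYYMMDD' means midnight."""
    s = str(v)
    if len(s) == 8:          # date only: assume 00:00
        s += "0000"
    return datetime.strptime(s, "%Y%m%d%H%M")

def xjulian_sketch(xdate, origin=19600101):
    """Julian minute count: minutes elapsed from the origin to xdate."""
    return int((_parse(xdate) - _parse(origin)).total_seconds() // 60)

def xday_of_week_sketch(xdate):
    """Day of week with 0 = Sunday ... 6 = Saturday, as xday.of.week returns."""
    return (_parse(xdate).weekday() + 1) % 7

def xleap_year_sketch(xdate):
    """Does the date fall in a leap year (Gregorian rule)?"""
    y = _parse(xdate).year
    return y % 4 == 0 and (y % 100 != 0 or y % 400 == 0)
```

For example, `xjulian_sketch(197301011530)` counts the minutes from the default origin 1960-01-01 00:00 up to 1973-01-01 15:30.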
Filter and outlier detection functions are:
fxdata.contributors | Creates a table with contributor names, |
fxdata.parser | Parses FX contributors and delay times, |
fxdata.filter | Filters price and spread values from FX records, |
fxdata.varmin | Aggregates records to variable minutes format. |
Functions for De-Seasonalization and De-Volatilization:
xts.log | Calculates logarithms of xts time series values, |
xts.diff | Differences xts time series values with lag = 1, |
xts.cut | Cuts a piece out of an xts time series, |
xts.interp | Interpolates for equidistant time steps, |
xts.map | Creates a volatility adjusted time-mapping, |
xts.upsilon | Interpolates a time series in upsilon time, |
xts.dvs | Creates a de-volatilized time series, |
xts.dwh | Plots intra-daily/weekly histograms. |
xjulian(xdates, origin = 19600101)
xdate(xjulians, origin = 19600101)
xday.of.week(xdates)
xleap.year(xdates)
fxdata.contributors(x, include = 10)
fxdata.parser(x, parser.table)
fxdata.filter(x, parameter = "strong", doprint = TRUE)
fxdata.varmin(x, digits = 4)
xts.log(xts)
xts.diff(xts)
xts.cut(xts, from.date, to.date)
xts.interp(xts, deltat = 1, method = "constant")
xts.map(xts, mean.deltat, alpha)
xts.upsilon(xts, weekly.map = seq(from = 59, by = 60, length = 168),
    method = "constant", doplot = TRUE, ...)
xts.dvs(xts, k, volatility, doplot = TRUE, ...)
xts.dwh(xts, deltat = 60, period = "weekly", dolog = TRUE, dodiff = TRUE,
    doplot = TRUE)
alpha |
the scaling exponent, a numeric value. For a random walk this will be 2. |
deltat |
the time in minutes between interpolated data points, by default 1 minute. |
digits |
an integer value, the number of digits for the
BID and ASK prices. By default 4. |
dolog, dodiff |
two logicals. Should the logarithm of the input data be taken?
Should the difference of the input data be taken?
Note that if both dolog and dodiff are set to TRUE,
the input data are expected to be price values. |
doplot |
a logical. Should a plot be displayed? |
doprint |
a logical. Should the filter parameters be printed? |
from.date, to.date |
ISO-8601 start and end dates, [CCYYMMDD]. |
include |
an integer value; the contributors are sorted by frequency
and the include most frequent market makers are selected.
By default 10. |
k |
the sampling frequency, an integer value, typically of the order of 10 data points. |
mean.deltat |
the average size of the time intervals in minutes, an integer value. |
method |
a character string naming the interpolation method, either "linear" or "constant". |
origin |
the origin date of the counter, in ISO-8601 date format, [CCYYMMDD]. By default January 1st, 1960. |
parameter |
a character string, either "strong" or "weak",
denoting the filter parameter settings. |
parser.table |
the table of contributors produced by fxdata.contributors,
a data.frame. In this table the market leaders are marked. |
period |
a string, either "weekly", "daily" or "both" selecting the type of the histogram. By default "weekly". |
volatility |
average volatility, a numeric value. Takes values of the order of the variance of the time series data. |
weekly.map |
an integer vector of time intervals, by default 168
hourly intervals, spanning one week. Volatility
based maps can be created by the function xts.map. |
x |
a 6 column standardized FX data frame with XDATE, DELAY, CONTRIBUTOR, BID, ASK and FLAG fields. |
xdates |
a numeric vector of ISO-8601 formatted Gregorian dates/times, [CCYYMMDDhhmm]. |
xjulians |
a numeric vector of Julian Minute Counts. |
xts |
a list with date/time t in ISO-8601
format, [CCYYMMDDhhmm], and data values x. |
... |
arguments to be passed. |
Date and Time Functions:
Note that the prefix x* indicates the "extended" date format
including time management functionality, whereas in sjulian,
sdate, etc. the prefix s* indicates the "standard" or "simple"
date format, handling days, months, years and centuries.
The Data Preprocessing Process:
fxdata.contributors creates a contributor list from an FX
high-frequency data file as collected from a Reuters data feed
and marks the market leaders. fxdata.parser selects, using the
information from the contributor list, the data records from
market leaders. As input serves a standardized high-frequency
data file. Then the function fxdata.filter filters the FX data
records, and finally the function fxdata.varmin creates a
"variable minutes" formatted data file, i.e. all data records
within the same minute are averaged. The preprocessed data are
the starting point for further investigations.
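The final "variable minutes" step of this pipeline can be sketched language-independently. The following Python sketch (the function name and the plain-dict record layout are assumptions, not the fBasics implementation) averages all records that share the same 'CCYYMMDDhhmm' minute stamp and fills the DELAY, CONTRIBUTOR and FLAG columns the way the Value section below describes for fxdata.varmin:

```python
from collections import defaultdict

def varmin_sketch(records, digits=4):
    """Average all records sharing the same 'CCYYMMDDhhmm' minute stamp."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["XDATE"]].append(rec)
    out = []
    for xdate in sorted(groups):
        recs = groups[xdate]
        out.append({
            "XDATE": xdate,
            "DELAY": 0,                # unused in variable minutes files
            "CONTRIBUTOR": "MEAN",     # marks how the record was evaluated
            "BID": round(sum(r["BID"] for r in recs) / len(recs), digits),
            "ASK": round(sum(r["ASK"] for r in recs) / len(recs), digits),
            "FLAG": len(recs),         # number of averaged raw records
        })
    return out
```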
The Standardized FX high frequency data file structure:
x is a standardized data frame with 6 columns. The first
column gives the date/time XDATE in ISO-8601 format
[CCYYMMDDhhmm], the second column is a measure for the feed
DELAY, the third column denotes the CONTRIBUTOR code, the
fourth and fifth columns are the BID and ASK prices, and the
last column is an information FLAG to add additional
information.
The Contributor List:
The output of the fxdata.contributors function is used as
input for the function fxdata.parser, which extracts the
contributors marked as market makers in the output table.
The Parser:
The function fxdata.parser parses the data.
The parser table, parser.table, is a data frame with 4
columns: CONTRIBUTOR denotes a code naming the contributor,
COUNTS gives the number of counts, i.e. how often the
contributor appeared in the file, PERCENT gives the same as
a percent value, and SELECT denotes a logical value; if TRUE
the contributor belongs to the group of the market makers,
otherwise not.
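Building such a contributor table amounts to a frequency count with a cut-off. A minimal Python sketch (hypothetical name and record layout, not the fBasics code) of this construction:

```python
from collections import Counter

def contributors_sketch(records, include=10):
    """Frequency table of contributors; the top `include` are marked SELECT."""
    counts = Counter(rec["CONTRIBUTOR"] for rec in records)
    total = sum(counts.values())
    table = []
    for rank, (name, n) in enumerate(counts.most_common()):
        table.append({
            "CONTRIBUTOR": name,
            "COUNTS": n,
            "PERCENT": round(100 * n / total, 2),
            "SELECT": rank < include,   # top `include` = market makers
        })
    return table
```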
Variable Minutes Formatted Files:
The function fxdata.varmin creates data records in a variable
minutes format.
Log Prices and Log Returns:
The function xts.log is mainly used to create log-prices from
high frequency price records, and the function xts.diff is
used to create log-returns.
Subsamples:
Th function xts.cut
is mainly used to create a subsample from
data records. If the start and/or end date are out of the time range
the time series is simply forward/backward extrapolated with the first
and/or last value.
Interpolation:
The function xts.interp is used to interpolate data records.
The method argument allows for two different kinds of
interpolation, either "linear" for a linear interpolation or
"constant" for a constant interpolation keeping the previous
(left) value in time within the interpolation region.
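The difference between the two methods can be shown in a few lines. This Python sketch (a hypothetical stand-in, not the xts.interp implementation) maps irregular observations (t, x) onto a given grid of times:

```python
import bisect

def interp_sketch(t, x, grid, method="constant"):
    """Interpolate irregular observations (t, x) at the times in `grid`.

    "constant" keeps the previous (left) value in time;
    "linear" interpolates between the neighbouring observations.
    """
    out = []
    for g in grid:
        i = max(bisect.bisect_right(t, g) - 1, 0)   # last t[i] <= g
        if method == "constant" or i == len(t) - 1:
            out.append(x[i])
        else:
            w = (g - t[i]) / (t[i + 1] - t[i])       # linear weight
            out.append(x[i] + w * (x[i + 1] - x[i]))
    return out
```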
Business Time Maps:
The function xts.map is mainly used to create the time map
required by the function xts.upsilon. Important: the argument
xts must start on a Monday and end on a Sunday. Use xts.cut
to guarantee this.
De-Seasonalization:
The function xts.upsilon is used to create data records with
volatility adjusted time steps obtained from the "upsilon
time" approach. These time steps can be taken from the time
map created by the function xts.map. The data records are
interpolated according to this time schedule.
De-Volatilization:
The de-volatilization algorithm is based on Zhou's approach. The
algorithm used by the function xts.dvs reduces the sample
frequency by keeping the variance of the price changes constant,
hence the name "de-volatilization". The procedure removes
volatility by sampling data at different dates for different
times. When the market is highly volatile, more data are
sampled; equivalently, the time is stretched. When the market is
less volatile, less data are sampled; equivalently, the time is
compressed. Although the resulting subsequence has unequally
spaced calendar date/time intervals, it produces an almost
equally volatile time series. This time series is called a
de-volatilized time series, or "dv-Series".
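The core idea can be sketched as follows: walk through the series and emit a point whenever the cumulated squared price change since the last emitted point reaches a volatility budget. This Python sketch is a simplified illustration of that idea only; Zhou's full algorithm as used by xts.dvs additionally involves the sampling-frequency parameter k, which is omitted here.

```python
def dvs_sketch(t, x, volatility):
    """Emit a point whenever the cumulated squared change since the
    last emitted point reaches the `volatility` budget."""
    out_t, out_x = [t[0]], [x[0]]
    acc = 0.0
    for i in range(1, len(x)):
        acc += (x[i] - x[i - 1]) ** 2
        if acc >= volatility:      # budget reached: sample and reset
            out_t.append(t[i])
            out_x.append(x[i])
            acc = 0.0
    return out_t, out_x
```

Calm stretches of the series contribute little to the budget and are skipped over (time compressed), while volatile stretches fill the budget quickly and are sampled densely (time stretched).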
Daily/Weekly Histogram Plots:
Financial market data exhibit seasonal structures over the day
or week. These can be made explicit by daily or weekly
histogram plots of the data using the function xts.dwh.
Date and Time Functions:
xjulian
returns a numeric vector of Julian minute counts.
xdate
returns a numeric vector of ISO-8601 formatted dates/times, i.e.
[CCYYMMDDhhmm].
xday.of.week
returns a numeric vector with entries between 0 (Sunday) and
6 (Saturday).
xleap.year
returns a logical vector with entries TRUE or FALSE, indicating
whether the date falls in a leap year or not.
Filter and Outlier Detection:
fxdata.contributors
returns a data frame with the following columns: CONTRIBUTOR,
the code naming the contributor, a character string; COUNTS,
the counts, i.e. how often the contributor appeared in the
file, an integer; PERCENT, the same in percent, a numeric
value; SELECT, a logical. If TRUE the contributor belongs to
the group of the n market makers, otherwise not.
fxdata.parser and fxdata.filter
return a data frame with the same structure as x, i.e. a
standardized FX high frequency data file structure.
fxdata.varmin
returns a data frame with the same structure as x, i.e. a
standardized FX high frequency data file structure. The second
column named DELAY is not used and set to zero for each data
record. The third column CONTRIBUTOR is set to "MEAN", the
method by which the variable minute record was evaluated. The
last column FLAG counts the number of values from which the
variable minute data record was evaluated.
De-seasonalization and de-volatilization:
All functions besides xts.map and xts.dwh return a
list with the following two components:
t, the date/time in ISO-8601 format, [CCYYMMDDhhmm], the same
as the input data xts$t;
x, the transformed values of the input data records xts$x,
a numeric vector.
xts.map
returns a list with the following two components:
xmap, a numeric vector with the time intervals;
ymap, a numeric vector, the values to be mapped.
xts.dwh
If "daily" was selected, a list with the following two
components is returned:
td, the daily histogram breaks;
xd, the daily histogram frequencies.
If "weekly" was selected, a list with the following two
components is returned:
tw, the weekly histogram breaks;
xw, the weekly histogram frequencies.
If "both" was selected, a list with all four components
is returned.
These functions were written originally for R Version 1.5. Only minor changes were made to make these functions available for Version 1.9. Date and time classes are outdated, but the functions are still working.
The file fdax97m.csv
is too large and therefore not part of
the fBasics
distribution. Please contact inf@rmetrics.org.
Diethelm Wuertz for the Rmetrics R-port.
ISO-8601 (1988); Data Elements and Interchange Formats - Information Interchange, Representation of Dates and Time, International Organization for Standardization, Reference Number ISO 8601, 14 pages.
Zhou B. (1995); Forecasting Foreign Exchange Rates Subject to De-volatilization, in: Freedman R.S., Klein A.R., Lederman J. eds., Artificial Intelligence in the Capital Markets, Irwin Publishing, Chicago, p. 137–156.
Guillaume D.M., Dacorogna M.M., Dave R.R., Mueller U.A., Olsen R.B., Pictet O.V. (1997); From the bird's eye to the microscope: a survey of new stylized facts of the intra-daily foreign exchange markets, Finance and Stochastics 1, 95–129.
## SOURCE("fCalendar.99X-HighFrequencyData")

## xjulian -
xmpBasics("\nStart: Julian Counts > ")
# Return the minute counts for the last day of every month of
# year 2000 at 16:00, counted from the origin January 1st, 2000:
xjulian(c(20000131, 20000229, 20000331, 20000430, 20000531, 20000630,
    20000731, 20000831, 20000930, 20001031, 20001130, 20001231)*10000 + 1600,
    origin = 20000101)
# This doesn't work in S-Plus, you get a sequence of NA's, use instead:
xjulian(c(200001311600, 200002291600, 200003311600, 200004301600,
    200005311600, 200006301600, 200007311600, 200008311600, 200009301600,
    200010311600, 200011301600, 200012311600), origin = 20000101)

## xdate -
xmpBasics("\nNext: Convert Julian Counts to Dates > ")
# Manage Date/Time in Extended Date/Time Format, ISO-8601
# Date: 1973-01-01 15:30
xjulian(197301011530)
print(xdate(xjulian(197301011530)), digits = 9)

## xday.of.week -
xmpBasics("\nNext: Compute Day of Week > ")
# Calculate the day of week for 1973-01-01 16:15
xday.of.week(197301011615)

## xleap.year -
xmpBasics("\nNext: Check for Leap Years > ")
# Does February 1st, 2000 16:15 fall in a leap year?
xleap.year(200002011615)

## fxdata.contributors -
xmpBasics("\nStart: Filter Contributors > ")
# Print contributor list:
data(usdthb)
usdthb[1:25, ]
# Create contributor list:
fxdata.contributors(usdthb, include = 5)

## fxdata.parser -
xmpBasics("\nNext: Parse Records > ")
# Create a contributor list and mark the first 5 market makers:
parser.table = fxdata.contributors(usdthb, include = 5)
# Parse the market makers and print the first 25 entries:
fxdata.parser(usdthb, parser.table)[1:25, ]

## fxdata.filter -
xmpBasics("\nNext: Filter Records > ")
# Filter data and plot unfiltered data:
par(mfrow = c(2, 1))
NumberOfRecords = length(usdthb[, 1])
NumberOfRecords
plot(usdthb[, 4], type = "l", xlab = "Tick Number from Reuters THB=",
    ylab = "100*log(Bid[n]/Bid[1]) Bid", ylim = c(-20, 30),
    main = "USDTHB June 1997 unfiltered")
lines(x = c(1, NumberOfRecords), y = rep(usdthb[1, 4], 2), col = 4)
lines(-100*log(usdthb[1, 4]/usdthb[, 4]))
lines(x = c(1, NumberOfRecords), y = c(0, 0), col = 4)
# Filter the data:
usdthb = fxdata.filter(usdthb, parameter = "strong")
# Quick and dirty time scaling:
Records = length(usdthb$accepted[, 4])
scale = NumberOfRecords/Records
# Plot filtered data:
plot(x = (1:Records)*scale, y = usdthb$accepted[, 4], type = "l",
    xlab = "Tick Number from Reuters THB=",
    ylab = "100*log(Bid[n]/Bid[1]) Bid", ylim = c(-20, 30),
    main = "USDTHB June 1997 filtered")
y = rep(usdthb$accepted[1, 4], 2)
lines(x = c(1, NumberOfRecords), y = y, col = 4)
y = -100*log(usdthb$accepted[1, 4]/usdthb$accepted[, 4])
lines(x = (1:Records)*scale, y = y)
lines(x = c(1, NumberOfRecords), y = c(0, 0), col = 4)

## fxdata.varmin -
xmpBasics("\nNext: Variable Minute Records > ")
# Create a varmin file from the filter-accepted data and
# print the first 25 entries:
fxdata.varmin(usdthb$accepted, digits = 5)[1:25, ]

## xts.log -
xmpBasics("\nStart: Log Prices of FX Data > ")
# Calculate log-prices from AUDUSD bid prices:
options(digits = 10)
data(audusd)
prices = list(t = audusd[, "XDATE"], x = audusd[, "BID"])
# Print the first 25 entries:
log.prices = xts.log(prices)
data.frame(log.prices)[1:25, ]

## xts.diff -
xmpBasics("\nNext: Returns of FX Data > ")
# Calculate hourly AUDUSD log-returns:
prices = list(t = audusd[, "XDATE"], x = audusd[, "BID"])
# Calculate the returns and print the first 25 entries:
data.frame(xts.diff(xts.log(prices)))[1:25, ]

## xts.cut -
xmpBasics("\nNext: Cut out a Piece From a FX File > ")
# Retrieve the AUDUSD bid quotes for October 21, 1997:
prices = list(t = audusd[, "XDATE"], x = audusd[, "BID"])
# Retrieve prices and print the first 25 entries:
data.frame(xts.cut(prices, from.date = 19971021, to.date = 19971021))[1:25, ]

## xts.interp -
xmpBasics("\nNext: Interpolate FX Data > ")
# Interpolate AUDUSD bid prices on a 15 minutes time scale:
prices = list(t = audusd[, "XDATE"], x = audusd[, "BID"])
# Interpolate the prices and print the first 25 entries:
data.frame(xts.interp(prices, deltat = 15))[1:25, ]

## xts.map -
xmpBasics("\nNext: Create Business Time Map > ")
options(object.size = 5e8)
par(mfrow = c(2, 1))
# Load and plot prices:
data(fdax9710)
index = list(t = fdax9710[, "XDATE"], x = fdax9710[, "FDAX"])
# Start on Monday - end on Sunday, 3 weeks:
index = xts.cut(index, from.date = 19971006, to.date = 19971026)
plot(index$x, type = "l", xlab = "Prices", main = "Prices in event time")
# Create hourly upsilon time map - start on Monday - end on Sunday:
tmap = xts.map(index, mean.deltat = 60, alpha = 1.05)
plot(x = tmap$xmap, y = tmap$ymap, ylim = c(0, max(tmap$ymap)), type = "l",
    main = "Time Mapping")
tmap

## xts.upsilon -
xmpBasics("\nNext: De-seasonalize in Upsilon Time > ")
index = list(t = fdax9710[, "XDATE"], x = fdax9710[, "FDAX"])
# Start on Monday - end on Sunday, 3 weeks:
index = xts.cut(index, from.date = 19971006, to.date = 19971026)
plot(index$x, type = "l", xlab = "Prices", main = "Prices in event time")
# Create hourly upsilon time map - start on Monday - end on Sunday:
tmap = xts.map(index, mean.deltat = 60, alpha = 1.05)
# Extract data records according to time map:
index.ups = xts.upsilon(index, weekly.map = tmap$ymap,
    main = "Prices in Upsilon time")

## xts.dvs -
xmpBasics("\nNext: De-volatilize Time Series > ")
index = list(t = fdax9710[, "XDATE"], x = fdax9710[, "FDAX"])
# Start on Monday - end on Sunday, 3 weeks:
index = xts.cut(index, from.date = 19971006, to.date = 19971026)
plot(index$x, type = "l", ylab = "Prices", main = "Prices in event time")
# De-volatilize time series with the dv-series algorithm:
index.dvs = xts.dvs(index, k = 8,
    volatility = 13.15*var(diff(log(index$x))),
    main = "Prices from dv-series")

## Not run:
## xts.dwh -
xmpBasics("\nNext: Plot daily/weekly Charts > ")
# NOTE: The file fdax97m.csv is too large and therefore not part
# of this distribution. Please contact inf@rmetrics.org.
data(fdax97m)
xts = list(t = fdax97m[, "XDATE"], x = fdax97m[, "FDAX"])
# Start on Monday - end on Sunday:
xts = xts.cut(xts, from.date = 19970106, to.date = 19971228)
# Create daily and weekly histograms:
result = xts.dwh(xts, period = "both", dolog = TRUE, dodiff = TRUE,
    deltat = 30, doplot = TRUE)
## End(Not run)