HighFrequencyDataTools {fCalendar}    R Documentation
A collection and description of functions for the
management of high-frequency financial market time
series, especially FX series collected from a
Reuters data feed. The collection includes functions
for the management of dates and times formatted as
ISO-8601 'CCYYMMDDhhmm' strings, functions for
filtering and outlier detection of high-frequency FX
data records as collected from a Reuters data feed,
and functions to calculate log-prices and
log-returns, extract subsamples, interpolate
in time, build business time scales, and
de-seasonalize and de-volatilize high-frequency
financial market data.
'CCYYMMDDhhmm' Dates and Times functions are:
xjulian | Julian minute counts for 'CCYYMMDDhhmm' formats, |
xdate | 'CCYYMMDDhhmm' from Julian minute counts, |
xday.of.week | day of week from 'CCYYMMDDhhmm' dates/times, |
xleap.year | Decides whether a 'CCYYMMDDhhmm' date falls in a leap year. |
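As a language-agnostic illustration of what these date/time utilities compute, here is a minimal Python sketch. The `_sketch` function names are hypothetical (they are not part of fBasics); the sketch assumes a 'CCYYMMDDhhmm' integer stamp, with a bare 'CCYYMMDD' date read as midnight, matching the conventions described in this page.

```python
from datetime import datetime

def _parse(v):
    """Parse an ISO-8601 'CCYYMMDDhhmm' number; bare 'CCYYMMDD' means midnight."""
    s = str(v)
    if len(s) == 8:          # date only: assume 00:00
        s += "0000"
    return datetime.strptime(s, "%Y%m%d%H%M")

def xjulian_sketch(xdate, origin=19600101):
    """Julian minute count: minutes elapsed from the origin to xdate."""
    return int((_parse(xdate) - _parse(origin)).total_seconds() // 60)

def xday_of_week_sketch(xdate):
    """Day of week with 0 = Sunday ... 6 = Saturday, as xday.of.week returns."""
    return (_parse(xdate).weekday() + 1) % 7

def xleap_year_sketch(xdate):
    """Does the date fall in a leap year (Gregorian rule)?"""
    y = _parse(xdate).year
    return y % 4 == 0 and (y % 100 != 0 or y % 400 == 0)
```

For example, `xjulian_sketch(197301011530)` counts the minutes from the default origin 1960-01-01 00:00 up to 1973-01-01 15:30.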
Filter and outlier detection functions are:
fxdata.contributors | Creates a table with contributor names, |
fxdata.parser | Parses FX contributors and delay times, |
fxdata.filter | Filters price and spread values from FX records, |
fxdata.varmin | Aggregates records to variable minutes format. |
Functions for De-Seasonalization and De-Volatilization:
xts.log | Calculates logarithms of xts time series values, |
xts.diff | Differences xts time series values with lag = 1, |
xts.cut | Cuts a piece out of an xts time series, |
xts.interp | Interpolates for equidistant time steps, |
xts.map | Creates a volatility adjusted time-mapping, |
xts.upsilon | Interpolates a time series in upsilon time, |
xts.dvs | Creates a de-volatilized time series, |
xts.dwh | Plots intra-daily/weekly histograms. |
xjulian(xdates, origin = 19600101)
xdate(xjulians, origin = 19600101)
xday.of.week(xdates)
xleap.year(xdates)
fxdata.contributors(x, include = 10)
fxdata.parser(x, parser.table)
fxdata.filter(x, parameter = "strong", doprint = TRUE)
fxdata.varmin(x, digits = 4)
xts.log(xts)
xts.diff(xts)
xts.cut(xts, from.date, to.date)
xts.interp(xts, deltat = 1, method = "constant")
xts.map(xts, mean.deltat, alpha)
xts.upsilon(xts, weekly.map = seq(from = 59, by = 60, length = 168),
    method = "constant", doplot = TRUE, ...)
xts.dvs(xts, k, volatility, doplot = TRUE, ...)
xts.dwh(xts, deltat = 60, period = "weekly", dolog = TRUE, dodiff = TRUE,
    doplot = TRUE)
alpha |
the scaling exponent, a numeric value. For a random walk this will be 2. |
deltat |
the time in minutes between interpolated data points, by default 1 minute. |
digits |
an integer value, the number of digits for the
BID and ASK prices. By default 4. |
dolog, dodiff |
two logicals. Should the logarithm of the input data be taken?
Should the difference of the input data be taken?
Note that if both dolog and dodiff are set to TRUE,
the input data are expected to be price values. |
doplot |
a logical. Should a plot be displayed? |
doprint |
a logical. Should the filter parameters be printed? |
from.date, to.date |
ISO-8601 start and end dates, [CCYYMMDD]. |
include |
an integer value; the contributors are sorted by frequency
and the include most frequent market makers are selected.
By default 10. |
k |
the sampling frequency, an integer value, typically of the order of 10 data points. |
mean.deltat |
the average size of the time intervals in minutes, an integer value. |
method |
a character string naming the interpolation method, either "linear" or "constant". |
origin |
the origin date of the counter, in ISO-8601 date format, [CCYYMMDD]. By default January 1st, 1960. |
parameter |
a character string, either "strong" or "weak",
denoting the filter parameter settings. |
parser.table |
the table of contributors produced by fxdata.contributors,
a data.frame. In this table the market leaders are marked. |
period |
a string, either "weekly", "daily" or "both" selecting the type of the histogram. By default "weekly". |
volatility |
average volatility, a numeric value. Takes values of the order of the variance of the time series data. |
weekly.map |
an integer vector of time intervals, by default 168
hourly intervals, spanning one week. Volatility
based maps can be created by the function xts.map. |
x |
a 6 column standardized FX data frame with XDATE, DELAY, CONTRIBUTOR, BID, ASK and FLAG fields. |
xdates |
a numeric vector of ISO-8601 formatted Gregorian dates/times, [CCYYMMDDhhmm]. |
xjulians |
a numeric vector of Julian Minute Counts. |
xts |
a list with date/time t in ISO-8601
format, [CCYYMMDDhhmm], and data values x. |
... |
arguments to be passed. |
Date and Time Functions:
Note that the prefix x* indicates the "extended" date format
including time management functionality, whereas in sjulian,
sdate, etc. the prefix s* indicates the "standard" or "simple"
date format, handling days, months, years and centuries.
The Data Preprocessing Process:
fxdata.contributors creates a contributor list from an FX
high-frequency data file as collected from a Reuters data feed
and marks the market leaders. fxdata.parser selects, using the
information from the contributor list, the data records from
market leaders. As input serves a standardized high-frequency
data file. Then the function fxdata.filter filters the FX data
records, and finally the function fxdata.varmin creates a
"variable minutes" formatted data file, i.e. all data records
within the same minute are averaged. The preprocessed data are
the starting point for further investigations.
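The final "variable minutes" step of this pipeline can be sketched language-independently. The following Python sketch (the function name and the plain-dict record layout are assumptions, not the fBasics implementation) averages all records that share the same 'CCYYMMDDhhmm' minute stamp and fills the DELAY, CONTRIBUTOR and FLAG columns the way the Value section below describes for fxdata.varmin:

```python
from collections import defaultdict

def varmin_sketch(records, digits=4):
    """Average all records sharing the same 'CCYYMMDDhhmm' minute stamp."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["XDATE"]].append(rec)
    out = []
    for xdate in sorted(groups):
        recs = groups[xdate]
        out.append({
            "XDATE": xdate,
            "DELAY": 0,                # unused in variable minutes files
            "CONTRIBUTOR": "MEAN",     # marks how the record was evaluated
            "BID": round(sum(r["BID"] for r in recs) / len(recs), digits),
            "ASK": round(sum(r["ASK"] for r in recs) / len(recs), digits),
            "FLAG": len(recs),         # number of averaged raw records
        })
    return out
```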
The Standardized FX high frequency data file structure:
x is a standardized data frame with 6 columns. The first
column gives the date/time XDATE in ISO-8601 format
[CCYYMMDDhhmm], the second column is a measure for the feed
DELAY, the third column denotes the CONTRIBUTOR code, the
fourth and fifth columns are the BID and ASK prices, and the
last column is an information FLAG to add additional
information.
The Contributor List:
The output of the fxdata.contributors function is used as
input for the function fxdata.parser, which extracts the
contributors marked as market makers in the output table.
The Parser:
The function fxdata.parser parses the data.
The parser table, parser.table, is a data frame with 4
columns: CONTRIBUTOR denotes a code naming the contributor,
COUNTS gives the number of counts, i.e. how often the
contributor appeared in the file, PERCENT gives the same as
a percent value, and SELECT denotes a logical value; if TRUE
the contributor belongs to the group of the market makers,
otherwise not.
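Building such a contributor table amounts to a frequency count with a cut-off. A minimal Python sketch (hypothetical name and record layout, not the fBasics code) of this construction:

```python
from collections import Counter

def contributors_sketch(records, include=10):
    """Frequency table of contributors; the top `include` are marked SELECT."""
    counts = Counter(rec["CONTRIBUTOR"] for rec in records)
    total = sum(counts.values())
    table = []
    for rank, (name, n) in enumerate(counts.most_common()):
        table.append({
            "CONTRIBUTOR": name,
            "COUNTS": n,
            "PERCENT": round(100 * n / total, 2),
            "SELECT": rank < include,   # top `include` = market makers
        })
    return table
```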
Variable Minutes Formatted Files:
The function fxdata.varmin creates data records in a variable
minutes format.
Log Prices and Log Returns:
The function xts.log is mainly used to create log-prices from
high frequency price records, and the function xts.diff is
used to create log-returns.
Subsamples:
Th function xts.cut
is mainly used to create a subsample from
data records. If the start and/or end date are out of the time range
the time series is simply forward/backward extrapolated with the first
and/or last value.
Interpolation:
The function xts.interp is used to interpolate data records.
The method argument allows for two different kinds of
interpolation, either "linear" for a linear interpolation or
"constant" for a constant interpolation keeping the previous
(left) value in time within the interpolation region.
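The difference between the two methods can be shown in a few lines. This Python sketch (a hypothetical stand-in, not the xts.interp implementation) maps irregular observations (t, x) onto a given grid of times:

```python
import bisect

def interp_sketch(t, x, grid, method="constant"):
    """Interpolate irregular observations (t, x) at the times in `grid`.

    "constant" keeps the previous (left) value in time;
    "linear" interpolates between the neighbouring observations.
    """
    out = []
    for g in grid:
        i = max(bisect.bisect_right(t, g) - 1, 0)   # last t[i] <= g
        if method == "constant" or i == len(t) - 1:
            out.append(x[i])
        else:
            w = (g - t[i]) / (t[i + 1] - t[i])       # linear weight
            out.append(x[i] + w * (x[i + 1] - x[i]))
    return out
```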
Business Time Maps:
The function xts.map is mainly used to create the time map
required by the function xts.upsilon. Important: the argument
xts must start on a Monday and end on a Sunday. Use xts.cut
to guarantee this.
De-Seasonalization:
The function xts.upsilon is used to create data records with
volatility adjusted time steps obtained from the "upsilon
time" approach. These time steps can be taken from the time
map created by the function xts.map. The data records are
interpolated according to this time schedule.
De-Volatilization:
The de-volatilization algorithm is based on Zhou's approach. The
algorithm used by the function xts.dvs reduces the sample
frequency by keeping the variance of the price changes constant,
hence the name "de-volatilization". The procedure removes
volatility by sampling data at different dates for different
times. When the market is highly volatile, more data are
sampled; equivalently, the time is stretched. When the market is
less volatile, less data are sampled; equivalently, the time is
compressed. Although the resulting subsequence has unequally
spaced calendar date/time intervals, it produces an almost
equally volatile time series. This time series is called a
de-volatilized time series, or "dv-Series".
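The core idea can be sketched as follows: walk through the series and emit a point whenever the cumulated squared price change since the last emitted point reaches a volatility budget. This Python sketch is a simplified illustration of that idea only; Zhou's full algorithm as used by xts.dvs additionally involves the sampling-frequency parameter k, which is omitted here.

```python
def dvs_sketch(t, x, volatility):
    """Emit a point whenever the cumulated squared change since the
    last emitted point reaches the `volatility` budget."""
    out_t, out_x = [t[0]], [x[0]]
    acc = 0.0
    for i in range(1, len(x)):
        acc += (x[i] - x[i - 1]) ** 2
        if acc >= volatility:      # budget reached: sample and reset
            out_t.append(t[i])
            out_x.append(x[i])
            acc = 0.0
    return out_t, out_x
```

Calm stretches of the series contribute little to the budget and are skipped over (time compressed), while volatile stretches fill the budget quickly and are sampled densely (time stretched).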
Daily/Weekly Histogram Plots:
Financial market data exhibit seasonal structures over the day
or week. These can be made explicit by daily or weekly
histogram plots of the data using the function xts.dwh.
Date and Time Functions:
xjulian
returns a numeric vector of Julian minute counts.
xdate
returns a numeric vector of ISO-8601 formatted dates/times, i.e.
[CCYYMMDDhhmm].
xday.of.week
returns a numeric vector with entries between 0 (Sunday) and
6 (Saturday).
xleap.year
returns a logical vector with entries TRUE or FALSE, indicating
whether the date falls in a leap year or not.
Filter and Outlier Detection:
fxdata.contributors
returns a data frame with the following columns: CONTRIBUTOR,
the code naming the contributor, a character string; COUNTS,
the counts, i.e. how often the contributor appeared in the
file, an integer; PERCENT, the same in percent, a numeric
value; SELECT, a logical. If TRUE the contributor belongs to
the group of the n market makers, otherwise not.
fxdata.parser and fxdata.filter
return a data frame with the same structure as x, i.e. a
standardized FX high frequency data file structure.
fxdata.varmin
returns a data frame with the same structure as x, i.e. a
standardized FX high frequency data file structure. The second
column named DELAY is not used and set to zero for each data
record. The third column CONTRIBUTOR is set to "MEAN", the
method by which the variable minute record was evaluated. The
last column FLAG counts the number of values from which the
variable minute data record was evaluated.
De-seasonalization and de-volatilization:
All functions besides xts.map and xts.dwh return a
list with the following two components:
t, the date/time in ISO-8601 format, [CCYYMMDDhhmm], the same
as the input data xts$t;
x, the transformed values of the input data records xts$x,
a numeric vector.
xts.map
returns a list with the following two components:
xmap, a numeric vector with the time intervals;
ymap, a numeric vector, the values to be mapped.
xts.dwh
If "daily" was selected, a list with the following two
components is returned:
td, the daily histogram breaks;
xd, the daily histogram frequencies.
If "weekly" was selected, a list with the following two
components is returned:
tw, the weekly histogram breaks;
xw, the weekly histogram frequencies.
If "both" was selected, a list with all four components
is returned.
These functions were written originally for R Version 1.5. Only minor changes were made to make these functions available for Version 1.9. Date and time classes are outdated, but the functions are still working.
The file fdax97m.csv
is too large and therefore not part of
the fBasics
distribution. Please contact inf@rmetrics.org.
Diethelm Wuertz for the Rmetrics R-port.
ISO-8601 (1988); Data Elements and Interchange Formats - Information Interchange, Representation of Dates and Time, International Organization for Standardization, Reference Number ISO 8601, 14 pages.
Zhou B. (1995); Forecasting Foreign Exchange Rates Subject to De-volatilization, in: Freedman R.S., Klein A.R., Lederman J. eds., Artificial Intelligence in the Capital Markets, Irwin Publishing, Chicago, p. 137–156.
Guillaume D.M., Dacorogna M.M., Dave R.R., Mueller U.A., Olsen R.B., Pictet O.V. (1997); From the bird's eye to the microscope: a survey of new stylized facts of the intra-daily foreign exchange markets, Finance and Stochastics 1, 95–129.
## SOURCE("fCalendar.99X-HighFrequencyData")

## xjulian -
xmpBasics("\nStart: Julian Counts > ")
# Return the minute counts for the last day of every month of
# year 2000 at 16:00, counted from the origin January 1st, 2000:
xjulian(c(20000131, 20000229, 20000331, 20000430, 20000531, 20000630,
    20000731, 20000831, 20000930, 20001031, 20001130, 20001231)*10000 + 1600,
    origin = 20000101)
# This doesn't work in S-Plus, you get a sequence of NA's, use instead:
xjulian(c(200001311600, 200002291600, 200003311600, 200004301600,
    200005311600, 200006301600, 200007311600, 200008311600, 200009301600,
    200010311600, 200011301600, 200012311600), origin = 20000101)

## xdate -
xmpBasics("\nNext: Convert Julian Counts to Dates > ")
# Manage Date/Time in Extended Date/Time Format, ISO-8601
# Date: 1973-01-01 15:30
xjulian(197301011530)
print(xdate(xjulian(197301011530)), digits = 9)

## xday.of.week -
xmpBasics("\nNext: Compute Day of Week > ")
# Calculate the day of week for 1973-01-01 16:15
xday.of.week(197301011615)

## xleap.year -
xmpBasics("\nNext: Check for Leap Years > ")
# Does February 1st, 2000 16:15 fall in a leap year?
xleap.year(200002011615)

## fxdata.contributors -
xmpBasics("\nStart: Filter Contributors > ")
# Print contributor list:
data(usdthb)
usdthb[1:25, ]
# Create contributor list:
fxdata.contributors(usdthb, include = 5)

## fxdata.parser -
xmpBasics("\nNext: Parse Records > ")
# Create a contributor list and mark the first 5 market makers:
parser.table = fxdata.contributors(usdthb, include = 5)
# Parse the market makers and print the first 25 entries:
fxdata.parser(usdthb, parser.table)[1:25, ]

## fxdata.filter -
xmpBasics("\nNext: Filter Records > ")
# Filter data and plot unfiltered data:
par(mfrow = c(2, 1))
NumberOfRecords = length(usdthb[, 1])
NumberOfRecords
plot(usdthb[, 4], type = "l", xlab = "Tick Number from Reuters THB=",
    ylab = "100*log(Bid[n]/Bid[1]) Bid", ylim = c(-20, 30),
    main = "USDTHB June 1997 unfiltered")
lines(x = c(1, NumberOfRecords), y = rep(usdthb[1, 4], 2), col = 4)
lines(-100*log(usdthb[1, 4]/usdthb[, 4]))
lines(x = c(1, NumberOfRecords), y = c(0, 0), col = 4)
# Filter the data:
usdthb = fxdata.filter(usdthb, parameter = "strong")
# Quick and dirty time scaling:
Records = length(usdthb$accepted[, 4])
scale = NumberOfRecords/Records
# Plot filtered data:
plot(x = (1:Records)*scale, y = usdthb$accepted[, 4], type = "l",
    xlab = "Tick Number from Reuters THB=",
    ylab = "100*log(Bid[n]/Bid[1]) Bid", ylim = c(-20, 30),
    main = "USDTHB June 1997 filtered")
y = rep(usdthb$accepted[1, 4], 2)
lines(x = c(1, NumberOfRecords), y = y, col = 4)
y = -100*log(usdthb$accepted[1, 4]/usdthb$accepted[, 4])
lines(x = (1:Records)*scale, y = y)
lines(x = c(1, NumberOfRecords), y = c(0, 0), col = 4)

## fxdata.varmin -
xmpBasics("\nNext: Variable Minute Records > ")
# Create a varmin file from the filter-accepted data and
# print the first 25 entries:
fxdata.varmin(usdthb$accepted, digits = 5)[1:25, ]

## xts.log -
xmpBasics("\nStart: Log Prices of FX Data > ")
# Calculate log-prices from AUDUSD bid prices:
options(digits = 10)
data(audusd)
prices = list(t = audusd[, "XDATE"], x = audusd[, "BID"])
# Print the first 25 entries:
log.prices = xts.log(prices)
data.frame(log.prices)[1:25, ]

## xts.diff -
xmpBasics("\nNext: Returns of FX Data > ")
# Calculate hourly AUDUSD log-returns:
prices = list(t = audusd[, "XDATE"], x = audusd[, "BID"])
# Calculate the returns and print the first 25 entries:
data.frame(xts.diff(xts.log(prices)))[1:25, ]

## xts.cut -
xmpBasics("\nNext: Cut out a Piece From a FX File > ")
# Retrieve the AUDUSD bid quotes for October 21, 1997:
prices = list(t = audusd[, "XDATE"], x = audusd[, "BID"])
# Retrieve prices and print the first 25 entries:
data.frame(xts.cut(prices, from.date = 19971021, to.date = 19971021))[1:25, ]

## xts.interp -
xmpBasics("\nNext: Interpolate FX Data > ")
# Interpolate AUDUSD bid prices on a 15 minutes time scale:
prices = list(t = audusd[, "XDATE"], x = audusd[, "BID"])
# Interpolate the prices and print the first 25 entries:
data.frame(xts.interp(prices, deltat = 15))[1:25, ]

## xts.map -
xmpBasics("\nNext: Create Business Time Map > ")
options(object.size = 5e8)
par(mfrow = c(2, 1))
# Load and plot prices:
data(fdax9710)
index = list(t = fdax9710[, "XDATE"], x = fdax9710[, "FDAX"])
# Start on Monday - end on Sunday, 3 weeks:
index = xts.cut(index, from.date = 19971006, to.date = 19971026)
plot(index$x, type = "l", xlab = "Prices", main = "Prices in event time")
# Create hourly upsilon time map - start on Monday - end on Sunday:
tmap = xts.map(index, mean.deltat = 60, alpha = 1.05)
plot(x = tmap$xmap, y = tmap$ymap, ylim = c(0, max(tmap$ymap)), type = "l",
    main = "Time Mapping")
tmap

## xts.upsilon -
xmpBasics("\nNext: De-seasonalize in Upsilon Time > ")
index = list(t = fdax9710[, "XDATE"], x = fdax9710[, "FDAX"])
# Start on Monday - end on Sunday, 3 weeks:
index = xts.cut(index, from.date = 19971006, to.date = 19971026)
plot(index$x, type = "l", xlab = "Prices", main = "Prices in event time")
# Create hourly upsilon time map - start on Monday - end on Sunday:
tmap = xts.map(index, mean.deltat = 60, alpha = 1.05)
# Extract data records according to time map:
index.ups = xts.upsilon(index, weekly.map = tmap$ymap,
    main = "Prices in Upsilon time")

## xts.dvs -
xmpBasics("\nNext: De-volatilize Time Series > ")
index = list(t = fdax9710[, "XDATE"], x = fdax9710[, "FDAX"])
# Start on Monday - end on Sunday, 3 weeks:
index = xts.cut(index, from.date = 19971006, to.date = 19971026)
plot(index$x, type = "l", ylab = "Prices", main = "Prices in event time")
# De-volatilize time series with the dv-series algorithm:
index.dvs = xts.dvs(index, k = 8,
    volatility = 13.15*var(diff(log(index$x))),
    main = "Prices from dv-series")

## Not run:
## xts.dwh -
xmpBasics("\nNext: Plot daily/weekly Charts > ")
# NOTE: The file fdax97m.csv is too large and therefore not part
# of this distribution. Please contact inf@rmetrics.org.
data(fdax97m)
xts = list(t = fdax97m[, "XDATE"], x = fdax97m[, "FDAX"])
# Start on Monday - end on Sunday:
xts = xts.cut(xts, from.date = 19970106, to.date = 19971228)
# Create daily and weekly histograms:
result = xts.dwh(xts, period = "both", dolog = TRUE, dodiff = TRUE,
    deltat = 30, doplot = TRUE)
## End(Not run)