HighFrequencyDataTools {fCalendar}    R Documentation

Tools for FX High Frequency Data

Description

A collection and description of functions for the management of high frequency financial market time series, especially FX series collected from a Reuters data feed. The collection includes functions for the management of dates and times formatted as ISO-8601 strings 'CCYYMMDDhhmm', functions for filtering and outlier detection of high frequency FX data records as collected from a Reuters data feed, and functions for calculating log-prices and log-returns, extracting subsamples, interpolating in time, building business time scales, and de-seasonalizing and de-volatilizing high frequency financial market data.

'CCYYMMDDhhmm' Dates and Times functions are:

xjulian Julian minute counts for 'CCYYMMDDhhmm' formats,
xdate 'CCYYMMDDhhmm' from Julian minute counts,
xday.of.week day of week from 'CCYYMMDDhhmm' dates/times,
xleap.year Tests whether a 'CCYYMMDDhhmm' date falls in a leap year.

Filter and outlier detection functions are:

fxdata.contributors Creates a table with contributor names,
fxdata.parser Parses FX contributors and delay times,
fxdata.filter Filters price and spread values from FX records,
fxdata.varmin Aggregates records to variable minutes format.

De-seasonalization and de-volatilization functions are:

xts.log Calculates logarithms for xts time series values,
xts.diff Differences xts time series values with lag 1,
xts.cut Cuts a piece out of a xts time series,
xts.interp Interpolates for equidistant time steps,
xts.map Creates a volatility adjusted time-mapping,
xts.upsilon Interpolates a time series in upsilon time,
xts.dvs Creates a de-volatilized time series,
xts.dwh Plots intra-daily/weekly histograms.

Usage

xjulian(xdates, origin = 19600101)
xdate(xjulians, origin = 19600101)
xday.of.week(xdates)
xleap.year(xdates)

fxdata.contributors(x, include = 10)
fxdata.parser(x, parser.table)
fxdata.filter(x, parameter = "strong", doprint = TRUE)
fxdata.varmin(x, digits = 4)

xts.log(xts)
xts.diff(xts)
xts.cut(xts, from.date, to.date)
xts.interp(xts, deltat = 1, method = "constant")
xts.map(xts, mean.deltat, alpha) 
xts.upsilon(xts, weekly.map = seq(from = 59, by = 60, length = 168), 
    method = "constant", doplot = TRUE, ...)
xts.dvs(xts, k, volatility, doplot = TRUE, ...) 
xts.dwh(xts, deltat = 60, period = "weekly", dolog = TRUE, 
    dodiff = TRUE, doplot = TRUE) 

Arguments

alpha the scaling exponent, a numeric value. For a random walk this will be 2.
deltat the time in minutes between interpolated data points, by default 1 minute.
digits an integer value, the number of digits for the BID and ASK prices. By default 4.
dolog, dodiff two logicals. Should the logarithm of the input data be taken? Should the difference of the input data be taken? Note, if both dolog and dodiff are set to TRUE, the input data are expected to be price values.
doplot a logical. Should a plot be displayed?
doprint a logical, should the filter parameters be printed?
from.date, to.date ISO-8601 start and end dates, [CCYYMMDD].
include an integer value. The contributors are sorted by frequency and the first include contributors are selected as market makers. By default 10.
k the sampling frequency, an integer value, typically of the order of 10 data points.
mean.deltat the average size of the time intervals in minutes, an integer value.
method a character string naming the interpolation method, either "linear" or "constant".
origin the origin date of the counter, in ISO-8601 date format, [CCYYMMDD]. By default 19600101, i.e. January 1st, 1960.
parameter a character string, either "strong" or "weak", denoting the filter parameter settings.
parser.table the table of contributors produced by fxdata.contributors, a data.frame. In this table market leaders are marked.
period a string, either "weekly", "daily" or "both" selecting the type of the histogram. By default "weekly".
volatility average volatility, a numeric value. Takes values of the order of the variance of the time series data.
weekly.map an integer vector of time intervals, by default 168 hourly intervals, spanning one week. Volatility based maps can be created by the function xts.map.
x a 6-column standardized FX data frame with XDATE, DELAY, CONTRIBUTOR, BID, ASK and FLAG fields.
xdates a numeric vector of ISO-8601 formatted Gregorian dates/times, [CCYYMMDDhhmm].
xjulians a numeric vector of Julian Minute Counts.
xts a list with date/time t in ISO-8601 format, [CCYYMMDDhhmm], and data values x.
... arguments to be passed.

Details

Date and Time Functions:

Note that the x* prefix indicates the "extended" date format including time management functionality, whereas in sjulian, sdate, etc. the s* prefix indicates the "standard" or "simple" date format, handling days, months, years and centuries.
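
For example, a date/time can be converted to a Julian minute count and back again (a minimal sketch; the actual counts depend on the chosen origin):

   # Round trip between 'CCYYMMDDhhmm' dates/times and Julian minute counts:
   jmc = xjulian(197301011530)    # minutes since the default origin 1960-01-01
   xdate(jmc)                     # back to 197301011530
   xday.of.week(197301011530)     # 1, a Monday (0 = Sunday, ..., 6 = Saturday)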

The Data Preprocessing Process:

fxdata.contributors creates a contributor list from an FX high frequency data file as collected from a Reuters data feed and marks the market leaders. fxdata.parser uses the information from the contributor list to select the data records from market leaders; as input serves a standardized high frequency data file. Then the function fxdata.filter filters the FX data records, and finally the function fxdata.varmin creates a "variable minutes" formatted data file, i.e. all data records within the same minute are averaged. The preprocessed data are the starting point for further investigations. The four steps are sketched below.
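
In outline, the four preprocessing steps chain as follows (a minimal sketch using the usdthb data set from the Examples below; the accepted component of the filter result follows the usage shown there):

   # Sketch of the preprocessing chain:
   data(usdthb)
   tab = fxdata.contributors(usdthb, include = 10)   # mark the market makers
   mm  = fxdata.parser(usdthb, parser.table = tab)   # keep their records
   flt = fxdata.filter(mm, parameter = "strong")     # filter prices and spreads
   vm  = fxdata.varmin(flt$accepted, digits = 4)     # average within each minute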

The Standardized FX high frequency data file structure:

x is a standardized data frame with 6 columns. The first column holds the date/time XDATE in ISO-8601 format [CCYYMMDDhhmm], the second column is a measure for the feed DELAY, the third column denotes the CONTRIBUTOR code, the fourth and fifth columns are the BID and ASK prices, and the last column is an information FLAG carrying additional information.
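
A single record in this layout could look as follows (a purely hypothetical sketch; all field values are made up):

   # Hypothetical one-row record in the standardized 6-column layout:
   rec = data.frame(
     XDATE       = 199706021559,   # date/time, CCYYMMDDhhmm
     DELAY       = 0,              # feed delay measure
     CONTRIBUTOR = "XYZ",          # contributor code
     BID         = 24.30,          # bid price
     ASK         = 24.40,          # ask price
     FLAG        = 0)              # additional information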

The Contributor List:

The output of the fxdata.contributors function is used as input for the function fxdata.parser, which extracts the contributors marked as market makers in that table.

The Parser:

The function fxdata.parser parses the data. The parser table, parser.table, is a data frame with 4 columns: CONTRIBUTOR denotes a code naming the contributor, COUNTS gives the number of counts, how often the contributor appeared in the file, PERCENT gives the same as a percent value, and SELECT denotes a logical value; if TRUE the contributor belongs to the group of market makers, otherwise not.

Variable Minutes Formatted Files:

The function fxdata.varmin creates data records within a variable minutes format.

Log Prices and Log Returns:

The function xts.log is mainly used to create log-prices from high frequency price records and the function xts.diff is used to create log-returns.
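
For example, log-returns are obtained by chaining the two functions (a minimal sketch using the audusd data set from the Examples below):

   # From bid prices to log-returns:
   data(audusd)
   prices  = list(t = audusd[, "XDATE"], x = audusd[, "BID"])
   returns = xts.diff(xts.log(prices))   # lag-1 differences of the log-prices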

Subsamples:

The function xts.cut is mainly used to create a subsample from data records. If the start and/or end date are out of the time range, the time series is simply forward/backward extrapolated with the first and/or last value.
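
For instance, a single calendar day can be extracted as follows (a minimal sketch, continuing with the audusd bid prices from above):

   # Extract the records of October 21, 1997:
   day = xts.cut(prices, from.date = 19971021, to.date = 19971021)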

Interpolation:

The function xts.interp is used to interpolate data records. The method argument allows for two kinds of interpolation: "linear" for a linear interpolation, or "constant" for a constant interpolation which keeps the previous value in time (the left value) within the interpolation region.
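
The two methods can be compared directly (a minimal sketch, again using the audusd bid prices from above):

   # Interpolate on a 15 minute grid with both methods:
   lin = xts.interp(prices, deltat = 15, method = "linear")
   con = xts.interp(prices, deltat = 15, method = "constant")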

Business Time Maps:

The function xts.map is mainly used to create the time map required by the function xts.upsilon. Important: The argument xts must start on a Monday and end on a Sunday. Use xts.cut to guarantee this.
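
A minimal sketch, using the fdax9710 data set from the Examples below, first cuts the series to full weeks and then builds the map:

   # Cut to full Monday-to-Sunday weeks, then create an hourly time map:
   data(fdax9710)
   index = list(t = fdax9710[, "XDATE"], x = fdax9710[, "FDAX"])
   index = xts.cut(index, from.date = 19971006, to.date = 19971026)
   tmap  = xts.map(index, mean.deltat = 60, alpha = 1.05)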

De-Seasonalization:

The function xts.upsilon is used to create data records with volatility adjusted time steps obtained from the "upsilon time" approach. These time steps can be taken from the time map created by the function xts.map. The data records are interpolated according to this time schedule.
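
Continuing the sketch above, the ymap component of the time map serves as the weekly.map schedule:

   # Interpolate the series on the upsilon time schedule:
   index.ups = xts.upsilon(index, weekly.map = tmap$ymap, method = "constant")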

De-Volatilization:

The de-volatilization algorithm is based on Zhou's approach. The algorithm used by the function xts.dvs reduces the sample frequency while keeping the variance of the price changes constant, hence the name "de-volatilization". The procedure removes volatility by sampling data at different dates and times. When the market is highly volatile, more data are sampled; equivalently, the time is stretched. When the market is less volatile, fewer data are sampled; equivalently, the time is compressed. Although the resulting subsequence has unequally spaced calendar date/time intervals, it produces an almost equally volatile time series. This time series is called a de-volatilized time series, or "dv-Series".
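
Continuing the same sketch, k is chosen of the order of 10 and volatility of the order of the variance of the returns, as described in the Arguments section (the scaling factor mirrors the Examples below):

   # De-volatilize the series:
   index.dvs = xts.dvs(index, k = 8,
     volatility = 13.15*var(diff(log(index$x))), doplot = FALSE)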

Daily/Weekly Histogram Plots:

Financial market data exhibit seasonal structures over the day or week. This can be made explicit by daily or weekly histogram plots of the data using the function xts.dwh.
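
A minimal sketch, continuing with the index series from the business time map sketch above:

   # Half-hourly weekly histogram of (differenced) log values:
   h = xts.dwh(index, deltat = 30, period = "weekly",
     dolog = TRUE, dodiff = TRUE, doplot = TRUE)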

Value

Date and Time Functions:

xjulian
returns a numeric vector of Julian minute counts.

xdate
returns a numeric vector of ISO-8601 formatted dates/times, i.e. [CCYYMMDDhhmm].

xday.of.week
returns a numeric vector with entries between 0 (Sunday) and 6 (Saturday).

xleap.year
returns a logical vector with entries TRUE or FALSE, indicating whether the date falls in a leap year or not.

Filter and Outlier Detection:

fxdata.contributors
returns a data frame with the following columns: CONTRIBUTOR, the code naming the contributor, a character string; COUNTS, the counts, how often the contributor appeared in the file, an integer; PERCENT, the same in percent, a numeric value; SELECT, a logical. If TRUE the contributor belongs to the group of the include market makers, otherwise not.

fxdata.parser, fxdata.filter
return a data frame with the same structure as x, i.e. a standardized FX high frequency data file structure.

fxdata.varmin
returns a data frame with the same structure as x, i.e. a standardized FX high frequency data file structure. The second column named DELAY is not used and set to zero for each data record. The third column CONTRIBUTOR is set to "MEAN", the method by which the variable minute record was evaluated. The last column FLAG counts the number of values from which the variable minute data record was evaluated.

De-seasonalization and de-volatilization:

All functions besides xts.map and xts.dwh return a list with the following two components: t, the date/time in ISO-8601 format, [CCYYMMDDhhmm], and x, a numeric vector with the correspondingly transformed values of the input data records xts$x.

xts.map
returns a list with the following two components: xmap, a numeric vector with the time intervals, and ymap, a numeric vector with the values to be mapped.

xts.dwh
If daily was selected, a list with the following two components is returned: td, the daily histogram breaks, xd, the daily histogram frequencies.
If weekly was selected, a list with the following two components is returned: tw, the weekly histogram breaks, xw, the weekly histogram frequencies.
If both was selected, a list with all four components is returned.

Note

These functions were written originally for R Version 1.5. Only minor changes were made to make these functions available for Version 1.9. Date and time classes are outdated, but the functions are still working.

The file fdax97m.csv is too large and therefore not part of the fBasics distribution. Please contact inf@rmetrics.org.

Author(s)

Diethelm Wuertz for the Rmetrics R-port.

References

ISO-8601 (1988); Data Elements and Interchange Formats - Information Interchange, Representation of Dates and Time, International Organization for Standardization, Reference Number ISO 8601, 14 pages.

Zhou B. (1995); Forecasting Foreign Exchange Rates Subject to De-volatilization, in: Freedman R.S., Klein A.R., Lederman J. eds., Artificial Intelligence in the Capital Markets, Irwin Publishing, Chicago, p. 137–156.

Guillaume D.M., Dacorogna M.M., Dave R.R., Mueller U.A., Olsen R.B., Pictet O.V. (1997); From the bird's eye to the microscope: a survey of new stylized facts of the intra-daily foreign exchange markets, Finance and Stochastics 1, 95–129.

Examples

## SOURCE("fCalendar.99X-HighFrequencyData")

## xjulian - 
   xmpBasics("\nStart: Julian Counts > ")
   # Return the Julian minute counts for the last day of every 
   # month of the year 2000 at 16:00, counting from January 1st, 2000:
   xjulian(c(
     20000131, 20000229, 20000331, 20000430, 20000531, 20000630,
     20000731, 20000831, 20000930, 20001031, 20001130, 20001231)*10000 + 
     1600, origin = 20000101)
   # This doesn't work in S-Plus, you get a sequence of NA's,
   # use instead:
   xjulian(c(
     200001311600, 200002291600, 200003311600, 200004301600, 200005311600, 
     200006301600, 200007311600, 200008311600, 200009301600, 200010311600, 
     200011301600, 200012311600), origin = 20000101)
     
## xdate - 
   xmpBasics("\nNext: Convert Julian Counts to Dates > ")
   # Manage Date/Time in Extended Date/Time Format, ISO-8601
   # Date: 1973-01-01 15:30
   xjulian(197301011530)
   print(xdate(xjulian(197301011530)), digits = 9)
  
## xday.of.week -
   # Calculate the day of week for 1973-01-01 16:15
   xmpBasics("\nNext: Compute Day of Week > ")
   xday.of.week(197301011615)
        
## xleap.year -
   xmpBasics("\nNext: Check for Leap Years > ")
   # Does February 1st, 2000 16:15 fall in a leap year?
   xleap.year(200002011615)    
   
## fxdata.contributors - 
   xmpBasics("\nStart: Filter Contributors > ")
   # Print contributor list:
   data(usdthb)
   usdthb[1:25, ]
   # Create contributor list:
   fxdata.contributors(usdthb, include = 5)
   
## fxdata.parser - 
   xmpBasics("\nNext: Parse Records > ")
   # Parse data:
   # Create a contributor list and mark the first 5 market makers:
   parser.table = fxdata.contributors(usdthb, include = 5)
   # Parse the market makers and print the first 25 entries:
   fxdata.parser(usdthb, parser.table)[1:25,]
   
## fxdata.filter - 
   xmpBasics("\nNext: Filter Records > ")
   # Filter data and plot unfiltered data:
   par(mfrow = c(2, 1))
   NumberOfRecords = length(usdthb[,1])
   NumberOfRecords
   plot(usdthb[,4], type = "l", 
        xlab = "Tick Number from Reuters THB=", 
        ylab = "100*log(Bid[n]/Bid[1])      Bid",
        ylim = c(-20,30), main="USDTHB June 1997 unfiltered")
   lines(x = c(1, NumberOfRecords), y = rep(usdthb[1, 4], 2), col = 4)
   lines(-100*log(usdthb[1, 4]/usdthb[, 4]))
   lines(x = c(1, NumberOfRecords), y = c(0, 0), col = 4)
   # Filter the data:
   usdthb = fxdata.filter(usdthb, parameter = "strong")
   # Quick And Dirty Time Scaling
   Records = length(usdthb$accepted[, 4])
   scale = NumberOfRecords/Records
   # Plot filtered data:
   plot(x=(1:Records)*scale, y = usdthb$accepted[, 4], type = "l", 
        xlab = "Tick Number from Reuters THB=", 
        ylab = "100*log(Bid[n]/Bid[1])      Bid", 
        ylim = c(-20, 30), main = "USDTHB June 1997 filtered")
   y = rep(usdthb$accepted[1, 4], 2)
   lines(x = c(1, NumberOfRecords), y = y, col = 4)
   y = -100*log(usdthb$accepted[1, 4]/usdthb$accepted[, 4])
   lines(x = (1:Records)*scale, y = y)
   lines(x = c(1, NumberOfRecords), y = c(0, 0), col = 4)
   
## fxdata.varmin - 
   xmpBasics("\nNext: Variable Minute Records > ")
   # Variable Minute Records from filter accepted Data,
   # create a varmin file and print the first 25 entries:
   fxdata.varmin(usdthb$accepted, digits = 5)[1:25, ]  
   
## xts.log - 
   xmpBasics("\nStart: Log Prices of FX Data > ")
   # Calculate log-prices from AUDUSD bid prices
   options(digits = 10)
   data(audusd)
   prices = list(t = audusd[,"XDATE"], x = audusd[,"BID"])
   # Print the first 25 entries:
   log.prices = xts.log(prices)
   data.frame(log.prices)[1:25, ]
   
## xts.diff - 
   xmpBasics("\nNext: Returns of FX Data > ")
   # Calculate one hourly AUDUSD log-returns
   prices = list(t = audusd[,"XDATE"], x = audusd[,"BID"])
   # Calculate the returns and print the first 25 entries:
   data.frame(xts.diff(xts.log(prices)))[1:25, ]
   
## xts.cut - 
   xmpBasics("\nNext: Cut out a Piece From a FX File > ")
   # Retrieve the AUDUSD bid quotes for October 21, 1997 
   prices = list(t = audusd[,"XDATE"], x = audusd[,"BID"])
   # Retrieve prices and print the first 25 entries:
   data.frame(xts.cut(prices, from.date = 19971021, 
         to.date = 19971021))[1:25,]

## xts.interp - 
   xmpBasics("\nNext: Interpolate of FX Data > ")
   # Interpolate AUDUSD bid prices 
   # on a 15 minutes  time scale for October 21, 1997:
   prices = list(t = audusd[,"XDATE"], x = audusd[,"BID"])
   # Interpolate the prices and print the first 25 entries:
   data.frame(xts.interp(prices, deltat = 15))[1:25, ]
   
## xts.map - 
   xmpBasics("\nNext: Create Business Time Map > ")
   options(object.size = 5e8)
   par(mfrow = c(2, 1))
   # Load and plot prices:
   data(fdax9710)
   index = list(t = fdax9710[,"XDATE"], x = fdax9710[,"FDAX"])  
   # Start on Monday - end on Sunday, 3 weeks:
   index = xts.cut(index, from.date=19971006, to.date=19971026)
   plot(index$x, type = "l", xlab = "Prices", main = "Prices in event time")   
   # Create hourly upsilon time map - start on Monday - end on Sunday:
   tmap = xts.map(index, mean.deltat = 60, alpha = 1.05)
   plot(x = tmap$xmap, y = tmap$ymap, ylim = c(0, max(tmap$ymap)), type = "l", 
     main = "Time Mapping")   
   tmap 
   
## xts.upsilon -  
   xmpBasics("\nNext: De-seasonalize in Upsilon Time > ")
   index = list(t = fdax9710[,"XDATE"], x = fdax9710[,"FDAX"])  
   # Start on Monday - end on Sunday, 3 weeks:
   index = xts.cut(index, from.date = 19971006, to.date = 19971026)
   plot(index$x, type = "l", xlab = "Prices", main = "Prices in event time")   
   # Create hourly upsilon time map - start on Monday - end on Sunday:
   tmap = xts.map(index, mean.deltat = 60, alpha = 1.05)
   # Extract data records according to time map:
   index.ups = xts.upsilon(index, weekly.map = tmap$ymap, 
     main="Prices in Upsilon time")
    
## xts.dvs - 
   xmpBasics("\nNext: De-volatilize Time Series > ")
   index = list(t=fdax9710[,"XDATE"], x=fdax9710[,"FDAX"])  
   # Start on Monday - end on Sunday, 3 weeks:
   index = xts.cut(index, from.date=19971006, to.date=19971026)
   plot(index$x, type = "l", ylab = "Prices", main = "Prices in event time")    
   # Devolatilize Time Series With dv-Series Algorithm:
   index.dvs = xts.dvs(index, k = 8, 
     volatility = 13.15*var(diff(log(index$x))), main = "Prices from dv-series") 

## Not run: 
## xts.dwh -
   xmpBasics("\nNext: Plot daily/weekly Charts > ")
   # NOTE:
   # The file fdax97m.csv is too large and therefore not part 
   # of this distribution. Please contact inf@rmetrics.org.
   data(fdax97m)
   xts = list(t = fdax97m[,"XDATE"], x = fdax97m[,"FDAX"])
   # Start on Monday - end on Sunday, about one year:
   xts = xts.cut(xts, from.date = 19970106, to.date = 19971228)
   # Create Daily and Weekly Histograms:
   result = xts.dwh (xts, period = "both", dolog = TRUE, 
     dodiff = TRUE, deltat = 30, doplot = TRUE)
## End(Not run)      

[Package fCalendar version 221.10065 Index]