The dataMineLib provides the application with an API to create, manipulate and destroy tables. Use the dataMineLib.createTable function to create a table. Once a table has been created you will most often use a table cursor to access its contents. Table cursors come in three flavors; memory, disk and static. Create a cursor using the dataMineLib.open function.
Memory cursors open a table and read its entire contents into memory. Static cursors provide buffered read only access to a table. Disk cursors provide buffered read and unbuffered write access to a table. You can open multiple concurrent cursors on the same table with the restriction that only one disk cursor may be open on that table. Note that disk and static cursors opened on the same table share the same buffer.
Cursors provide you the ability to view subsets of the information in the table by running filtering operations against the cursor. Filtering operations are applied to the current view and result in a new current view. Filtering does not affect the underlying table data and the current view can be restored to match the underlying table data with the cursor.restore or cursor.reset functions. Use the cursor.saveView function to save the current view, by name, into the table's view directory. Use the cursor.restoreView function to make a saved view the current view. Note that the view directory is saved in the object repository as part of the collection of table objects reachable through the table schema object. Note that the current view is contained in the cursors rowVector property and you will see reference to rowVector in this document. The bckVector contains the unfiltered content of the cursor's underlying table. The cursor.reset function mentioned above essentially create a rowVector that references all of the rows in bckVector in bckVector order to restore an unfiltered view of the table.
This implementation of tables has been optimized for datamining operations where bulk load and read/filtering access performance is given priority over other possible table operations. For this reason, individual row writes and row deletes are expensive and have significant side affects. For instance, a row delete or row insert will have the following side effects:
All cursors share the same basic API. However, not all functions are available on each cursor type and not all operations return exactly the same type of result. In addition, some cursor function options are not available for all cursor types. These differences are documented in the function documentation.
Use the dataMineLib.open function to create a memory cursor object. The memory cursor object loads the entire contents of a specified table into memory. This allows very fast manipulation of the table at the expense of memory usage. The following example opens a memory cursor on a table named myTable. myTable must already exist for a cursor to be opened on it.
(setq myCursor (dataMineLib.open myTable: memory:))
When you filter a memory cursor, a copy of the table content is made in memory and then the undesired rows are removed from the copy to generate the current view. Use cursor.viewSave to save the current view in the cursors viewDirectory. Note that each view saved in the view directory contains the complete content of the rows contained in the view. Take care to account for the memory usage incurred by view creation in a memory cursor.
Use the cursor.save function to save the current contents of a memory cursor to disk, overwriting the original table. Opening a memory cursors dose not increment the transaction count of the repository extent the table exists in. However, when you use cursor.save, an atomic transaction occurs. The view directory is cleared when you save a memory cursor. Note that the cursor.save function will fail if there is an open disk cursor on the same table.
Multiple memory cursors may be opened concurrently on the same table. Each cursor is independent of the others and a fresh copy of the table content is loaded into memory each time.
To load a subset of table columns into memory, pass a columns specification during the open on a memory cursor . This technique can be useful in creating indexing tables and reports. Note that a subsequent cursor.save of a memory cursor, opened with a subset of columns specified, will cause the table schema to be permanently changed to reflect only those columns contained in the memory cursor.
Unlike disk and static cursors, you can access the rows of a memory cursor directly using cursor.rowVector[i], where i is the index of the row you want to access. This makes retrieval of memory cursor row objects very fast. In general, this optimization should be avoided unless there is a great need for fast access. Use the cursor.read function instead.
Use dataMineLib.open to create a disk or static cursor. A static cursor provides buffered read only access to a table. A disk cursor provides buffered read, unbuffered write access to a table. Multiple concurrent cursors can be opened on the same table with the restriction that only one of these may be a disk cursor.
Unlike a memory cursor, the disk and static cursors do not load the entire table into memory when they are opened. Instead, records are read into memory only when they are requested. When a disk cursor and one or more static cursors are opened on the same table, they share the bckVector for that table and a buffer vector holding references to any buffered row objects. These two vectors are cursor.recordCount in length. In addition, any buffered records will also be in memory. In addition, each individual disk or static cursor contains a rowVector with a length equal to cursor.recordCount. The rowVector is a vector of integer values. Each of these integer values are an index into the bckVector shared by these cursors.
Use the dataMineLib.open argument bufSize to set a buffer size for a cursor. The buffer size is specified in number of rows and it is up to the caller to ensure that the buffer size specified does not result in excessive memory usage. Pass a bufSize of 0 to dataMineLib.open when you want a cursor to use the bufSize previously set by another, open, cursor. To open a cursor that will not use the buffer, even if the buffer already exists to service other open cursors, pass a bufSize argument of -1. The absolute size of a table opened with a static or disk cursor is limited by available memory. However, the limit on maximum table size is much larger than that of a table opened with a memory cursor. Use a disk or static cursor to open large tables that cannot be opened using a memory cursor.
Disk and Static cursors may be filtered and sorted to create new current views. These views are essentially manipulations of the cursor's rowVector and do not affect the shared bckVector. Use the cursor.saveView function to save the current view into the cursors viewDirectory under a specified name. Saving views in disk and static cursors is a much less memory expensive operation in these cursors, compared to memory cursors, as the rowVector saved in the view is only a vector of integers.
Since disk and static cursors share a common bckVector and buffer some operations on the writable disk cursor will affect any static cursors open on the same table. Essentially any operation that modifies the bckVector will cause an internal resync of the static cursors. This resync clears each static cursors view directory and resets the current view (rowVector) of the static cursor to match the modified bckVector.
The appendToMetaTable function appends a new member table to an
existing meta table.
Type: Function
Syntax: (myCursor.appendToMetaTable tableKey tableName)
Transaction: Atomic
tableKey | A value of any type that will serve to uniquely identify the member table in the meta table. This key must sort higher than all previously supplied member table keys. |
tableName | A symbol or string containing the name of the existin table that will become a member table of the currently opened meta table. |
Returns | true |
The average function computes the average value of a numeric Lambda
over each record in the current view. The numeric Lambda must be a function
expecting one argument (a single table row).
Type: Function
Syntax: (myCursor.average numericLambda)
Transaction: NA
;; Returns the average Salary for the current view.
(myCursor.average (lambda(x) x.Salary) )
The averageForAll function computes the average value of a numeric
Lambda over ALL rows in the table. The numeric Lambda must be a function
expecting one argument (a single table row).
Type: Function
Syntax: (myCursor.averageForAll numericLambda)
Transaction: NA
;; Returns the average Salary for the table
(myCursor.averageForAll (lambda(x) x.Salary))
The colCount property contains the number of columns contained in
cursor.
Type: Property - read only
Syntax: myCursor.colCount
Transaction: NA
The createMemoryCursorFromView function creates and returns a memory
cursor on the current table. The memory cursor contains only those records
contined in the current view.
Type: function
Syntax: myCursor.createMemoryCursorFromView
Transaction: NA
The delete function deletes the specified row from the table's
bckVector. The cursor is reset. Delete is only available on a disk or memory
cursor. Delete is not available on a meta table or on all but the last added
member tables.
Type: Function
Syntax: (myCursor.delete rowIndex)
Transaction: NA
Side Effect: Cursor is reset. Remember that reset clears any filtering from the
current view and the cursor's view directory is cleared. If delete is called on
a disk cursor then all static cursors opened on same table are also reset.
rowIndex | Row index, in bckVector, of row to be deleted. |
Returns | true |
The delimitedCells function returns an ASCII tab delimited string
from the cursor, containing the values in the specified block of cells. The
result is a tab-delimited string.
Type: Function
Syntax: (myCursor.delimitedCells startRow endRow startCol endCol)
Transaction: NA
startRow | The table cursor row (beginning with 0) where the first cell value is to be obtained. |
endRow | The table cursor row (beginning with 0) where the last cell value is to be obtained. |
startCol | The table column (beginning with 0) where the first cell value is to be obtained. |
endCol | The table column (beginning with 0) where the last cell value is to be obtained. |
Returns | A tab delimited string of cell values from the specified rows and columns (the first row of the tab delimited string contains the column names). |
The deviation function computes the standard deviation value of a
numeric Lambda over each record in the current view. The numeric Lambda must be a
function expecting one argument (a single table row).
Type: Function
Syntax: (myCursor.deviation numericLambda)
Transaction: NA
;; Returns the standard deviation of profit for the table
(myCursor.deviation (lambda(x) x.Profit) )
The drop function removes all rows from the cursor's bckVector. The
cursor is reset. The drop function is only available on the disk and memory
cursors. Delete is not available on a meta tables or all but the last member
member table.
Type: Function
Syntax: (myCursor.drop)
Transaction: NA
Side Effect: Side Effect: Cursor is reset. Remember that reset clears any
filtering from the current view and the cursor's view directory is cleared. If
drop is called on a disk cusor then all static cursors opened on same table are
also reset.
The dropView function drops the current record view identified by the
specified name from the cursors view directory.
Type: Function
Syntax: (myCursor.dropView name)
Transaction: NA
name | A symbol or string containing the name that is the key of the view in the view directory. |
Returns | true |
The exportTab function exports the cursor's current view to a
specified ASCII tab delimited file. The column names will be the first row in
the ASCII tab delimited data file.
Type: Function
Syntax: (myCursor.exportTab asciiFile)
Transaction: NA
asciiFile | A string containing the path and file name of the ASCII tab delimited export data file to be created (the first row will contain the column names). |
Returns | true |
The getColumnHeaders function returns a tab-delimited string of
column names from the cursor.
Type: Function
Syntax: (myCursor.getColumnHeaders startCol endCol)
Transaction: NA
startCol | Zero-based index of the first column to be returned. |
endCol | Zero-based index of the last column to be returned. |
Returns | A tab-delimited string of column names. |
The getNewRecord function returns a new blank row record Vector ready
for data.
Type: Function
Syntax: (myCursor.getNewRecord)
Transaction: NA
Returns | A blank new row record Vector ready for data. |
The importTab function imports the specified ASCII tab delimited file
into the cursor's bckVector. The column names must be the first row in the
ASCII data file, they may not be in a separate header file, nor may they
be passed as a Structure. importTab is not available static cursors or on meta
tables or on all but the last member table.
Type: Function
Syntax: (myCursor.importTab asciiFile)
Transaction: NA
Side Effect: Side Effect: Cursor is reset. Remember that reset clears any
filtering from the current view and the cursor's view directory is cleared. If
importTab is called on a disk cusor then all static cursors opened on same
table are also reset.
asciiFile | The path and file name of the ASCII tab delimited import data (the first row must contain the column names). |
Returns | true |
The insert function inserts the record before the specified row in
the cursor's bckVector. The insert function is available only on memory and
disk cursors. insert is not available on a meta table or on all but the last
member table.
Type: Function
Syntax: (myCursor.insert row record)
Transaction: NA
Side Effect: Cursor is reset. Remember that reset clears any filtering from the
current view and the cursor's view directory is cleared. If insert is called on
a disk cusor then all static cursors opened on same table are also reset.
rowIndex | Row index of row before which to insert. |
record | Vector containing row record to insert. |
Returns | true |
Note: Use the cursor getNewRecord function to get the Vector for the row to be passed to insert.
The isView function returns true if the specified name is a saved
view in the cursor's view directory.
Type: Function
Syntax: (myCursor.isView viewName)
viewName | A symbol or string containing the name of the view previously saved with the saveView function. |
Returns | Returns true if the key is a saved table view of this cursor. |
The isView function is related to the saveView function, which allows the current record view to be saved under a specified name in the cursors view directory.
The maximum function computes the maximum value of a numeric Lambda
over each row in the current view. The numeric Lambda must be a function
expecting one argument (a single table row).
Type: Function
Syntax: (myCursor.maximum numericLambda)
Transaction: NA
;; Returns the maximum profit for the table
(myCursor.maximum (lambda(x) x.Profit) )
The minimum function computes the minimum value of a numeric Lambda
over each row in the current view. The numeric Lambda must be a function
expecting one argument (a single table row).
Type: Function
Syntax: (myCursor.minimum numericLambda)
Transaction: NA
;; Returns the minimum profit for the table
(myCursor.minimum (lambda(x) x.Profit) )
The myCursorNotes property contains the current notes Dictionary of
the cursor. If there are no notes available, the myCursorNotes property will
contain an empty Dictionary. The myCursorNotes property is not persistent and
will never be saved into the object repository.
Type: Property
Syntax: myCursor.myCursorNotes
The myCursorScore property contains the current score of this data
mine table cursor. If there is not score available, the myCursorScore property
will contain #void. The myCursorScore property is not persistent and will never
be saved into the object repository.
Type: Property
Syntax: myCursor.myCursorScore
The myMemoPad property contains the current memo pad Dictionary of
the cursor. If there are no memos available, the myMemoPad variable will
contain an empty Dictionary.
Type: Property
Syntax: myCursor.myMemoPad
The memo pad is used to store any non-relational data along with the table to the object repository. The memo pad is saved when a disk cursor is closed and any update activity has occurred on the disk cusor. To force a save of the memo pad, pass the save: option argument to the cursor close function.
The omit function deletes all rows from the cursor's bckVector for
which the specified truncate Lambda returns true. The omit function is only
available on memory and disk cursors. omit is not available on meta tables or
all but the last member table.
Type: Function
Syntax: (myCursor.omit truncateLambda)
Transaction: NA
Side Effect: Cursor is reset. Remember that reset clears any filtering from the
current view and the cursor's view directory is cleared. If omit is called on a
disk cusor then all static cursors opened on same table are also reset.
truncateLambda | The lambda predicate (returns a Boolean result) of one argument which deletes a row whenever the predicate returns a value of true. |
Returns | Record count of resulting cursor. |
The read function reads the specified row from the cursor's current
view.
Type: Function
Syntax: (myCursor.read rowIndex)
Transaction: NA
rowIndex | Row index of the row to read. |
Returns | Memory cursor: Reference to row record object in the cursor's rowVector. Disk or static cursor: Copy of row record object. |
Note: It is important to understand the difference between the return value of read for memory and disk or static cursors. Consider the example below:
;;Using read on a memory cursor
(setq record (myCursor.read 5)); read the sixth record from the current view
;;record now points to the same row record object pointed to at myCursor.rowVector[5]
;;if you modify record you will modify the table's current view. If the current view
;;has not been filtered then rowVector is the same as bckVector.
;;Using read on a disk or static cursor
(setq record (myCusor.read 5)); read the sixth record of the current view
;;record now points to a copy of the row record object in the cursors buffer
;;if you modify the object pointed to by record no changes will be made to the ;;cursors buffer.
The recordCount property contains the number of records contained in
the cursors current view.
Type: Function
Syntax: myCursor.recordCount
Transaction: NA
The refExport function returns a specified row from the cursor's
current view. The primary client of the function is the
dataMineLib.exportTab function.
Type: Function
Syntax: (myCursor.refExport row)
row | The row to return. |
Returns | A row from the cursor. |
The refExport function differs from the read function. It logically preprends a new row to the beginning of the table to contain the names of the column in the table. For instance, (setq record (myCursor.refExport 0)) will set record to a row object containing column names. (setq record (myCursor.refExport 1)) will return the first row of data from the cursor.
The refImport function returns a data object to be filled in by the
caller. If the row argument is zero, then an empty Vector is returned for
column names. Otherwise, an empty record Vector is returned.
Type: Function
Syntax: (myCursor.refImport rowIndex)
rowIndex | 0 for empty vector for column names. Non zero returns an empty row record Vector is returned. |
Returns | An empty vector if row is zero, otherwise an empty row record Vector. |
The reset function clears any filtering from the current view and
deletes any saved views from the view directory.
Type: Function
Syntax: (myCursor.reset)
Transaction: NA
Returns | true |
Note: Reset makes the rows referenced in the current view (contained in the cursors rowVector) the same as the rows referenced in the cursor's bckVector.
The restore function clears any filtering from the current view.
Type: Function
Syntax: (myCursor.restore)
Returns | true |
Note: Reset makes the rows referenced in the current view (contained in the cursors rowVector) the same as the rows referenced in the cursor's bckVector
The restoreView function restores the named view, from the cursor's
view directory, to the cursor's current view.
Type: Function
Syntax: (myCursor.restoreView viewName)
viewName | A symbol or string containing the name of the view in the view directory. |
Returns | true |
The run function runs the specified Lambda against the table.
Type: Function
Syntax: (myCursor.run runLambda)
runLambda | The Lambda to be run against this table. |
The save function overwrites the contents of a table with the current
view of a memory cursor. The save function performs no function on the disk or
static cursors. The save function is not available on memory cursors opened on
member tables.
Type: Function
Syntax: (myCursor.save)
Transaction: Atomic
Returns | true |
The saveView function saves the current view under the specified name
in the cursor's view directory.
Type: Function
Syntax: (myCursor.saveView viewName)
viewName | A symbol or string containing the name of the view. |
Returns | true |
The search function returns the row index for the first record, in
the current view, where the specified search predicate is true. If no record
meets these conditions, then false is returned. The starting row, for
the search, may also be specified. If no starting row is specified, it is
assumed to be zero.
Type: Function
Syntax: (myCursor.search searchLambda startRowIndex)
Transaction: NA
searchLambda | The lambda predicate (returns a Boolean result) of one argument which returns true when the correct row is found. |
startRowIndex | An optional argument specifying the starting row index for the search (if omitted, zero is assumed). |
Returns | The row index for the first record where the search predicate is true or #void. |
Syntax: (myCursor.search searchColumn searchValue startRowIndex)
Transaction: NA
searchColumn | The name of the column to search. |
searchValue | The value to search for in the specified column. |
startRowIndex | (Optional)The starting row index for the search (if omitted, zero is assumed). |
Returns | The row index for the first record where the search predicate is true or #void. |
The setImport function receives a single record from the
importTab function into the bckVector of the cursor. The setImport
function is only available on memory and disk cursors. setImport is not
available on a meta table or all but the last member table.
Type: Function
Syntax: (myCursor.setImport rowIndex recordVector)
Transaction: NA
Side Effect: Cursor is reset. Remember that reset clears any filtering from the
current view and the cursor's view directory is cleared. If setImport is called
on a disk cusor then all static cursors opened on same table are also reset.
rowIndex | Row index of the row to write to. |
recordVector | Vector of values to write to the specified row. |
Returns | true |
The setImport function differs from the write function. It examines the rowIndex value and if it is zero it expects to collect and process column names from this row. If the table has no columns defined the it will use the columns specified in the first process row as the table columns. If columns are already defined, then setImport will compare them with those passed in the first processed row to ensure they match.
The sharpe function computes the sharpe ratio value of a numeric
Lambda over the rows in the current view. The numeric Lambda must be a function
expecting one argument (a single table row).
Type: Function
Syntax: (myCursor.sharpe numericLambda)
Transaction: NA
(myCursor.sharpe (lambda(x) x.Profit) ) ;; Returns the sharpe ratio of profit for the table
The show function displays a limited number of rows, beginning with a
specified row number from the cursor's current view, on the system console.
Type: Function
Syntax: (myCursor.show startRow)
Transaction: NA
startRow | Row index of row to start on. |
Returns | true |
Use thesort function to sort the current view. The backup vector will not be affected.
The optional cutCount argument allows the specified amount of records to be dropped after the sort.
Type: Function
Syntax: (myCursor.sort sortLambda cutAmount)
sortLambda | The sort lambda predicate designed to compare two records. |
cutAmount | (optional). |
Returns | true |
The tile function reduces the set of rows, in the current view, to the nth tile of N tiles.
Type: Function
Syntax: (myCursor.tile tileCount tileIndex)
Transaction: NA
tileCount | The count of tiles into which the current cursor rows are to be divided. For instance, if we wanted to divide the current cursor rows into percentiles, then tileCount == 100. |
tileIndex | The index of the tile from the current cursor rows to be returned. For instance, if we wanted to see the 6th percentile, then tileIndex == 6. |
Returns | Record count of resulting cursor. |
Syntax: (myCursor.truncate rowCount)
Transaction: NA
rowCount | The cursor is truncated so that only the specified number of rows remain. |
Returns | Record count of resulting cursor. |
The total function computes the total value of a numeric Lambda over
each row of the current view. The numeric Lambda must be a function expecting
one argument (a single table record).
Type: Function
Syntax: (myCursor.total numericLambda)
Transaction: NA
(myCursor.total (lambda(x) x.Sales) ) ;; Returns the total Sales for the table
The totalForAll function computes the total value of a numeric Lambda
over all rows in the bckVector. The numeric Lambda must be a function expecting
one argument (a single table record).
Type: Function
Syntax: (myCursor.totalForAll numericLambda)
Transaction: NA
(myCursor.totalForAll (lambda(x) x.Sales) ) ;; Returns the total Sales for the table
The truncate function truncates a set of rows from the current view.
Type: Function
Syntax: (myCursor.truncate truncateLambda)
Transaction: NA
truncateLambda | The lambda predicate (returns a Boolean result) of one argument which truncates a row whenever the predicate returns a value of false. |
Returns | Record count of resulting cursor. |
Syntax: (myCursor.truncate rowCount)
Transaction: NA
rowCount | The cursor is truncated so that only the specified number of rows remain. |
Returns | Record count of resulting cursor. |
The updateView function updates the set of records in the memory
cursor's current view. The updateView function is not available on the disk or
static cursors. You may also pass the optional reset: argument to clear
any filtering in the current view after the update.
Type: Function
Syntax: (myCursor.updateView updateLambda reset:)
Transaction: NA
updateLambda | The lambda function (updates each record) of one argument which updates a record in the current view. |
reset: | An optional symbol argument to cause any current view filtering to be cleared. |
Returns | true. |
The updateView function allows you to filter the current view and then call updateView to modify only those rows contained in the current view. Note: The rows in the current view (rowVector) in a memory cursor point to the same row record objects pointed to by the bckVector. This means that the updates performed by updateView affect the row record objects shared by the cursors bckVector.
The reset: option is a convenience to allow the full row set to be restored at the end of the update operation.
The viewMath function performs the logical operations of,
and:, or:, and xor:, on the specified views in the cursors
view directory. Use the saveView function save views in the cursor's
viewDirectory. The viewMath function creates a new current view by
logically combining two previously saved views.
Type: Function
Syntax: (myCursor.viewMath operator viewName1 viewName2)
Transaction: NA
operator | A symbol name of the logical operator; and:, or:, xor:. |
viewName1 | A symbol or string containing the name of the first view. |
viewName2 | A symbol or string containing the name of the second view. |
Returns | Record count of resulting combined view. |
The write function writes the specified row to the cursor's
bckVector. The write function is available only on the memory and disk cursor.
Write can not be used to extend the meta table or all but the last member
table.
Type: Function
Syntax: (myCursor.write row record)
Transaction: NA
Side Effect: Cursor is reset. Remember that reset clears any filtering from the
current view and the cursor's view directory is cleared. If write is called on
a disk cusor then all static cursors opened on same table are also reset.
rowIndex | Row index in the bckVector of row to update. |
record | Vector of values to update the row with. |
Returns | true |