Table Cursors

 

Table Cursors Overview

The dataMineLib provides the application with an API to create, manipulate and destroy tables. Use the dataMineLib.createTable function to create a table. Once a table has been created you will most often use a table cursor to access its contents. Table cursors come in three flavors; memory, disk and static. Create a cursor using the dataMineLib.open function.

Memory cursors open a table and read its entire contents into memory. Static cursors provide buffered read only access to a table. Disk cursors provide buffered read and unbuffered write access to a table. You can open multiple concurrent cursors on the same table with the restriction that only one disk cursor may be open on that table. Note that disk and static cursors opened on the same table share the same buffer.

Views

Cursors provide you the ability to view subsets of the information in the table by running filtering operations against the cursor. Filtering operations are applied to the current view and result in a new current view. Filtering does not affect the underlying table data and the current view can be restored to match the underlying table data with the cursor.restore or cursor.reset functions. Use the cursor.saveView function to save the current view, by name, into the table's view directory. Use the cursor.restoreView function to make a saved view the current view. Note that the view directory is saved in the object repository as part of the collection of table objects reachable through the table schema object. Note that the current view is contained in the cursors rowVector property and you will see reference to rowVector in this document. The bckVector contains the unfiltered content of the cursor's underlying table. The cursor.reset function mentioned above essentially create a rowVector that references all of the rows in bckVector in bckVector order to restore an unfiltered view of the table.

Side Effects

This implementation of tables has been optimized for datamining operations where bulk load and read/filtering access performance is given priority over other possible table operations. For this reason, individual row writes and row deletes are expensive and have significant side affects. For instance, a row delete or row insert will have the following side effects:

  1. Current view will reflect table content after operation. (Any filtering will have been discarded)
  2. All views in view directory will be discarded.
  3. If the operation was applied on a disk cursor then all static cursors, opened on the same table, will be similarly affected.

API Consistency

All cursors share the same basic API. However, not all functions are available on each cursor type and not all operations return exactly the same type of result. In addition, some cursor function options are not available for all cursor types. These differences are documented in the function documentation.

Memory Cursors

Use the dataMineLib.open function to create a memory cursor object. The memory cursor object loads the entire contents of a specified table into memory. This allows very fast manipulation of the table at the expense of memory usage. The following example opens a memory cursor on a table named myTable. myTable must already exist for a cursor to be opened on it.

(setq myCursor (dataMineLib.open myTable: memory:))

When you filter a memory cursor, a copy of the table content is made in memory and then the undesired rows are removed from the copy to generate the current view. Use cursor.viewSave to save the current view in the cursors viewDirectory. Note that each view saved in the view directory contains the complete content of the rows contained in the view. Take care to account for the memory usage incurred by view creation in a memory cursor.

Use the cursor.save function to save the current contents of a memory cursor to disk, overwriting the original table. Opening a memory cursors dose not increment the transaction count of the repository extent the table exists in. However, when you use cursor.save, an atomic transaction occurs. The view directory is cleared when you save a memory cursor. Note that the cursor.save function will fail if there is an open disk cursor on the same table.

Multiple memory cursors may be opened concurrently on the same table. Each cursor is independent of the others and a fresh copy of the table content is loaded into memory each time.

To load a subset of table columns into memory, pass a columns specification during the open on a memory cursor . This technique can be useful in creating indexing tables and reports. Note that a subsequent cursor.save of a memory cursor, opened with a subset of columns specified, will cause the table schema to be permanently changed to reflect only those columns contained in the memory cursor.

Unlike disk and static cursors, you can access the rows of a memory cursor directly using cursor.rowVector[i], where i is the index of the row you want to access. This makes retrieval of memory cursor row objects very fast. In general, this optimization should be avoided unless there is a great need for fast access. Use the cursor.read function instead.

Disk & Static Cursors

Use dataMineLib.open to create a disk or static cursor. A static cursor provides buffered read only access to a table. A disk cursor provides buffered read, unbuffered write access to a table. Multiple concurrent cursors can be opened on the same table with the restriction that only one of these may be a disk cursor.

Unlike a memory cursor, the disk and static cursors do not load the entire table into memory when they are opened. Instead, records are read into memory only when they are requested. When a disk cursor and one or more static cursors are opened on the same table, they share the bckVector for that table and a buffer vector holding references to any buffered row objects. These two vectors are cursor.recordCount in length. In addition, any buffered records will also be in memory. In addition, each individual disk or static cursor contains a rowVector with a length equal to cursor.recordCount. The rowVector is a vector of integer values. Each of these integer values are an index into the bckVector shared by these cursors.

Use the dataMineLib.open argument bufSize to set a buffer size for a cursor. The buffer size is specified in number of rows and it is up to the caller to ensure that the buffer size specified does not result in excessive memory usage. Pass a bufSize of 0 to dataMineLib.open when you want a cursor to use the bufSize previously set by another, open, cursor. To open a cursor that will not use the buffer, even if the buffer already exists to service other open cursors, pass a bufSize argument of -1. The absolute size of a table opened with a static or disk cursor is limited by available memory. However, the limit on maximum table size is much larger than that of a table opened with a memory cursor. Use a disk or static cursor to open large tables that cannot be opened using a memory cursor.

Disk and Static cursors may be filtered and sorted to create new current views. These views are essentially manipulations of the cursor's rowVector and do not affect the shared bckVector. Use the cursor.saveView function to save the current view into the cursors viewDirectory under a specified name. Saving views in disk and static cursors is a much less memory expensive operation in these cursors, compared to memory cursors, as the rowVector saved in the view is only a vector of integers.

Since disk and static cursors share a common bckVector and buffer some operations on the writable disk cursor will affect any static cursors open on the same table. Essentially any operation that modifies the bckVector will cause an internal resync of the static cursors. This resync clears each static cursors view directory and resets the current view (rowVector) of the static cursor to match the modified bckVector.

appendToMetaTable

The appendToMetaTable function appends a new member table to an existing meta table.
Type: Function
Syntax: (myCursor.appendToMetaTable tableKey tableName)
Transaction: Atomic

tableKey A value of any type that will serve to uniquely identify the member table in the meta table. This key must sort higher than all previously supplied member table keys.
tableName A symbol or string containing the name of the existin table that will become a member table of the currently opened meta table.
Returns  true 

average

The average function computes the average value of a numeric Lambda over each record in the current view. The numeric Lambda must be a function expecting one argument (a single table row).
Type: Function
Syntax: (myCursor.average numericLambda)
Transaction: NA

Example1

;; Returns the average Salary for the current view.
(myCursor.average (lambda(x) x.Salary) )

averageForAll

The averageForAll function computes the average value of a numeric Lambda over ALL rows in the table. The numeric Lambda must be a function expecting one argument (a single table row).
Type: Function
Syntax: (myCursor.averageForAll numericLambda)
Transaction: NA

Example1

;; Returns the average Salary for the table
(myCursor.averageForAll (lambda(x) x.Salary))

colCount

The colCount property contains the number of columns contained in cursor.
Type: Property - read only
Syntax: myCursor.colCount
Transaction: NA

createMemoryCursorFromView

The createMemoryCursorFromView function creates and returns a memory cursor on the current table. The memory cursor contains only those records contined in the current view.
Type: function
Syntax: myCursor.createMemoryCursorFromView
Transaction: NA

delete

The delete function deletes the specified row from the table's bckVector. The cursor is reset. Delete is only available on a disk or memory cursor. Delete is not available on a meta table or on all but the last added member tables.
Type: Function
Syntax: (myCursor.delete rowIndex)
Transaction: NA
Side Effect: Cursor is reset. Remember that reset clears any filtering from the current view and the cursor's view directory is cleared. If delete is called on a disk cursor then all static cursors opened on same table are also reset.

rowIndex Row index, in bckVector, of row to be deleted.
Returns true

delimitedCells

The delimitedCells function returns an ASCII tab delimited string from the cursor, containing the values in the specified block of cells. The result is a tab-delimited string.
Type: Function
Syntax: (myCursor.delimitedCells startRow endRow startCol endCol)
Transaction: NA

startRow The table cursor row (beginning with 0) where the first cell value is to be obtained.
endRow The table cursor row (beginning with 0) where the last cell value is to be obtained.
startCol The table column (beginning with 0) where the first cell value is to be obtained.
endCol The table column (beginning with 0) where the last cell value is to be obtained.
Returns A tab delimited string of cell values from the specified rows and columns (the first row of the tab delimited string contains the column names).

deviation

The deviation function computes the standard deviation value of a numeric Lambda over each record in the current view. The numeric Lambda must be a function expecting one argument (a single table row).
Type: Function
Syntax: (myCursor.deviation numericLambda)
Transaction: NA

Example1

;; Returns the standard deviation of profit for the table
(myCursor.deviation (lambda(x) x.Profit) )

drop

The drop function removes all rows from the cursor's bckVector. The cursor is reset. The drop function is only available on the disk and memory cursors. Delete is not available on a meta tables or all but the last member member table.
Type: Function
Syntax: (myCursor.drop)
Transaction: NA
Side Effect: Side Effect: Cursor is reset. Remember that reset clears any filtering from the current view and the cursor's view directory is cleared. If drop is called on a disk cusor then all static cursors opened on same table are also reset.

dropView

The dropView function drops the current record view identified by the specified name from the cursors view directory.
Type: Function
Syntax: (myCursor.dropView name)
Transaction: NA

name A symbol or string containing the name that is the key of the view in the view directory.
Returns true

exportTab

The exportTab function exports the cursor's current view to a specified ASCII tab delimited file. The column names will be the first row in the ASCII tab delimited data file.
Type: Function
Syntax: (myCursor.exportTab asciiFile)
Transaction: NA

asciiFile A string containing the path and file name of the ASCII tab delimited export data file to be created (the first row will contain the column names).
Returns true

getColumnHeaders

The getColumnHeaders function returns a tab-delimited string of column names from the cursor.
Type: Function
Syntax: (myCursor.getColumnHeaders startCol endCol)
Transaction: NA

startCol Zero-based index of the first column to be returned.
endCol Zero-based index of the last column to be returned.
Returns A tab-delimited string of column names.

getNewRecord

The getNewRecord function returns a new blank row record Vector ready for data.
Type: Function
Syntax: (myCursor.getNewRecord)
Transaction: NA

Returns A blank new row record Vector ready for data.

importTab

The importTab function imports the specified ASCII tab delimited file into the cursor's bckVector. The column names must be the first row in the ASCII data file, they may not be in a separate header file, nor may they be passed as a Structure. importTab is not available static cursors or on meta tables or on all but the last member table.
Type: Function
Syntax: (myCursor.importTab asciiFile)
Transaction: NA
Side Effect: Side Effect: Cursor is reset. Remember that reset clears any filtering from the current view and the cursor's view directory is cleared. If importTab is called on a disk cusor then all static cursors opened on same table are also reset.

asciiFile The path and file name of the ASCII tab delimited import data (the first row must contain the column names).
Returns true

insert

The insert function inserts the record before the specified row in the cursor's bckVector. The insert function is available only on memory and disk cursors. insert is not available on a meta table or on all but the last member table.
Type: Function
Syntax: (myCursor.insert row record)
Transaction: NA
Side Effect: Cursor is reset. Remember that reset clears any filtering from the current view and the cursor's view directory is cleared. If insert is called on a disk cusor then all static cursors opened on same table are also reset.

rowIndex Row index of row before which to insert.
record Vector containing row record to insert.
Returns true

Note: Use the cursor getNewRecord function to get the Vector for the row to be passed to insert.

isView

The isView function returns true if the specified name is a saved view in the cursor's view directory.
Type: Function
Syntax: (myCursor.isView viewName)

viewName A symbol or string containing the name of the view previously saved with the saveView function.
Returns Returns true if the key is a saved table view of this cursor.

The isView function is related to the saveView function, which allows the current record view to be saved under a specified name in the cursors view directory.

maximum

The maximum function computes the maximum value of a numeric Lambda over each row in the current view. The numeric Lambda must be a function expecting one argument (a single table row).
Type: Function
Syntax: (myCursor.maximum numericLambda)
Transaction: NA

Example1

;; Returns the maximum profit for the table
(myCursor.maximum (lambda(x) x.Profit) )

minimum

The minimum function computes the minimum value of a numeric Lambda over each row in the current view. The numeric Lambda must be a function expecting one argument (a single table row).
Type: Function
Syntax: (myCursor.minimum numericLambda)
Transaction: NA

Example1

;; Returns the minimum profit for the table
(myCursor.minimum (lambda(x) x.Profit) )

myCursorNotes

The myCursorNotes property contains the current notes Dictionary of the cursor. If there are no notes available, the myCursorNotes property will contain an empty Dictionary. The myCursorNotes property is not persistent and will never be saved into the object repository.
Type: Property
Syntax: myCursor.myCursorNotes

myCursorScore

The myCursorScore property contains the current score of this data mine table cursor. If there is not score available, the myCursorScore property will contain #void. The myCursorScore property is not persistent and will never be saved into the object repository.
Type: Property
Syntax: myCursor.myCursorScore

myMemoPad

The myMemoPad property contains the current memo pad Dictionary of the cursor. If there are no memos available, the myMemoPad variable will contain an empty Dictionary.
Type: Property
Syntax: myCursor.myMemoPad

The memo pad is used to store any non-relational data along with the table to the object repository. The memo pad is saved when a disk cursor is closed and any update activity has occurred on the disk cusor. To force a save of the memo pad, pass the save: option argument to the cursor close function.

omit

The omit function deletes all rows from the cursor's bckVector for which the specified truncate Lambda returns true. The omit function is only available on memory and disk cursors. omit is not available on meta tables or all but the last member table.
Type: Function
Syntax: (myCursor.omit truncateLambda)
Transaction: NA
Side Effect: Cursor is reset. Remember that reset clears any filtering from the current view and the cursor's view directory is cleared. If omit is called on a disk cusor then all static cursors opened on same table are also reset.

truncateLambda The lambda predicate (returns a Boolean result) of one argument which deletes a row whenever the predicate returns a value of true.
Returns Record count of resulting cursor.

read

The read function reads the specified row from the cursor's current view.
Type: Function
Syntax: (myCursor.read rowIndex)
Transaction: NA

rowIndex Row index of the row to read.
Returns Memory cursor: Reference to row record object in the cursor's rowVector. Disk or static cursor: Copy of row record object.

Example 1

Note: It is important to understand the difference between the return value of read for memory and disk or static cursors. Consider the example below:

;;Using read on a memory cursor
(setq record (myCursor.read 5)); read the sixth record from the current view
;;record now points to the same row record object pointed to at myCursor.rowVector[5]
;;if you modify record you will modify the table's current view. If the current view
;;has not been filtered then rowVector is the same as bckVector.
;;Using read on a disk or static cursor
(setq record (myCusor.read 5)); read the sixth record of the current view
;;record now points to a copy of the row record object in the cursors buffer
;;if you modify the object pointed to by record no changes will be made to the ;;cursors buffer.

recordCount

The recordCount property contains the number of records contained in the cursors current view.
Type: Function
Syntax: myCursor.recordCount
Transaction: NA

refExport

The refExport function returns a specified row from the cursor's current view. The primary client of the function is the dataMineLib.exportTab function.
Type: Function
Syntax: (myCursor.refExport row)

row The row to return.
Returns A row from the cursor.

The refExport function differs from the read function. It logically preprends a new row to the beginning of the table to contain the names of the column in the table. For instance, (setq record (myCursor.refExport 0)) will set record to a row object containing column names. (setq record (myCursor.refExport 1)) will return the first row of data from the cursor.

refImport

The refImport function returns a data object to be filled in by the caller. If the row argument is zero, then an empty Vector is returned for column names. Otherwise, an empty record Vector is returned.
Type: Function
Syntax: (myCursor.refImport rowIndex)

rowIndex 0 for empty vector for column names. Non zero returns an empty row record Vector is returned.
Returns An empty vector if row is zero, otherwise an empty row record Vector.

reset

The reset function clears any filtering from the current view and deletes any saved views from the view directory.
Type: Function
Syntax: (myCursor.reset)
Transaction: NA

Returns true

Note: Reset makes the rows referenced in the current view (contained in the cursors rowVector) the same as the rows referenced in the cursor's bckVector.

restore

The restore function clears any filtering from the current view.
Type: Function
Syntax: (myCursor.restore)

Returns true

Note: Reset makes the rows referenced in the current view (contained in the cursors rowVector) the same as the rows referenced in the cursor's bckVector

restoreView

The restoreView function restores the named view, from the cursor's view directory, to the cursor's current view.
Type: Function
Syntax: (myCursor.restoreView viewName)

viewName A symbol or string containing the name of the view in the view directory.
Returns true

run

The run function runs the specified Lambda against the table.
Type: Function
Syntax: (myCursor.run runLambda)

runLambda The Lambda to be run against this table.

save

The save function overwrites the contents of a table with the current view of a memory cursor. The save function performs no function on the disk or static cursors. The save function is not available on memory cursors opened on member tables.
Type: Function
Syntax: (myCursor.save)
Transaction: Atomic

Returns true

saveView

The saveView function saves the current view under the specified name in the cursor's view directory.
Type: Function
Syntax: (myCursor.saveView viewName)

viewName A symbol or string containing the name of the view.
Returns true

search

setImport

The setImport function receives a single record from the importTab function into the bckVector of the cursor. The setImport function is only available on memory and disk cursors. setImport is not available on a meta table or all but the last member table.
Type: Function
Syntax: (myCursor.setImport rowIndex recordVector)
Transaction: NA
Side Effect: Cursor is reset. Remember that reset clears any filtering from the current view and the cursor's view directory is cleared. If setImport is called on a disk cusor then all static cursors opened on same table are also reset.

rowIndex Row index of the row to write to.
recordVector Vector of values to write to the specified row.
Returns true

The setImport function differs from the write function. It examines the rowIndex value and if it is zero it expects to collect and process column names from this row. If the table has no columns defined the it will use the columns specified in the first process row as the table columns. If columns are already defined, then setImport will compare them with those passed in the first processed row to ensure they match.

sharpe

The sharpe function computes the sharpe ratio value of a numeric Lambda over the rows in the current view. The numeric Lambda must be a function expecting one argument (a single table row).
Type: Function
Syntax: (myCursor.sharpe numericLambda)
Transaction: NA

Example1

(myCursor.sharpe (lambda(x) x.Profit) ) ;; Returns the sharpe ratio of profit for the table

show

The show function displays a limited number of rows, beginning with a specified row number from the cursor's current view, on the system console.
Type: Function
Syntax: (myCursor.show startRow)
Transaction: NA

startRow Row index of row to start on.
Returns true

sort

Use thesort function to sort the current view. The backup vector will not be affected. The optional cutCount argument allows the specified amount of records to be dropped after the sort.
Type: Function
Syntax: (myCursor.sort sortLambda cutAmount)

sortLambda The sort lambda predicate designed to compare two records.
cutAmount (optional).
Returns true

tile

The tile function reduces the set of rows, in the current view, to the nth tile of N tiles.
Type: Function
Syntax: (myCursor.tile tileCount tileIndex)
Transaction: NA

tileCount The count of tiles into which the current cursor rows are to be divided. For instance, if we wanted to divide the current cursor rows into percentiles, then tileCount == 100.
tileIndex The index of the tile from the current cursor rows to be returned. For instance, if we wanted to see the 6th percentile, then tileIndex == 6.
Returns Record count of resulting cursor.

Syntax: (myCursor.truncate rowCount)
Transaction: NA

rowCount The cursor is truncated so that only the specified number of rows remain.
Returns Record count of resulting cursor.

total

The total function computes the total value of a numeric Lambda over each row of the current view. The numeric Lambda must be a function expecting one argument (a single table record).
Type: Function
Syntax: (myCursor.total numericLambda)
Transaction: NA

Example1

(myCursor.total (lambda(x) x.Sales) ) ;; Returns the total Sales for the table

totalForAll

The totalForAll function computes the total value of a numeric Lambda over all rows in the bckVector. The numeric Lambda must be a function expecting one argument (a single table record).
Type: Function
Syntax: (myCursor.totalForAll numericLambda)
Transaction: NA

Example1

(myCursor.totalForAll (lambda(x) x.Sales) ) ;; Returns the total Sales for the table

truncate

The truncate function truncates a set of rows from the current view.
Type: Function
Syntax: (myCursor.truncate truncateLambda)
Transaction: NA

truncateLambda The lambda predicate (returns a Boolean result) of one argument which truncates a row whenever the predicate returns a value of false.
Returns Record count of resulting cursor.

Syntax: (myCursor.truncate rowCount)
Transaction: NA

rowCount The cursor is truncated so that only the specified number of rows remain.
Returns Record count of resulting cursor.

updateView

The updateView function updates the set of records in the memory cursor's current view. The updateView function is not available on the disk or static cursors. You may also pass the optional reset: argument to clear any filtering in the current view after the update.
Type: Function
Syntax: (myCursor.updateView updateLambda reset:)
Transaction: NA

updateLambda The lambda function (updates each record) of one argument which updates a record in the current view.
reset: An optional symbol argument to cause any current view filtering to be cleared.
Returns true.

The updateView function allows you to filter the current view and then call updateView to modify only those rows contained in the current view. Note: The rows in the current view (rowVector) in a memory cursor point to the same row record objects pointed to by the bckVector. This means that the updates performed by updateView affect the row record objects shared by the cursors bckVector.

The reset: option is a convenience to allow the full row set to be restored at the end of the update operation.

viewMath

The viewMath function performs the logical operations of, and:, or:, and xor:, on the specified views in the cursors view directory. Use the saveView function save views in the cursor's viewDirectory. The viewMath function creates a new current view by logically combining two previously saved views.
Type: Function
Syntax: (myCursor.viewMath operator viewName1 viewName2)
Transaction: NA

operator A symbol name of the logical operator; and:, or:, xor:.
viewName1 A symbol or string containing the name of the first view.
viewName2 A symbol or string containing the name of the second view.
Returns Record count of resulting combined view.

write

The write function writes the specified row to the cursor's bckVector. The write function is available only on the memory and disk cursor. Write can not be used to extend the meta table or all but the last member table.
Type: Function
Syntax: (myCursor.write row record)
Transaction: NA
Side Effect: Cursor is reset. Remember that reset clears any filtering from the current view and the cursor's view directory is cleared. If write is called on a disk cusor then all static cursors opened on same table are also reset.

rowIndex Row index in the bckVector of row to update.
record Vector of values to update the row with.
Returns true