The job can process multiple CSV files with relations between them, over several steps (snapshots of data). The dataProcessor.json file uses contexts defined in all three previously mentioned files.
Processing CSV files#
CSV files are defined in the Files array. Each file contains a Name (used by other files to reference its columns), an InputFilePath, an ErrorFilePath and an OutputFilePath. You may override the Encoding of the Input, Error and/or Output file. Each file also contains an array of Columns describing the columns of the file to be transformed.
Each Column contains an array of Names where you can specify a single column name or a range of column names (for group transformations across different columns). The names must match the column names in the CSV file exactly; they are case-sensitive.
The Column also contains an array of Processing steps, each with two variables:
- Step: the step number of the transformation. Transformations are done step-by-step, and each step creates a snapshot of the data received as a result of the step processing (transformation). The value of any previous step may be used in following steps. For example, the protected value from Step 1 may be used in Step 2 for another field.
- Processors: the array of predefined processors used for transformation or preprocessing values.
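As an illustration of the structure described above, a single Column entry with an inline Processing array might look like the following sketch. The column name Email and the ClassName and PropertyName values are hypothetical placeholders, not values defined by this documentation.

```json
{
  "Names": ["Email"],
  "ClassName": "example1.ContactDetails",
  "PropertyName": "Email",
  "Processing": [
    {
      "Step": 1,
      "Processors": [
        { "Name": "Default", "Type": "RPSEngineTransformation" }
      ]
    }
  ]
}
```

Defining Processing inline like this is an alternative to referencing a shared template through ProcessingTemplateName.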
Processors#
Each processor has a Name and Type as required fields.
- For "Type": "RPSEngineTransformation", the possible values of "Name" are "Default", "RandomShuffling", "DateRange", "Conditional", "StartsWithConditional" and "XmlTransform".
- For "Type": "Preprocessing", the possible values of "Name" are "ReadAllUniqueValues" and "ParseDateTime".
- For "Type": "Postprocessing", the possible values of "Name" are "NullValue", "DefaultBooleanValue" and "ToLowerCaseValue".
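A step can list several processors. The sketch below combines one processor of each type in a single step for a date column. It assumes that the processors of a step are applied in the order listed and that these three can be combined this way, neither of which is stated in this section. Any Options these processors may require are omitted.

```json
{
  "Step": 1,
  "Processors": [
    { "Name": "ParseDateTime", "Type": "Preprocessing" },
    { "Name": "DateRange", "Type": "RPSEngineTransformation" },
    { "Name": "NullValue", "Type": "Postprocessing" }
  ]
}
```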
Processors may take an Options array as an argument, and some processors require it: for example, the "Conditional" processor requires Condition If and Else to be specified, and processors of type "RPSEngineTransformation" require ClassName and PropertyName. Each processor may also have a Dependencies array. Each Dependency element contains a Name and a Value. Name is used to link the dependency to RPS Engine.
Options#
The dataProcessor.json file allows options to be defined at several levels. Options defined at job level apply to all transformed files; options defined at file level apply only to the file concerned. File level options supersede job level options when both are defined. The tables below summarize the available options; for examples of usage, see the related Examples section.
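The precedence rule can be sketched with CsvOptions. In the fragment below, the job level sets ";" as the delimiter, and one file overrides it with ",". The file names are hypothetical, and paths and Columns are omitted for brevity.

```json
{
  "Options": {
    "CsvOptions": { "Delimiter": ";" },
    "Files": [
      {
        "Name": "semicolon_file"
      },
      {
        "Name": "comma_file",
        "CsvOptions": { "Delimiter": "," }
      }
    ]
  }
}
```

Under the rule above, semicolon_file would be read with the job level delimiter, while comma_file would use its own.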
Job level options#
Option | Required | Additional options | Type | Default value | Description |
---|---|---|---|---|---|
CsvOptions | | Yes | string | - | Defines CSV file related options such as Delimiter or NewLine. |
Encoding | | No | string | - | Changes the encoding of the CSV file. |
Files | | Yes | - | - | Provides the full configuration for file transformation (Name, Columns and paths). |
HeaderRowsCount | | No | int | 1 | Sets the number of header rows. |
HeaderRowIndex | | No | int | 1 | Sets the row to use for column names. |
SkipEmptyRecords | | No | bool | false | If set to true, allows the Files Processor to skip empty rows without stopping the whole process. |
RightsContext | | No | string | - | Sets the rights context as defined in rightsContexts.json. |
ProcessingContext | | No | string | - | Sets the processing context as defined in processingContexts.json. |
BatchSizeByRows | | No | int | 0 | Sets the number of rows sent per batch for processing. |
ProcessingTemplates | | Yes | - | - | If the processing is similar for multiple columns among documents, allows a quicker definition of the processing steps. |
ExpressionSettings | | Yes | - | - | Allows the definition of additional null values apart from the usual empty string "". |
File level options#
Option | Required | Additional options | Type | Default value | Description |
---|---|---|---|---|---|
Name/Names | | No | string/array | - | Defines the name (or names) of the file concerned by the following configuration. |
HeaderRowsCount | | No | int | 1 | Defines the number of header rows in the CSV file. |
HeaderRowIndex | | No | int | 1 | Defines which row is used for the column names. |
SkipEmptyRecords | | No | bool | false | Used to ignore empty columns or fields in a CSV without throwing an error and stopping the protection of the rest of the file. |
InputFile | | Yes | - | - | Provides the path of the input file. Can be used to define specific configuration for the input file (see associated options). |
OutputFile | | Yes | - | - | Provides the path of the output file. Can be used to define specific configuration for the output file (see associated options). |
ErrorFile | | Yes | - | - | Provides the path of the error file. Can be used to define specific configuration for the error file (see associated options). |
FakeInputFile | | Yes | - | - | Allows the configuration of a fake input file to be used by the Files Processor. See associated options. |
BatchSizeByRows | | No | int | 0 | Sets the number of rows sent per batch for processing. |
Relations | | Yes | - | - | Declared for the file whose column will be referred to by another file's transformation. |
ForeignKey | | No | string | - | Used to refer to another file's column; used together with the Relations option. |
Encoding | | No | string | - | Sets the encoding for the files. Is superseded by specific Input/Output/Error files configurations. |
CsvOptions | | Yes | - | - | Defines CSV file related options such as Delimiter or NewLine. |
RightsContext | | No | string | - | Used to define the Rights Contexts for the processing of the file. |
ProcessingContext | | No | string | - | Used to define the Processing Contexts for the processing of the file. |
ColumnProperties | | Yes | - | - | Defines an array of ColumnProperties classes. |
Columns | | Yes | - | - | Configures the processing of each column of the file, see associated options. |
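As a sketch of the header options, the fragment below describes a file with two header rows whose second row holds the column names. It assumes rows are numbered from 1, which the default value of 1 suggests but this section does not state. The file name is hypothetical, and the InputFile, OutputFile, ErrorFile and Columns entries are omitted.

```json
{
  "Name": "report_file",
  "HeaderRowsCount": 2,
  "HeaderRowIndex": 2,
  "SkipEmptyRecords": true
}
```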
Input/Output/Error files options#
Option | Required | Additional options | Type | Default value | Description |
---|---|---|---|---|---|
Path | | No | string | - | Used to provide the path of the concerned file. |
CsvOptions | | Yes | - | - | Defines CSV file related options such as Delimiter or NewLine. |
Encoding | | No | string | - | Sets the encoding for this specific file; it supersedes the file level Encoding. |
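The options in this table can be combined per file. In the sketch below, the input file has its own encoding and delimiter, and the output file is written with a different delimiter and line ending. The paths are hypothetical, and "utf-16" is assumed to be an accepted encoding name (only "utf-8" appears elsewhere in this documentation).

```json
{
  "Name": "data_file",
  "Encoding": "utf-8",
  "InputFile": {
    "Path": "C:/RPSFilesProcessorData/in/data_file.csv",
    "Encoding": "utf-16",
    "CsvOptions": { "Delimiter": ";" }
  },
  "OutputFile": {
    "Path": "C:/RPSFilesProcessorData/out/protected_data_file.csv",
    "CsvOptions": { "Delimiter": ",", "NewLine": "\r\n" }
  },
  "ErrorFile": {
    "Path": "C:/RPSFilesProcessorData/errors/error_data_file.csv"
  }
}
```

Here the error file defines no Encoding of its own, so it would fall back to the file level value.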
Fake files options#
Option | Required | Additional options | Type | Default value | Description |
---|---|---|---|---|---|
RequireTransformation | | No | bool | false | If true, fake rows are joined to the original rows and transformed by RPS Engine together with them. If false, fake rows are injected after the transformations and are not sent to RPS Engine. |
InjectionType | | Yes | - | - | Defines the type of row injections. |
NumberOfFakeRows | | No | int | 0 | Determines the number of fake rows to generate in this file. |
CsvOptions options#
Option | Required | Additional options | Type | Default value | Description |
---|---|---|---|---|---|
NewLine | | No | string | - | Defines the new line symbol, for example: "\n" or "\r\n". |
Delimiter | | No | string | - | Defines the delimiter between columns, by default it is: ",". |
ExpressionSettings options#
Option | Required | Additional options | Type | Default value | Description |
---|---|---|---|---|---|
AdditionalNullValues | | No | string | - | By default the Null value of a cell is an empty string ""; this option defines additional Null values. |
Relations options#
Option | Required | Additional options | Type | Default value | Description |
---|---|---|---|---|---|
File | | No | string | - | Provides the name of the file to use. |
Keys | | No | string | - | Defines the keys for the SourceColumn (source column) and the DestColumn (destination column). |
Mappings | | No | string dictionary | - | Creates the mapping between two files using a dictionary. For instance, to describe the relation between column PersonId in file Visits and column Id in file Person, the mapping is specified as: [{"PersonId","Id"}]. |
OrderBy | | Yes | - | - | In cases where a mapping relates multiple values to one another, OrderBy defines the order in which to treat them. For instance, with the Mappings example, multiple PersonId values corresponding to multiple visits can be linked to a single Id. |
RowInjection options#
Option | Required | Additional options | Type | Default value | Description |
---|---|---|---|---|---|
RowInjectionType | | No | enum | 0 | Defines the type of row injections. Can be Unknown (or 0), AppendToEnd or AppendRandom. |
SortDirection options#
Option | Required | Additional options | Type | Default value | Description |
---|---|---|---|---|---|
SortDirection | | No | enum | 0 | Sets the sorting direction. Can take values Unset (or 0), ASC (ascending) or DESC (descending). |
Columns options#
Option | Required | Additional options | Type | Default value | Description |
---|---|---|---|---|---|
Names | | No | string/array | - | Provides the names of the columns for the following configuration. |
ClassName | | No | string | - | Makes the relation with the Class Name from RPS Core Configuration. |
PropertyName | | No | string | - | Makes the relation with the Property Name from RPS Core Configuration. |
RightsContext | | No | string | - | Makes the relation with the Rights Context from RPS Core Configuration. |
ProcessingContext | | No | string | - | Makes the relation with the Processing Context from RPS Core Configuration. |
Processing | | Yes | - | - | Used to define the processing of this column, if not defined through a Processing Template. |
ProcessingTemplateName | | No | string | - | Used to reference, by name, a Processing Template defined at job level. |
ColumnProperties options#
Option | Required | Additional options | Type | Default value | Description |
---|---|---|---|---|---|
Name | | No | string | - | Name of the column for which additional options are defined. |
Storage | | No | enum | Memory | Defines where the column data will be stored. Values can be Memory or MongoDb. This option is used if memory is not sufficient to store all columns data. |
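For example, a file whose free-text column is too large to keep in memory could move that column to MongoDb storage. This is a minimal sketch: the file and column names are hypothetical, and any connection settings that MongoDb storage may require are not covered here.

```json
{
  "Name": "large_file",
  "ColumnProperties": [
    { "Name": "Comments", "Storage": "MongoDb" }
  ]
}
```

Columns not listed in ColumnProperties would keep the default Memory storage.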
ProcessingStep options#
Option | Required | Additional options | Type | Default value | Description |
---|---|---|---|---|---|
Step | | No | int | 0 | Provides the number of the step in a Processing definition. |
Processors | | Yes | - | - | Defines the processors used for a given step. |
ProcessingStepProcessor options#
Option | Required | Additional options | Type | Default value | Description |
---|---|---|---|---|---|
Name | | No | string | - | Gives the name of the processor to use, see the Processors section above. |
Type | | No | string | - | Defines the type of the processor to use, see the Processors section above. |
DataType | | No | enum | string | Enumeration of the type of data to be used by the processor. Can be String, DateTime, Long, Any... |
Options | | No | - | - | Used to define options specific to each processor. |
Value | | No | string | - | Defines the value to be transformed by the processor. By default it is the value of the CSV column, but it can be changed here to any other value. |
Dependencies | | Yes | dictionary | - | Defines key-value pairs of dependencies for the transformers. For instance, a date protection might need a definition of limits (max and min): "Dependencies": {"min": "1950-01-01", "max": "2000-12-31"}. |
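Put together, a processor entry carrying the date limits from the Dependencies example might look like the sketch below. Pairing these limits with the "DateRange" processor and setting DataType to DateTime are assumptions made for illustration; the table above only states that a date protection might need such limits.

```json
{
  "Name": "DateRange",
  "Type": "RPSEngineTransformation",
  "DataType": "DateTime",
  "Dependencies": {
    "min": "1950-01-01",
    "max": "2000-12-31"
  }
}
```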
ProcessingTemplate options#
Option | Required | Additional options | Type | Default value | Description |
---|---|---|---|---|---|
Name | | No | string | - | Defines a name for the processing template. |
Processing | | Yes | - | - | Defines the processing steps applied to every column that references this template. |
Conditional Processor options#
Conditions of the Conditional Processor options#
Option | Required | Additional options | Type | Default value | Description |
---|---|---|---|---|---|
Condition | | No | string | - | Condition used for "if" and "else". The operators && (and) and \|\| (or) can be used in the string to define more complex conditions. |
Action | | Yes | - | - | Determines the action to take if the condition is met. If left empty or absent, the default Rights and Processing contexts defined at job level are applied, if any. |
Actions for the Conditional Processor options#
Option | Required | Additional options | Type | Default value | Description |
---|---|---|---|---|---|
DeleteRow | | No | bool | - | Deletes the row if the condition is met. |
RightsContext | | No | string | - | Sets the rights context if the condition is met. |
ProcessingContext | | No | string | - | Sets the processing context if the condition is met. |
Value | | No | string | - | Sets the value of the column to the value of another column or of a previous step: see examples. |
Example of dataProcessor.json#
{
  "Jobs": [
    {
      "Name": "job_name_example1",
      "TypeName": "CsvFilesProcessorJob",
      "CronExpression": "0 0/1 * 1/1 * ? *",
      "Options": {
        "CsvOptions": {
          "Delimiter": ";",
          "DetectColumnCountChanges": true
        },
        "Encoding": "utf-8",
        "RightsContext": "Transform",
        "ProcessingContext": "Protect",
        "BatchSizeByRows": 400,
        "HeaderRowsCount": 1,
        "HeaderRowIndex": 1,
        "ProcessingTemplates": [
          {
            "Name": "ProtectionTemplate",
            "Processing": [
              {
                "Step": 1,
                "Processors": [
                  {
                    "Name": "Default",
                    "Type": "RPSEngineTransformation"
                  }
                ]
              }
            ]
          }
        ],
        "SkipEmptyRecords": true,
        "Files": [
          {
            "Name": "data_file",
            "InputFile": {
              "Path": "C:/RPSFilesProcessorData/SmartCoreCBS/in/data_file.csv"
            },
            "OutputFile": {
              "Path": "C:/RPSFilesProcessorData/SmartCoreCBS/out/protected_data_file.csv"
            },
            "ErrorFile": {
              "Path": "C:/RPSFilesProcessorData/SmartCoreCBS/errors/error_data_file.csv"
            },
            "Columns": [
              {
                "Names": [
                  "FirstName"
                ],
                "ClassName": "example1.PersonalDetails",
                "PropertyName": "Names",
                "ProcessingTemplateName": "ProtectionTemplate"
              },
              {
                "Names": [
                  "LastName"
                ],
                "ClassName": "example1.PersonalDetails",
                "PropertyName": "Names",
                "ProcessingTemplateName": "ProtectionTemplate"
              },
              {
                "Names": [
                  "DateOfBirth"
                ],
                "ClassName": "example1.PersonalDetails",
                "PropertyName": "DateOfBirthOrDeath",
                "ProcessingTemplateName": "ProtectionTemplate"
              },
              {
                "Names": [
                  "DateOfDeath"
                ],
                "ClassName": "example1.PersonalDetails",
                "PropertyName": "DateOfBirthOrDeath",
                "ProcessingTemplateName": "ProtectionTemplate"
              },
              {
                "Names": [
                  "PhoneNumber"
                ],
                "ClassName": "example1.PhoneNumbers",
                "PropertyName": "PhoneNumber",
                "ProcessingTemplateName": "ProtectionTemplate"
              },
              {
                "Names": [
                  "PostCode"
                ],
                "ClassName": "example1.Addresses",
                "PropertyName": "PostCode",
                "ProcessingTemplateName": "ProtectionTemplate"
              }
            ]
          }
        ]
      }
    }
  ]
}