dataProcessor.json

dataProcessor.json configuration

The job can process several CSV files, including relations between them, in successive steps (snapshots of data). The dataProcessor.json file uses the contexts defined in the three previously mentioned files.

Processing CSV files

CSV files are defined in the Files array. Each file contains a Name (used by other files to reference its columns), an input file, an error file and an output file (InputFile, ErrorFile and OutputFile, each with a Path). You may override the Encoding of the input, error and/or output file. Each file also contains a Columns array describing the columns of the file to be transformed.

Each Column contains a Names array in which you can specify a single column name or a range of column names (for group transformations across different columns). The names must match the column names in the CSV file exactly; they are case-sensitive.

Each Column also contains a Processing array of steps (unless it uses a Processing Template), each step having two properties:

Step: the step number of the transformation. Transformations are done step by step, and each step creates a snapshot of the data resulting from that step's processing (transformation). The value from any previous step may be used in subsequent steps. For example, the protected value from Step 1 may be used in Step 2 for another field.
Processors: the array of predefined processors used to transform, preprocess or postprocess values.
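
As a structural sketch only, the fragment below gives two columns inline Processing definitions: FirstName is transformed in Step 1 and LastName in Step 2, so LastName is processed after the Step 1 snapshot has been created. The class and property names are borrowed from the example at the end of this page, and the step assignment is purely illustrative. How a Step 2 processor refers to a Step 1 value (presumably through the processor's Value option) is not shown here.

```json
{
  "Columns": [
    {
      "Names": [ "FirstName" ],
      "ClassName": "example1.PersonalDetails",
      "PropertyName": "Names",
      "Processing": [
        {
          "Step": 1,
          "Processors": [
            { "Name": "Default", "Type": "RPSEngineTransformation" }
          ]
        }
      ]
    },
    {
      "Names": [ "LastName" ],
      "ClassName": "example1.PersonalDetails",
      "PropertyName": "Names",
      "Processing": [
        {
          "Step": 2,
          "Processors": [
            { "Name": "Default", "Type": "RPSEngineTransformation" }
          ]
        }
      ]
    }
  ]
}
```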

Processors

Each processor requires a Name and a Type.

For "Type": "RPSEngineTransformation", the possible values of "Name" are "Default", "RandomShuffling", "DateRange", "Conditional", "StartsWithConditional" and "XmlTransform".
For "Type": "Preprocessing", the possible values of "Name" are "ReadAllUniqueValues" and "ParseDateTime".
For "Type": "Postprocessing", the possible values of "Name" are "NullValue", "DefaultBooleanValue" and "ToLowerCaseValue".

Processors may take an Options array as an argument, and some processors require it: the "Conditional" processor requires the If and Else conditions, and processors of type "RPSEngineTransformation" require ClassName and PropertyName. Each processor may also have a Dependencies array, in which each element contains a Name and a Value. The Name is used to link the dependency to RPS Engine.

Options

dataProcessor.json supports option definitions at several levels. Options defined at job level apply to all transformed files, while options defined at file level apply only to the concerned file; when both are defined, file-level options supersede job-level options. The tables below summarize the available options; for usage examples, see the related Examples section.
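
The following fragment is a minimal sketch of how job-level and file-level options interact, assuming the nesting used in the example at the end of this page. Both files inherit the job-level Encoding and ProcessingContext; data_file overrides the job-level delimiter and rights context, while other_file keeps them. The file name other_file, the rights context OtherRightsContext and the paths are hypothetical, and Columns are omitted for brevity.

```json
{
  "Options": {
    "Encoding": "utf-8",
    "RightsContext": "Transform",
    "ProcessingContext": "Protect",
    "CsvOptions": {
      "Delimiter": ";"
    },
    "Files": [
      {
        "Name": "data_file",
        "RightsContext": "OtherRightsContext",
        "CsvOptions": {
          "Delimiter": ","
        },
        "InputFile": { "Path": "C:/RPSFilesProcessorData/in/data_file.csv" },
        "OutputFile": { "Path": "C:/RPSFilesProcessorData/out/data_file.csv" },
        "ErrorFile": { "Path": "C:/RPSFilesProcessorData/errors/data_file.csv" }
      },
      {
        "Name": "other_file",
        "InputFile": { "Path": "C:/RPSFilesProcessorData/in/other_file.csv" },
        "OutputFile": { "Path": "C:/RPSFilesProcessorData/out/other_file.csv" },
        "ErrorFile": { "Path": "C:/RPSFilesProcessorData/errors/other_file.csv" }
      }
    ]
  }
}
```

A file-level rights context such as OtherRightsContext would still need to be defined in rightsContexts.json.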

Job-level options

Option	Additional options	Type	Default value	Description
CsvOptions	Yes	-	-	Defines CSV-related options such as Delimiter or NewLine.
Encoding	No	string	-	Sets the encoding of the CSV files.
Files	Yes	-	-	Provides the whole configuration for the transformation of each file (Name, Columns and paths).
HeaderRowsCount	No	int	1	Sets the number of header rows.
HeaderRowIndex	No	int	1	Sets the row to use for column names.
SkipEmptyRecords	No	bool	false	If true, the Files Processor skips empty rows instead of stopping the whole process.
RightsContext	No	string	-	Sets the rights context as defined in rightsContexts.json.
ProcessingContext	No	string	-	Sets the processing context as defined in processingContexts.json.
BatchSizeByRows	No	int	0	Sets the number of rows sent per batch for processing.
ProcessingTemplates	Yes	-	-	Defines reusable processing steps that columns can reference by name through ProcessingTemplateName, when several columns across files share the same processing.
ExpressionSettings	Yes	-	-	Allows the definition of additional null values besides the default empty string "".

File-level options

Option	Additional options	Type	Default value	Description
Name/Names	No	string/array	-	Defines the name (or names) of the file concerned by the following configuration.
HeaderRowsCount	No	int	1	Defines the number of header rows in the CSV file.
HeaderRowIndex	No	int	1	Defines which row contains the column names.
SkipEmptyRecords	No	bool	false	If true, empty columns or fields in the CSV file are ignored instead of raising an error that stops the protection of the rest of the file.
InputFile	Yes	-	-	Provides the path of the input file. Can be used to define specific configuration for the input file (see associated options).
OutputFile	Yes	-	-	Provides the path of the output file. Can be used to define specific configuration for the output file (see associated options).
ErrorFile	Yes	-	-	Provides the path of the error file. Can be used to define specific configuration for the error file (see associated options).
FakeInputFile	Yes	-	-	Allows the configuration of a fake input file to be used by the Files Processor. See associated options.
BatchSizeByRows	No	int	0	Sets the number of rows sent per batch for processing.
Relations	Yes	-	-	Declared for the file whose column is referred to by another file's transformation.
ForeignKey	No	string	-	Refers to another file's column; used together with the Relations option.
Encoding	No	string	-	Sets the encoding for the file. Superseded by the specific Input/Output/Error file configurations.
CsvOptions	Yes	-	-	Defines CSV-related options such as Delimiter or NewLine.
RightsContext	No	string	-	Defines the Rights Context for the processing of the file.
ProcessingContext	No	string	-	Defines the Processing Context for the processing of the file.
ColumnProperties	Yes	-	-	Defines an array of ColumnProperties entries (see ColumnProperties options).
Columns	Yes	-	-	Configures the processing of each column of the file (see Columns options).

Input/Output/Error file options

Option	Additional options	Type	Default value	Description
Path	No	string	-	Used to provide the path of the concerned file.
CsvOptions	Yes	-	-	Defines CSV-related options such as Delimiter or NewLine for this file.
Encoding	No	string	-	Sets the encoding of this file. Supersedes the Encoding defined at file or job level.
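
For instance, a file entry could set a default Encoding and override it for the input file only, together with input-specific CSV options. This is a sketch assuming standard encoding names such as utf-8 and iso-8859-1 are accepted; the paths are taken from the example at the end of this page.

```json
{
  "Name": "data_file",
  "Encoding": "utf-8",
  "InputFile": {
    "Path": "C:/RPSFilesProcessorData/SmartCoreCBS/in/data_file.csv",
    "Encoding": "iso-8859-1",
    "CsvOptions": {
      "Delimiter": ";",
      "NewLine": "\r\n"
    }
  },
  "OutputFile": {
    "Path": "C:/RPSFilesProcessorData/SmartCoreCBS/out/protected_data_file.csv"
  },
  "ErrorFile": {
    "Path": "C:/RPSFilesProcessorData/SmartCoreCBS/errors/error_data_file.csv"
  }
}
```

Here the output and error files keep the file-level utf-8 encoding, while the input file is read as iso-8859-1.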

Fake file options

Option	Additional options	Type	Default value	Description
RequireTransformation	No	bool	false	If true, fake rows are joined to the original rows and transformed by RPS Engine together with them. If false, fake rows are injected after the transformations and are not sent to RPS Engine.
InjectionType	Yes	-	-	Defines how fake rows are injected (see RowInjection options).
NumberOfFakeRows	No	int	0	Determines the number of fake rows to generate in this file.

CsvOptions options

Option	Required	Additional options	Type	Default value	Description
NewLine		No	string	-	Defines the new line symbol, for example: "\n" or "\r\n".
Delimiter		No	string	-	Defines the delimiter between columns; the default is ",".

ExpressionSettings options

Option	Required	Additional options	Type	Default value	Description
AdditionalNullValues		No	string	-	By default, only an empty string ("") is treated as a null cell value. This option defines additional values to be treated as null.
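
A minimal sketch of a job-level ExpressionSettings block. The values "NULL" and "N/A" are hypothetical, and the list form is an assumption: the table above gives the type as string, so a single string value may be expected instead.

```json
{
  "Options": {
    "ExpressionSettings": {
      "AdditionalNullValues": [ "NULL", "N/A" ]
    }
  }
}
```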

Relations options

Option	Additional options	Type	Default value	Description
File	No	string	-	Provides the name of the file to use.
Keys	No	string	-	Defines the keys for the SourceColumn (source column) and the DestColumn (destination column).
Mappings	No	string dictionary	-	Maps columns between two files using a dictionary. For instance, to describe the relation between column PersonId in file Visits and column Id in file Person, specify the mapping as: [{"PersonId","Id"}].
OrderBy	Yes	-	-	When a mapping relates multiple values to one another, OrderBy defines the order in which they are processed. In the Mappings example, multiple PersonId values (one per visit) can be linked to a single Id.

RowInjection options

Option	Required	Additional options	Type	Default value	Description
RowInjectionType		No	enum	0	Defines how rows are injected. Possible values: Unknown (or 0), AppendToEnd or AppendRandom.
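
The fragment below sketches a file that injects 50 fake rows using AppendRandom and sends them to RPS Engine together with the original rows (RequireTransformation set to true). It assumes that RowInjectionType is nested under InjectionType, as the "Additional options" column suggests, and that the enumeration accepts value names as strings; where the fake values themselves come from is not covered by these tables.

```json
{
  "Name": "data_file",
  "FakeInputFile": {
    "RequireTransformation": true,
    "NumberOfFakeRows": 50,
    "InjectionType": {
      "RowInjectionType": "AppendRandom"
    }
  }
}
```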

SortDirection options

Option	Required	Additional options	Type	Default value	Description
SortDirection		No	enum	0	Sets the sorting direction. Can take values Unset (or 0), ASC (ascending) or DESC (descending).

Columns options

Option	Additional options	Type	Default value	Description
Names	No	string/array	-	Provides the name (or names) of the CSV columns concerned by the following configuration.
ClassName	No	string	-	Links the column to a Class Name from the RPS Core Configuration.
PropertyName	No	string	-	Links the column to a Property Name from the RPS Core Configuration.
RightsContext	No	string	-	Links the column to a Rights Context from the RPS Core Configuration.
ProcessingContext	No	string	-	Links the column to a Processing Context from the RPS Core Configuration.
Processing	Yes	-	-	Defines the processing of this column, if it is not defined through a Processing Template.
ProcessingTemplateName	No	string	-	Name of a Processing Template defined at job level to use for this column.

ColumnProperties options

Option	Required	Additional options	Type	Default value	Description
Name		No	string	-	Name of the column for which additional options are defined.
Storage		No	enum	Memory	Defines where the column data is stored. Possible values: Memory or MongoDb. Use MongoDb when memory is not sufficient to store all the column data.
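
As a sketch, a file could keep most columns in memory and move a large one to MongoDB. The column names are borrowed from the example at the end of this page; the MongoDB connection settings required for this are not described in this section.

```json
{
  "Name": "data_file",
  "ColumnProperties": [
    { "Name": "FirstName", "Storage": "MongoDb" },
    { "Name": "LastName", "Storage": "Memory" }
  ]
}
```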

ProcessingStep options

Option	Required	Additional options	Type	Default value	Description
Step		No	int	0	Provides the number of the step in a Processing definition.
Processors		Yes	-	-	Defines the processors applied in a given step (see ProcessingStepProcessor options).

ProcessingStepProcessor options

Option	Additional options	Type	Default value	Description
Name	No	string	-	Name of the processor to use; see the Processors section above.
Type	No	string	-	Type of the processor to use; see the Processors section above.
DataType	No	enum	string	Type of data handled by the processor, for example String, DateTime, Long or Any.
Options	No	-	-	Defines options specific to each processor.
Value	No	string	-	Defines the value to be transformed by the processor. By default this is the value of the CSV column, but it can be replaced here by any other value.
Dependencies	Yes	dictionary	-	Defines key-value pairs of dependencies for the transformers. For instance, a date protection might need limits (min and max): "Dependencies": {"min": "1950-01-01", "max": "2000-12-31"}.
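
The Dependencies example above could be attached to a date column as in the following sketch. Pairing the "DateRange" processor with DataType "DateTime" and the min/max dependencies is an assumption made for illustration; the dependency keys must match what RPS Engine expects.

```json
{
  "Names": [ "DateOfBirth" ],
  "ClassName": "example1.PersonalDetails",
  "PropertyName": "DateOfBirthOrDeath",
  "Processing": [
    {
      "Step": 1,
      "Processors": [
        {
          "Name": "DateRange",
          "Type": "RPSEngineTransformation",
          "DataType": "DateTime",
          "Dependencies": {
            "min": "1950-01-01",
            "max": "2000-12-31"
          }
        }
      ]
    }
  ]
}
```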

ProcessingTemplate options

Option	Required	Additional options	Type	Default value	Description
Name		No	string	-	Defines a name for the processing template.
Processing		Yes	-	-	Defines the processing steps of the template.

Conditional Processor options

Option	Required	Additional options	Type	Default value	Description
If		Yes	-	-	Determines the "if" condition for the processor, see examples.
Else		Yes	-	-	Determines the "else" condition for the processor.

Conditions of the Conditional Processor options

Option	Required	Additional options	Type	Default value	Description
Condition		No	string	-	Condition used for "if" and "else". The operators && (and) and || (or) can be combined in the string to build more complex conditions.
Action		Yes	-	-	Determines the action to perform if the condition is met. If left empty or absent, the default Rights and Processing contexts defined at job level, if any, are applied.

Actions for the Conditional Processor options

Option	Additional options	Type	Default value	Description
DeleteRow	No	bool	-	Deletes the row if the condition is met.
RightsContext	No	string	-	Sets the rights context if the condition is met.
ProcessingContext	No	string	-	Sets the processing context if the condition is met.
Value	No	string	-	Sets the value of the column to the value of another column or of a previous step: see examples.
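
A Conditional processor combining the tables above might be written as in the sketch below. The "<condition>" string is a placeholder (see the Examples section for the actual condition syntax), and nesting If and Else under Options, each with an Action object, is an assumption based on the tables rather than a confirmed schema.

```json
{
  "Name": "Conditional",
  "Type": "RPSEngineTransformation",
  "Options": {
    "If": {
      "Condition": "<condition>",
      "Action": {
        "RightsContext": "Transform",
        "ProcessingContext": "Protect"
      }
    },
    "Else": {
      "Action": {
        "DeleteRow": true
      }
    }
  }
}
```

Under these assumptions, rows matching the condition are transformed with the Transform and Protect contexts, and all other rows are deleted.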

Example of dataProcessor.json

{
  "Jobs": [
    {
      "Name": "job_name_example1",
      "TypeName": "CsvFilesProcessorJob",
      "CronExpression": "0 0/1 * 1/1 * ? *",
      "Options": {
        "CsvOptions": {
          "Delimiter": ";",
          "DetectColumnCountChanges": true
        },
        "Encoding": "utf-8",
        "RightsContext": "Transform",
        "ProcessingContext": "Protect",
        "BatchSizeByRows": 400,
        "HeaderRowsCount": 1,
        "HeaderRowIndex": 1,
        "ProcessingTemplates": [
          {
            "Name": "ProtectionTemplate",
            "Processing": [
              {
                "Step": 1,
                "Processors": [
                  {
                    "Name": "Default",
                    "Type": "RPSEngineTransformation"
                  }
                ]
              }
            ]
          }
        ],
        "SkipEmptyRecords": true,
        "Files": [
          {
            "Name": "data_file",
            "InputFile": {
              "Path": "C:/RPSFilesProcessorData/SmartCoreCBS/in/data_file.csv"
            },
            "OutputFile": {
              "Path": "C:/RPSFilesProcessorData/SmartCoreCBS/out/protected_data_file.csv"
            },
            "ErrorFile": {
              "Path": "C:/RPSFilesProcessorData/SmartCoreCBS/errors/error_data_file.csv"
            },
            "Columns": [
              {
                "Names": [
                  "FirstName"
                ],
                "ClassName": "example1.PersonalDetails",
                "PropertyName": "Names",
                "ProcessingTemplateName": "ProtectionTemplate"
              },
              {
                "Names": [
                  "LastName"
                ],
                "ClassName": "example1.PersonalDetails",
                "PropertyName": "Names",
                "ProcessingTemplateName": "ProtectionTemplate"
              },
              {
                "Names": [
                  "DateOfBirth"
                ],
                "ClassName": "example1.PersonalDetails",
                "PropertyName": "DateOfBirthOrDeath",
                "ProcessingTemplateName": "ProtectionTemplate"
              },
              {
                "Names": [
                  "DateOfDeath"
                ],
                "ClassName": "example1.PersonalDetails",
                "PropertyName": "DateOfBirthOrDeath",
                "ProcessingTemplateName": "ProtectionTemplate"
              },
              {
                "Names": [
                  "PhoneNumber"
                ],
                "ClassName": "example1.PhoneNumbers",
                "PropertyName": "PhoneNumber",
                "ProcessingTemplateName": "ProtectionTemplate"
              },
              {
                "Names": [
                  "PostCode"
                ],
                "ClassName": "example1.Addresses",
                "PropertyName": "PostCode",
                "ProcessingTemplateName": "ProtectionTemplate"
              }
            ]
          }
        ]
      }
    }
  ]
}