dataProcessor.json#

dataProcessor.json configuration#

The job processes multiple CSV files, with relations between them, through a sequence of steps (snapshots of the data). The dataProcessor.json file uses the contexts defined in the three previously mentioned files.

Processing CSV files#

CSV files are defined in the Files array. Each file contains a Name (used by other files to reference its columns), an InputFile, an ErrorFile and an OutputFile, each of which gives the Path of the corresponding file. You may override the Encoding of the input, error and/or output file. Each file also contains a Columns array describing the columns of the file to be transformed.

Each Column contains a Names array in which you can specify a single column name or a range of column names (for group transformations across different columns). The names must be identical to the column names in the CSV file; they are case sensitive.

The Column also contains an array of Processing steps, each with two variables:

  • Step: the step number of the transformation. Transformations are performed step by step, and each step creates a snapshot of the data resulting from that step's processing (transformation). The value from any previous step may be used in later steps. For example, the protected value from Step 1 may be used in Step 2 for another field.
  • Processors: the array of predefined processors used to transform or preprocess values.
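As an illustration of the step mechanism, the fragment below sketches a single Columns entry with two processing steps. The processor names and types are taken from the Processors section, and the ClassName and PropertyName values are borrowed from the example at the end of this page. Combining these two particular processors on the FirstName column is an assumption made for illustration only.

```json
{
  "Names": ["FirstName"],
  "ClassName": "example1.PersonalDetails",
  "PropertyName": "Names",
  "Processing": [
    {
      "Step": 1,
      "Processors": [
        { "Name": "Default", "Type": "RPSEngineTransformation" }
      ]
    },
    {
      "Step": 2,
      "Processors": [
        { "Name": "ToLowerCaseValue", "Type": "Postprocessing" }
      ]
    }
  ]
}
```

Each step produces its own snapshot, so the result of Step 1 remains available to later steps.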

Processors#

Each processor has a Name and Type as required fields.

  • For "Type": "RPSEngineTransformation", the possible values of "Name" are "Default", "RandomShuffling", "DateRange", "Conditional", "StartsWithConditional" and "XmlTransform".
  • For "Type": "Preprocessing", the possible values of "Name" are "ReadAllUniqueValues" and "ParseDateTime".
  • For "Type": "Postprocessing", the possible values of "Name" are "NullValue", "DefaultBooleanValue" and "ToLowerCaseValue".

Processors may take an Options array as an argument, and some processors require it: the "Conditional" processor requires the conditions If and Else to be specified, and the "RPSEngineTransformation" processor requires ClassName and PropertyName. Each processor may also have a Dependencies array. Each Dependency element contains a Name and a Value; the Name is used to link the dependency to RPS Engine.

Options#

The dataProcessor.json file allows options to be defined at several levels. Options defined at job level apply to all transformed files, while options defined at file level apply only to the file concerned. File level options supersede job level options when both are defined. The tables below summarize the available options; for examples of usage, see the related Examples section.
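The precedence rule can be pictured with the fragment below, a minimal sketch that sets HeaderRowsCount once at job level and overrides it for one file. The name data_file comes from the example at the end of this page, other_file is a hypothetical second file, and paths and Columns are omitted for brevity.

```json
{
  "Options": {
    "HeaderRowsCount": 1,
    "Files": [
      {
        "Name": "data_file",
        "HeaderRowsCount": 2
      },
      {
        "Name": "other_file"
      }
    ]
  }
}
```

Under the rule above, data_file would be read with two header rows, while other_file would inherit the job level value of 1.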

Job level options#

| Option | Required | Additional options | Type | Default value | Description |
|---|---|---|---|---|---|
| CsvOptions | | Yes | string | - | Defines CSV file related options such as Delimiter or NewLine. |
| Encoding | No | | string | - | Changes the encoding of the CSV file. |
| Files | | Yes | - | - | Provides the full configuration of the files to transform (Name, Columns and paths). |
| HeaderRowsCount | No | | int | 1 | Sets the number of header rows. |
| HeaderRowIndex | No | | int | 1 | Sets the row to use for column names. |
| SkipEmptyRecords | No | | bool | false | If set to true, allows the Files Processor to skip empty rows without stopping the whole process. |
| RightsContext | No | | string | - | Sets the rights context as defined in rightsContexts.json. |
| ProcessingContext | No | | string | - | Sets the processing context as defined in processingContexts.json. |
| BatchSizeByRows | No | | int | 0 | Sets the number of rows sent per batch for processing. |
| ProcessingTemplates | | Yes | - | - | Allows processing steps to be defined once and reused when the processing is similar for multiple columns across documents. |
| ExpressionSettings | | Yes | - | - | Allows the definition of additional null values besides the usual empty string "". |

File level options#

| Option | Required | Additional options | Type | Default value | Description |
|---|---|---|---|---|---|
| Name/Names | No | | string/array | - | Defines the name (or names) of the file concerned by the configuration that follows. |
| HeaderRowsCount | No | | int | 1 | Defines the number of header rows in the CSV file. |
| HeaderRowIndex | No | | int | 1 | Defines the index of the row to use for column names. |
| SkipEmptyRecords | No | | bool | false | Ignores empty columns or fields in a CSV without throwing an error and stopping the protection of the rest of the file. |
| InputFile | | Yes | - | - | Provides the path of the input file. Can also hold configuration specific to the input file (see associated options). |
| OutputFile | | Yes | - | - | Provides the path of the output file. Can also hold configuration specific to the output file (see associated options). |
| ErrorFile | | Yes | - | - | Provides the path of the error file. Can also hold configuration specific to the error file (see associated options). |
| FakeInputFile | | Yes | - | - | Configures a fake input file to be used by the Files Processor (see associated options). |
| BatchSizeByRows | No | | int | 0 | Sets the number of rows sent per batch for processing. |
| Relations | | Yes | - | - | Declared on the file whose column is referred to by another file's transformation. |
| ForeignKey | No | | string | - | Refers to another file's column; used together with the Relations option. |
| Encoding | No | | string | - | Sets the encoding for the files. Superseded by the specific Input/Output/Error file configurations. |
| CsvOptions | | Yes | - | - | Defines CSV file related options such as Delimiter or NewLine. |
| RightsContext | No | | string | - | Defines the Rights Context used for processing the file. |
| ProcessingContext | No | | string | - | Defines the Processing Context used for processing the file. |
| ColumnProperties | | Yes | - | - | Defines an array of ColumnProperties classes. |
| Columns | | Yes | - | - | Configures the processing of each column of the file (see associated options). |

Input/Output/Error files options#

| Option | Required | Additional options | Type | Default value | Description |
|---|---|---|---|---|---|
| Path | No | | string | - | Provides the path of the file concerned. |
| CsvOptions | | Yes | - | - | Defines CSV file related options such as Delimiter or NewLine. |
| Encoding | No | | string | - | Sets the encoding for the file concerned. Supersedes the Encoding defined at file level. |
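A sketch of how these per-file options might be combined is shown below. The paths are taken from the example at the end of this page. Giving the input and output files different delimiters is an assumption, inferred only from the fact that each of them accepts its own CsvOptions.

```json
{
  "Name": "data_file",
  "Encoding": "utf-8",
  "InputFile": {
    "Path": "C:/RPSFilesProcessorData/SmartCoreCBS/in/data_file.csv",
    "CsvOptions": {
      "Delimiter": ";",
      "NewLine": "\r\n"
    }
  },
  "OutputFile": {
    "Path": "C:/RPSFilesProcessorData/SmartCoreCBS/out/protected_data_file.csv",
    "CsvOptions": {
      "Delimiter": ","
    }
  }
}
```

Here the file level Encoding applies to both files, since neither InputFile nor OutputFile defines its own.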

Fake files options#

| Option | Required | Additional options | Type | Default value | Description |
|---|---|---|---|---|---|
| RequireTransformation | No | | bool | false | If true, fake rows are joined to the original rows and transformed by RPS Engine together with them. If false, fake rows are injected after the transformations and are not sent to RPS Engine. |
| InjectionType | | Yes | - | - | Defines the type of row injection. |
| NumberOfFakeRows | No | | int | 0 | Determines the number of fake rows to generate in this file. |

CsvOptions options#

| Option | Required | Additional options | Type | Default value | Description |
|---|---|---|---|---|---|
| NewLine | No | | string | - | Defines the new line symbol, for example: "\n" or "\r\n". |
| Delimiter | No | | string | - | Defines the delimiter between columns, by default it is: ",". |

ExpressionSettings options#

| Option | Required | Additional options | Type | Default value | Description |
|---|---|---|---|---|---|
| AdditionalNullValues | No | | string | - | Defines additional null values. By default, the null value of a cell is the empty string "". |
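A minimal sketch of this setting is given below. It assumes that ExpressionSettings sits at job level (as listed in the job level table) and that AdditionalNullValues takes a single string, as the Type column indicates. The value "NULL" is a hypothetical example, and this page does not state whether several values can be supplied.

```json
{
  "Options": {
    "ExpressionSettings": {
      "AdditionalNullValues": "NULL"
    }
  }
}
```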

Relations options#

| Option | Required | Additional options | Type | Default value | Description |
|---|---|---|---|---|---|
| File | No | | string | - | Provides the name of the file to use. |
| Keys | No | | string | - | Defines the keys for the SourceColumn (source column) and the DestColumn (destination column). |
| Mappings | No | | string dictionary | - | Creates the mapping between two files using a dictionary. For instance, to describe the relation between column PersonId in file Visits and column Id in file Person, specify the mapping as: [{"PersonId","Id"}]. |
| OrderBy | | Yes | - | - | When a mapping relates multiple values to one another, OrderBy defines the order in which to treat them. For instance, with the Mappings example, multiple PersonId values corresponding to multiple visits can be linked to a single Id. |

RowInjection options#

| Option | Required | Additional options | Type | Default value | Description |
|---|---|---|---|---|---|
| RowInjectionType | No | | enum | 0 | Defines the type of row injection. Can be Unknown (or 0), AppendToEnd or AppendRandom. |

SortDirection options#

| Option | Required | Additional options | Type | Default value | Description |
|---|---|---|---|---|---|
| SortDirection | No | | enum | 0 | Sets the sorting direction. Can take the values Unset (or 0), ASC (ascending) or DESC (descending). |

Columns options#

| Option | Required | Additional options | Type | Default value | Description |
|---|---|---|---|---|---|
| Names | No | | string/array | - | Provides the names of the columns concerned by the configuration that follows. |
| ClassName | No | | string | - | Links to the Class Name from RPS Core Configuration. |
| PropertyName | No | | string | - | Links to the Property Name from RPS Core Configuration. |
| RightsContext | No | | string | - | Links to the Rights Context from RPS Core Configuration. |
| ProcessingContext | No | | string | - | Links to the Processing Context from RPS Core Configuration. |
| Processing | | Yes | - | - | Defines the processing of this column, if not defined through a Processing Template. |
| ProcessingTemplateName | No | | string | - | Gives the name of the Processing Template, defined at job level, to use for this column. |

ColumnProperties options#

| Option | Required | Additional options | Type | Default value | Description |
|---|---|---|---|---|---|
| Name | No | | string | - | Name of the column for which additional options are defined. |
| Storage | No | | enum | Memory | Defines where the column data is stored. Values can be Memory or MongoDb. Use this option when memory is not sufficient to store the data of all columns. |
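The fragment below sketches how a single column might be moved out of memory. It assumes ColumnProperties is given as an array at file level, as the file level table describes. The file and column names are borrowed from the example at the end of this page, and any connection settings MongoDb may need are outside what this page covers.

```json
{
  "Name": "data_file",
  "ColumnProperties": [
    { "Name": "PostCode", "Storage": "MongoDb" }
  ]
}
```

Columns not listed in ColumnProperties would keep the default Storage value, Memory.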

ProcessingStep options#

| Option | Required | Additional options | Type | Default value | Description |
|---|---|---|---|---|---|
| Step | No | | int | 0 | Provides the number of the step in a Processing definition. |
| Processors | | Yes | - | - | Defines the processors to run for the given step. |

ProcessingStepProcessor options#

| Option | Required | Additional options | Type | Default value | Description |
|---|---|---|---|---|---|
| Name | No | | string | - | Gives the name of the processor to use; see the Processors section above. |
| Type | No | | string | - | Defines the type of the processor to use; see the Processors section above. |
| DataType | No | | enum | string | Type of data to be used by the processor. Can be String, DateTime, Long, Any... |
| Options | No | | - | - | Defines options specific to each processor. |
| Value | No | | string | - | Defines the value to be transformed by the processor. By default it is the value of the CSV column, but it can be replaced here by any other value. |
| Dependencies | Yes | | dictionary | - | Defines key-value pairs of dependencies for the transformers. For instance, a date protection might need a definition of limits (min and max): "Dependencies": {"min": "1950-01-01", "max": "2000-12-31"}. |
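To show these options in context, the fragment below sketches one processing step whose processor carries the min and max dependencies quoted in the table. Pairing them with the "DateRange" processor and the DateTime data type is an assumption; the table only states that a date protection might need such limits.

```json
{
  "Step": 1,
  "Processors": [
    {
      "Name": "DateRange",
      "Type": "RPSEngineTransformation",
      "DataType": "DateTime",
      "Dependencies": {
        "min": "1950-01-01",
        "max": "2000-12-31"
      }
    }
  ]
}
```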

ProcessingTemplate options#

| Option | Required | Additional options | Type | Default value | Description |
|---|---|---|---|---|---|
| Name | No | | string | - | Defines a name for the processing template. |
| Processing | | Yes | - | - | Defines the processing steps applied by the template. |

Conditional Processor options#

| Option | Required | Additional options | Type | Default value | Description |
|---|---|---|---|---|---|
| If | | Yes | - | - | Determines the "if" condition for the processor, see examples. |
| Else | | Yes | - | - | Determines the "else" condition for the processor. |

Conditions of the Conditional Processor options#

| Option | Required | Additional options | Type | Default value | Description |
|---|---|---|---|---|---|
| Condition | No | | string | - | Condition used for "if" and "else". |
| Action | | Yes | - | - | Determines the action to perform if the condition is met. If left empty or absent, the default Rights and Processing contexts defined at job level are applied, if any. |

In the Condition string, the operators && (and) and || (or) can be used to define more complex conditions.

Actions for the Conditional Processor options#

| Option | Required | Additional options | Type | Default value | Description |
|---|---|---|---|---|---|
| DeleteRow | No | | bool | - | Deletes the row when the condition is met. |
| RightsContext | No | | string | - | Sets the rights context when the condition is met. |
| ProcessingContext | No | | string | - | Sets the processing context when the condition is met. |
| Value | No | | string | - | Sets the value of the column to the value of another column or of a previous step: see examples. |

Example of dataProcessor.json#

{
  "Jobs": [
    {
      "Name": "job_name_example1",
      "TypeName": "CsvFilesProcessorJob",
      "CronExpression": "0 0/1 * 1/1 * ? *",
      "Options": {
        "CsvOptions": {
          "Delimiter": ";",
          "DetectColumnCountChanges": true
        },
        "Encoding": "utf-8",
        "RightsContext": "Transform",
        "ProcessingContext": "Protect",
        "BatchSizeByRows": 400,
        "HeaderRowsCount": 1,
        "HeaderRowIndex": 1,
        "ProcessingTemplates": [
          {
            "Name": "ProtectionTemplate",
            "Processing": [
              {
                "Step": 1,
                "Processors": [
                  {
                    "Name": "Default",
                    "Type": "RPSEngineTransformation"
                  }
                ]
              }
            ]
          }
        ],
        "SkipEmptyRecords": true,
        "Files": [
          {
            "Name": "data_file",
            "InputFile": {
              "Path": "C:/RPSFilesProcessorData/SmartCoreCBS/in/data_file.csv"
            },
            "OutputFile": {
              "Path": "C:/RPSFilesProcessorData/SmartCoreCBS/out/protected_data_file.csv"
            },
            "ErrorFile": {
              "Path": "C:/RPSFilesProcessorData/SmartCoreCBS/errors/error_data_file.csv"
            },
            "Columns": [
              {
                "Names": [
                  "FirstName"
                ],
                "ClassName": "example1.PersonalDetails",
                "PropertyName": "Names",
                "ProcessingTemplateName": "ProtectionTemplate"
              },
              {
                "Names": [
                  "LastName"
                ],
                "ClassName": "example1.PersonalDetails",
                "PropertyName": "Names",
                "ProcessingTemplateName": "ProtectionTemplate"
              },
              {
                "Names": [
                  "DateOfBirth"
                ],
                "ClassName": "example1.PersonalDetails",
                "PropertyName": "DateOfBirthOrDeath",
                "ProcessingTemplateName": "ProtectionTemplate"
              },
              {
                "Names": [
                  "DateOfDeath"
                ],
                "ClassName": "example1.PersonalDetails",
                "PropertyName": "DateOfBirthOrDeath",
                "ProcessingTemplateName": "ProtectionTemplate"
              },
              {
                "Names": [
                  "PhoneNumber"
                ],
                "ClassName": "example1.PhoneNumbers",
                "PropertyName": "PhoneNumber",
                "ProcessingTemplateName": "ProtectionTemplate"
              },
              {
                "Names": [
                  "PostCode"
                ],
                "ClassName": "example1.Addresses",
                "PropertyName": "PostCode",
                "ProcessingTemplateName": "ProtectionTemplate"
              }
            ]
          }
        ]
      }
    }
  ]
}