execution.checkpointing.aligned-checkpoint-timeout |
0 ms |
Duration |
Only relevant if execution.checkpointing.unaligned.enabled is enabled.
If the timeout is 0, checkpoints always start unaligned.
If the timeout is positive, checkpoints start aligned. If, during checkpointing, the checkpoint start delay exceeds this timeout, alignment times out and the checkpoint barrier switches to working as an unaligned checkpoint. |
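The rules above can be restated as a small decision function. This is a hypothetical Python sketch that only paraphrases the option description: `barrier_mode` and its parameters are illustrative names, not a Flink API. The real switch happens inside Flink's runtime while a checkpoint is in progress.

```python
from datetime import timedelta


def barrier_mode(unaligned_enabled: bool,
                 aligned_timeout: timedelta,
                 start_delay: timedelta) -> str:
    """Hypothetical model of execution.checkpointing.aligned-checkpoint-timeout.

    Returns "aligned" or "unaligned" for a barrier whose checkpoint start
    delay is `start_delay`. Illustrative only; not a Flink API.
    """
    if not unaligned_enabled:
        # The timeout only matters when unaligned checkpoints are enabled.
        return "aligned"
    if aligned_timeout == timedelta(0):
        # A timeout of 0 means checkpoints always start unaligned.
        return "unaligned"
    # Positive timeout: start aligned, switch to unaligned once the
    # start delay exceeds the timeout.
    return "unaligned" if start_delay > aligned_timeout else "aligned"


print(barrier_mode(True, timedelta(seconds=30), timedelta(seconds=45)))  # unaligned
```

With a 30 s timeout, a barrier delayed by 45 s falls back to unaligned, while one delayed by 10 s stays aligned.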
execution.checkpointing.checkpoints-after-tasks-finish |
true |
Boolean |
Feature toggle for enabling checkpointing even if some of the tasks have finished. Before enabling it, please review the important considerations. |
execution.checkpointing.cleaner.parallel-mode |
true |
Boolean |
Whether to discard a checkpoint's states in parallel using the ExecutorService passed into the cleaner. |
execution.checkpointing.create-subdir |
true |
Boolean |
Whether to create sub-directories named by job ID under 'execution.checkpointing.dir' to store the data files and metadata of checkpoints. The default value is true, which allows users to run several jobs with the same checkpoint directory at the same time. WARNING: This is an advanced configuration. If set to false, users must ensure that multiple jobs are never run with the same checkpoint directory simultaneously, and that, when a new job starts, the directory contains no files other than those needed to restore the current job. |
execution.checkpointing.data-inline-threshold |
20 kb |
MemorySize |
The minimum size of state data files. All state chunks smaller than this size are stored inline in the root checkpoint metadata file. The maximum allowed value for this option is 1 MB. |
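The inline-versus-file decision can be sketched as follows. This is a hypothetical Python model of the description only; `is_inlined` is an illustrative name and not part of Flink. It assumes Flink's convention that 1 kb = 1024 bytes.

```python
KB = 1024          # Flink memory sizes use 1 kb = 1024 bytes
MB = 1024 * KB


def is_inlined(chunk_size: int, inline_threshold: int = 20 * KB) -> bool:
    """Hypothetical check: is a state chunk stored inline in the metadata file?

    Restates execution.checkpointing.data-inline-threshold; not a Flink API.
    """
    if inline_threshold > 1 * MB:
        raise ValueError("data-inline-threshold must not exceed 1 MB")
    # Chunks smaller than the threshold go into the root metadata file;
    # everything else is written to a separate state data file.
    return chunk_size < inline_threshold
```

With the 20 kb default, a 4 kb chunk would be inlined, while a 20 kb or larger chunk would get its own data file.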
execution.checkpointing.dir |
(none) |
String |
The default directory used for storing the data files and metadata of checkpoints in a Flink-supported file system. The storage path must be accessible from all participating processes/nodes (i.e., all TaskManagers and JobManagers). If 'execution.checkpointing.storage' is set to 'jobmanager', only the checkpoint metadata is stored in this directory. |
execution.checkpointing.externalized-checkpoint-retention |
NO_EXTERNALIZED_CHECKPOINTS |
Enum |
Externalized checkpoints write their metadata out to persistent storage and are not automatically cleaned up when the owning job fails or is suspended (terminating with job status JobStatus#FAILED or JobStatus#SUSPENDED). In this case, you have to manually clean up the checkpoint state, both the metadata and the actual program state.
The mode defines how an externalized checkpoint should be cleaned up on job cancellation. If you choose to retain externalized checkpoints on cancellation, you also have to clean up the checkpoint state manually when you cancel the job (terminating with job status JobStatus#CANCELED).
The target directory for externalized checkpoints is configured via execution.checkpointing.dir.
Possible values:
- "DELETE_ON_CANCELLATION": Checkpoint state is only kept when the owning job fails. It is deleted if the job is cancelled.
- "RETAIN_ON_CANCELLATION": Checkpoint state is kept when the owning job is cancelled or fails.
- "NO_EXTERNALIZED_CHECKPOINTS": Externalized checkpoints are disabled.
|
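The retention modes can be summarized as a lookup from mode and final job status to whether an externalized checkpoint is left behind. This hypothetical Python sketch only restates the description; `Retention` mirrors the enum values listed above, but `externalized_checkpoint_kept` is an illustrative name and statuses are plain strings for simplicity.

```python
from enum import Enum


class Retention(Enum):
    DELETE_ON_CANCELLATION = "DELETE_ON_CANCELLATION"
    RETAIN_ON_CANCELLATION = "RETAIN_ON_CANCELLATION"
    NO_EXTERNALIZED_CHECKPOINTS = "NO_EXTERNALIZED_CHECKPOINTS"


def externalized_checkpoint_kept(mode: Retention, final_status: str) -> bool:
    """Hypothetical: does an externalized checkpoint survive the job ending
    with `final_status` ("FAILED", "SUSPENDED" or "CANCELED")?"""
    if mode is Retention.NO_EXTERNALIZED_CHECKPOINTS:
        # Externalized checkpoints are disabled, so none is left behind.
        return False
    if final_status in ("FAILED", "SUSPENDED"):
        # Never cleaned up automatically; manual cleanup is required.
        return True
    if final_status == "CANCELED":
        # Only RETAIN_ON_CANCELLATION keeps the state on cancellation.
        return mode is Retention.RETAIN_ON_CANCELLATION
    raise ValueError(f"status not covered by this sketch: {final_status}")
```

Any `True` result marks state that, per the description, someone has to delete manually.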
execution.checkpointing.file-merging.across-checkpoint-boundary |
false |
Boolean |
Only relevant if execution.checkpointing.file-merging.enabled is enabled. Whether to allow merging data of multiple checkpoints into one physical file. If set to false, files are merged only within checkpoint boundaries. Otherwise, logical files of different checkpoints may share the same physical file. |
execution.checkpointing.file-merging.enabled |
false |
Boolean |
Whether to enable merging multiple checkpoint files into one, which greatly reduces the number of small checkpoint files. This is an experimental feature under evaluation; make sure you are aware of its possible effects before enabling it. |
execution.checkpointing.file-merging.max-file-size |
32 mb |
MemorySize |
Max size of a physical file for merged checkpoints. |
execution.checkpointing.file-merging.max-space-amplification |
2.0 |
Float |
Space amplification is the ratio of the space actually occupied to the amount of valid data; the higher the space amplification, the more space is wasted. This option sets the space amplification above which physical files are re-uploaded to reclaim space. Any value below 1f disables space control. |
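As a worked example of the ratio: 64 MB occupied for 16 MB of valid data is an amplification of 4.0, which exceeds the default 2.0. The hypothetical Python sketch below encodes that comparison; `should_reclaim_space` is an illustrative name, and how Flink aggregates occupied and valid bytes across files is not modeled.

```python
def should_reclaim_space(occupied_bytes: int, valid_bytes: int,
                         max_amplification: float = 2.0) -> bool:
    """Hypothetical trigger for file-merging.max-space-amplification."""
    if max_amplification < 1.0:
        # Any value below 1 disables space control.
        return False
    if valid_bytes <= 0:
        raise ValueError("this sketch assumes some valid data remains")
    amplification = occupied_bytes / valid_bytes  # e.g. 64 MB / 16 MB = 4.0
    return amplification > max_amplification
```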
execution.checkpointing.file-merging.pool-blocking |
false |
Boolean |
Whether to use a blocking or non-blocking pool for merging physical files. A non-blocking pool always provides a usable physical file without blocking, but it may create many physical files if files are polled frequently. When polling a small file from a blocking pool, the caller may be blocked until a file is returned to the pool. |
execution.checkpointing.incremental |
false |
Boolean |
Whether to create incremental checkpoints, if possible. For an incremental checkpoint, only the diff from the previous checkpoint is stored, rather than the complete checkpoint state. Once enabled, the state size shown in the web UI or fetched from the REST API represents only the delta checkpoint size, not the full checkpoint size. Some state backends may not support incremental checkpoints and ignore this option. |
execution.checkpointing.interval |
(none) |
Duration |
The interval at which checkpoints are periodically scheduled.
This setting defines the base interval. Checkpoint triggering may be delayed by the settings execution.checkpointing.max-concurrent-checkpoints, execution.checkpointing.min-pause and execution.checkpointing.interval-during-backlog. |
execution.checkpointing.interval-during-backlog |
(none) |
Duration |
If set and any source reports isProcessingBacklog=true, this is the interval at which checkpoints are periodically scheduled.
Checkpoint triggering may be delayed by the settings execution.checkpointing.max-concurrent-checkpoints and execution.checkpointing.min-pause.
Note: if set, the value must be either 0, which disables checkpointing during backlog, or greater than or equal to execution.checkpointing.interval. |
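The choice between the two intervals, including the "0 or at least the regular interval" constraint, can be sketched as below. This is a hypothetical Python model with illustrative names (`base_interval`, `None` meaning "no periodic checkpoints"); it returns only the base interval, so the further delays from max-concurrent-checkpoints and min-pause are not included.

```python
from datetime import timedelta
from typing import Optional


def base_interval(interval: timedelta,
                  interval_during_backlog: Optional[timedelta],
                  any_source_backlogged: bool) -> Optional[timedelta]:
    """Hypothetical: base checkpoint interval; None means no periodic checkpoints."""
    if interval_during_backlog is not None:
        # Constraint from the option description: 0 or >= interval.
        if timedelta(0) < interval_during_backlog < interval:
            raise ValueError("interval-during-backlog must be 0 or >= interval")
        if any_source_backlogged:
            # 0 disables checkpointing while a source processes backlog.
            if interval_during_backlog == timedelta(0):
                return None
            return interval_during_backlog
    return interval
```

For example, with a 30 s interval and a 5 min backlog interval, checkpoints would be scheduled every 5 minutes while any source reports backlog and every 30 seconds otherwise.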
execution.checkpointing.local-backup.dirs |
(none) |
String |
The root directories for storing file-based state for local recovery. Local recovery currently covers only keyed state backends. If not configured, it defaults to <WORKING_DIR>/localState. The <WORKING_DIR> can be configured via process.taskmanager.working-dir. |
execution.checkpointing.local-backup.enabled |
false |
Boolean |
Configures local backup for the state backend, i.e., whether to make a backup copy of checkpoints on local disk. If not configured, it falls back to execution.state-recovery.from-local. By default, local backup is deactivated. Local backup currently covers only keyed state backends (including both the EmbeddedRocksDBStateBackend and the HashMapStateBackend). |
execution.checkpointing.max-concurrent-checkpoints |
1 |
Integer |
The maximum number of checkpoint attempts that may be in progress at the same time. If this value is n, no checkpoint is triggered while n checkpoint attempts are in flight; one of them must finish or expire before the next checkpoint can be triggered. |
execution.checkpointing.min-pause |
0 ms |
Duration |
The minimal pause between checkpointing attempts. This setting defines how soon the checkpoint coordinator may trigger another checkpoint after it becomes possible to trigger another checkpoint with respect to the maximum number of concurrent checkpoints (see execution.checkpointing.max-concurrent-checkpoints).
If the maximum number of concurrent checkpoints is set to one, this setting effectively ensures that a minimum amount of time passes during which no checkpoint is in progress at all. |
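How max-concurrent-checkpoints and min-pause combine can be sketched as a single gate. This hypothetical Python model uses illustrative names and integer milliseconds. It assumes the pause is measured from the moment a concurrency slot became free, which, with a maximum of one, is when the previous checkpoint finished or expired.

```python
def may_trigger(now_ms: int, in_flight: int, max_concurrent: int,
                slot_freed_at_ms: int, min_pause_ms: int) -> bool:
    """Hypothetical gate combining max-concurrent-checkpoints and min-pause.

    slot_freed_at_ms: when the number of in-flight attempts last dropped
    below max_concurrent (with max_concurrent == 1, when the previous
    checkpoint finished or expired).
    """
    if in_flight >= max_concurrent:
        # Wait until an in-flight attempt finishes or expires.
        return False
    # Then wait at least min_pause_ms before triggering.
    return now_ms - slot_freed_at_ms >= min_pause_ms
```

With a maximum of one and a 1 s pause, a checkpoint that completes at t = 1000 ms blocks the next trigger until t = 2000 ms, regardless of the configured interval.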
execution.checkpointing.mode |
EXACTLY_ONCE |
Enum |
The checkpointing mode (exactly-once vs. at-least-once).
Possible values:
- "EXACTLY_ONCE"
- "AT_LEAST_ONCE"
|
execution.checkpointing.num-retained |
1 |
Integer |
The maximum number of completed checkpoints to retain. |
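Retaining the newest N completed checkpoints behaves like a bounded window. The hypothetical Python sketch below shows which checkpoint IDs would remain after a sequence of completions. `retained_checkpoints` is an illustrative name, and the sketch assumes `num_retained >= 1`.

```python
from collections import deque
from typing import Iterable, List


def retained_checkpoints(completed_ids: Iterable[int], num_retained: int = 1) -> List[int]:
    """Hypothetical: IDs still retained after checkpoints complete in order."""
    # Assumes num_retained >= 1; a bounded deque drops the oldest entry
    # once it is full, mimicking "keep the newest N".
    window = deque(maxlen=num_retained)
    for checkpoint_id in completed_ids:
        window.append(checkpoint_id)
    return list(window)


print(retained_checkpoints([1, 2, 3, 4, 5], num_retained=3))  # [3, 4, 5]
```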
execution.checkpointing.savepoint-dir |
(none) |
String |
The default directory for savepoints. Used by the state backends that write savepoints to file systems (HashMapStateBackend, EmbeddedRocksDBStateBackend). |
execution.checkpointing.storage |
(none) |
String |
The checkpoint storage implementation used to checkpoint state. The implementation can be specified either via its shortcut name or via the class name of a CheckpointStorageFactory. If a factory is specified, it is instantiated via its zero-argument constructor and its CheckpointStorageFactory#createFromConfig(ReadableConfig, ClassLoader) method is called. Recognized shortcut names are 'jobmanager' and 'filesystem'. 'execution.checkpointing.storage' and 'execution.checkpointing.dir' are usually combined to configure the checkpoint location. By default, the checkpoint metadata and actual program state are stored directly in the JobManager's memory. When 'execution.checkpointing.storage' is set to 'jobmanager', the checkpoint metadata is persisted to the path specified by 'execution.checkpointing.dir' if that option is configured; otherwise, the metadata is stored in the JobManager's memory. When 'execution.checkpointing.storage' is set to 'filesystem', a valid path must be configured for 'execution.checkpointing.dir', and both the checkpoint metadata and the actual program state are persisted to that path. |
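The interplay of the two shortcut names with 'execution.checkpointing.dir' can be tabulated in code. This hypothetical Python sketch covers only the explicit 'jobmanager' and 'filesystem' cases described above; `checkpoint_locations` and the `JM_MEMORY` marker are illustrative, and factory class names and the unset default are deliberately not modeled.

```python
from typing import Optional, Tuple

JM_MEMORY = "<JobManager memory>"  # placeholder marker, not a real path


def checkpoint_locations(storage: str,
                         checkpoint_dir: Optional[str]) -> Tuple[str, str]:
    """Hypothetical: (metadata location, program-state location)."""
    if storage == "jobmanager":
        # State stays in JobManager memory; metadata goes to the
        # directory only if one is configured.
        return (checkpoint_dir or JM_MEMORY, JM_MEMORY)
    if storage == "filesystem":
        if not checkpoint_dir:
            raise ValueError("'filesystem' requires execution.checkpointing.dir")
        return (checkpoint_dir, checkpoint_dir)
    raise ValueError(f"not modeled in this sketch: {storage!r}")
```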
execution.checkpointing.timeout |
10 min |
Duration |
The maximum time that a checkpoint may take before being discarded. |
execution.checkpointing.tolerable-failed-checkpoints |
0 |
Integer |
The number of consecutive checkpoint failures that are tolerated. If set to 0, no checkpoint failure is tolerated. This only applies to the following failure reasons: IOException on the JobManager, failures in the async phase on the TaskManagers, and checkpoint expiration due to a timeout. Failures originating from the sync phase on the TaskManagers always force a failover of the affected task. Other types of checkpoint failures (such as a checkpoint being subsumed) are ignored. |
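The counting rule can be modeled as a small counter. In this hypothetical Python sketch the class name and the reason labels in `COUNTED` are illustrative, not Flink identifiers. It assumes that a successful checkpoint resets the count, as "consecutive" implies. Sync-phase failures are left out because they force a task failover independently of this option.

```python
class ConsecutiveFailureCounter:
    """Hypothetical model of execution.checkpointing.tolerable-failed-checkpoints."""

    # Illustrative labels for the failure reasons the option counts.
    COUNTED = {"JM_IO_EXCEPTION", "TM_ASYNC_PHASE", "TIMEOUT"}

    def __init__(self, tolerable: int = 0) -> None:
        self.tolerable = tolerable
        self.consecutive = 0

    def on_success(self) -> None:
        # A completed checkpoint breaks the run of consecutive failures.
        self.consecutive = 0

    def on_failure(self, reason: str) -> bool:
        """Return True once the tolerance is exceeded."""
        if reason not in self.COUNTED:
            # E.g. a subsumed checkpoint: ignored by this option.
            return False
        self.consecutive += 1
        return self.consecutive > self.tolerable
```

With the default of 0, the first counted failure already exceeds the tolerance; with 2, the third consecutive counted failure does.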
execution.checkpointing.unaligned.enabled |
false |
Boolean |
Enables unaligned checkpoints, which greatly reduce checkpointing times under backpressure.
Unaligned checkpoints contain data stored in buffers as part of the checkpoint state, which allows checkpoint barriers to overtake these buffers. Thus, the checkpoint duration becomes independent of the current throughput as checkpoint barriers are effectively not embedded into the stream of data anymore.
Unaligned checkpoints can only be enabled if execution.checkpointing.mode is EXACTLY_ONCE and execution.checkpointing.max-concurrent-checkpoints is 1. |
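The two preconditions can be checked before a job is submitted. This hypothetical Python sketch simply rejects a violating combination. `check_unaligned_config` is an illustrative name, and whether Flink itself rejects or silently ignores such a configuration is not stated here.

```python
def check_unaligned_config(unaligned_enabled: bool, mode: str,
                           max_concurrent: int) -> None:
    """Hypothetical pre-flight check for the documented preconditions."""
    if not unaligned_enabled:
        # Nothing to check when unaligned checkpoints are off.
        return
    if mode != "EXACTLY_ONCE":
        raise ValueError("unaligned checkpoints require mode EXACTLY_ONCE")
    if max_concurrent != 1:
        raise ValueError("unaligned checkpoints require max-concurrent-checkpoints = 1")
```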
execution.checkpointing.unaligned.forced |
false |
Boolean |
Forces unaligned checkpoints, particularly allowing them for iterative jobs. |
execution.checkpointing.unaligned.interruptible-timers.enabled |
false |
Boolean |
Allows unaligned checkpoints to skip timers that are currently being fired. For this feature to be enabled, it must also be supported by the operator. Currently, this is supported by all TableStreamOperators and CepOperator. |
execution.checkpointing.unaligned.max-subtasks-per-channel-state-file |
5 |
Integer |
Defines the maximum number of subtasks that share the same channel state file. This can reduce the number of small files when unaligned checkpoints are enabled. When set to 1, each subtask creates its own channel state file. |
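Because at most N subtasks may share a file, the option puts a lower bound on the number of channel state files per checkpoint. The hypothetical Python sketch below computes that bound. `min_channel_state_files` is an illustrative name, and the real count may be higher, for example if not every subtask is eligible to share a file with every other.

```python
import math


def min_channel_state_files(num_subtasks: int, max_per_file: int = 5) -> int:
    """Hypothetical lower bound on channel state files for one checkpoint."""
    # At most `max_per_file` subtasks share a file, so at least
    # ceil(num_subtasks / max_per_file) files are needed.
    return math.ceil(num_subtasks / max_per_file)
```

For 12 subtasks, the default of 5 allows as few as 3 files, while a value of 1 yields 12.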
execution.checkpointing.write-buffer-size |
4096 |
Integer |
The default size (in bytes) of the write buffer for the checkpoint streams that write to file systems. The actual write buffer size is the maximum of this option's value and the value of 'execution.checkpointing.data-inline-threshold'. |
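The resulting size is a simple maximum. This hypothetical Python sketch uses the two defaults from this table, 4096 bytes and 20 kb (20 × 1024 bytes). `effective_write_buffer_size` is an illustrative name, not a Flink API.

```python
def effective_write_buffer_size(write_buffer_size: int = 4096,
                                inline_threshold_bytes: int = 20 * 1024) -> int:
    """Hypothetical: actual write buffer size in bytes."""
    # The buffer is never smaller than the inline threshold.
    return max(write_buffer_size, inline_threshold_bytes)


print(effective_write_buffer_size())  # 20480
```

With both defaults in place, the inline threshold wins, so the effective buffer is 20480 bytes rather than 4096.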