To Nha Notes | Sept. 18, 2022, 1:27 p.m.

My data observability definition has not changed since I first coined it in 2019.
Data observability is an organization’s ability to fully understand the health of the data in its systems. It eliminates data downtime by applying best practices learned from DevOps to data pipeline observability.
Data observability tools use automated monitoring, alerting, and triaging to identify and evaluate data quality and discoverability issues. This leads to healthier pipelines, more productive teams, and happier customers.
The five pillars of data observability are freshness, distribution, volume, schema, and lineage.
With monitors as code, data engineers can configure monitors via a YAML config file and apply them easily as part of the build process or within their CI/CD pipeline, which makes this capability a must-have for a data observability tool.
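To make the monitors-as-code idea concrete, here is a minimal sketch of what such a YAML definition might look like. The schema and field names below are illustrative assumptions, not the actual syntax of any particular vendor's tool:

```yaml
# Hypothetical monitors-as-code definition, applied via CI/CD.
# Field names (namespace, monitors, freshness_threshold_hours) are illustrative.
namespace: marketing_analytics
monitors:
  - type: freshness
    table: analytics.orders
    freshness_threshold_hours: 24   # alert if no update within a day
  - type: volume
    table: analytics.orders
    baseline: auto                  # learn expected row counts from history
  - type: schema_change
    table: analytics.orders
    notify: data-eng-oncall
```

Because the definition lives in version control alongside the pipeline code, monitor changes go through the same review and deployment process as everything else.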
To support the growing demand for data democratization and decentralized data ownership while meeting your company's strict compliance needs, your data observability platform should:
Data observability automatically monitors key features of your data ecosystem, including data freshness, distribution, volume, schema, and lineage. Without the need for manual threshold setting, data observability answers questions such as:
● When was my table last updated?
● Is my data within an accepted range?
● Is my data complete? Did 2,000 rows suddenly turn into 50?
● Who has access to our marketing tables, and who made changes to them?
● Where did my data break? Which tables or reports were affected?
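The volume question above ("Did 2,000 rows suddenly turn into 50?") can be answered without a hand-set threshold by learning a baseline from the table's own history. The sketch below is one simple, assumed approach (a standard-deviation test over recent daily row counts), not any vendor's actual detection algorithm:

```python
from statistics import mean, stdev

def volume_anomaly(history, current, sigmas=3.0):
    """Flag a volume anomaly when the current row count deviates from
    the historical mean by more than `sigmas` standard deviations.
    The baseline is learned from history, so no manual threshold is set."""
    mu = mean(history)
    sd = stdev(history)
    if sd == 0:
        # History is perfectly flat: any change at all is anomalous.
        return current != mu
    return abs(current - mu) > sigmas * sd

# A table that normally lands ~2,000 rows per day suddenly receives 50:
history = [1980, 2010, 2005, 1995, 2002, 1998, 2011]
print(volume_anomaly(history, 50))    # True  (anomalous drop)
print(volume_anomaly(history, 2003))  # False (within normal range)
```

In practice, a production tool would use a richer model (seasonality, trend, day-of-week effects), but the principle is the same: the expected range comes from the data itself.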
https://www.montecarlodata.com/blog-what-is-data-observability/