Old Style API

Root interface for dfschema package.

Mostly kept for consistency with earlier interfaces.

Note: We generally recommend using dfschema.DfSchema instances directly instead.

validate_df(df, schema, summary=True)

Validate a dataframe against the schema.

Validates the dataframe against a schema given as a dictionary. Raises either DataFrameSummaryError (if summary=True) or a DataFrameValidationError for the specific problem (if summary=False).

Example
import json
import pandas as pd
import dfschema
from pathlib import Path

path = '/schema.json'
schema = json.loads(Path(path).read_text())

df = pd.DataFrame({'a':[1,2], 'b':[3,4]})

dfschema.validate_df(df, schema, summary=True)
Alternative

Equivalent to using the dfschema.DfSchema class, which is the recommended approach.
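
A minimal sketch of the class-based form, mirroring the two calls that validate_df itself makes (DfSchema.from_dict, then DfSchema.validate_df). The wrapper name validate_with_class is hypothetical, and df and schema are assumed to be built as in the example above.

```python
def validate_with_class(df, schema: dict, summary: bool = True) -> None:
    """Hypothetical helper: validate `df` through a DfSchema instance."""
    # Imported inside the function so the sketch can be defined
    # even where dfschema is not installed.
    from dfschema import DfSchema

    # Build the schema object from the plain dictionary.
    df_schema = DfSchema.from_dict(schema)

    # Raises a validation exception if the dataframe violates the schema.
    df_schema.validate_df(df=df, summary=summary)
```

Calling validate_with_class(df, schema, summary=True) should behave like dfschema.validate_df(df, schema, summary=True). Holding on to the DfSchema instance also lets the same schema be reused across several dataframes.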

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| df | pd.DataFrame | A dataframe to validate | required |
| schema | dict | Schema as a dictionary to validate against | required |
| summary | bool | If False, raise an exception on the first violation (faster); otherwise collect all violations and raise a summary exception (slower) | True |

Source code in dfschema/validate.py
def validate_df(df: pd.DataFrame, schema: dict, summary: bool = True) -> None:
    """Validate a dataframe against the schema.

    Validates the dataframe against a schema given as a dictionary. Raises
    either DataFrameSummaryError (if summary=True) or a DataFrameValidationError
    for the specific problem (if summary=False).

    ### Example
    ```python
    import json
    import pandas as pd
    import dfschema
    from pathlib import Path

    path = '/schema.json'
    schema = json.loads(Path(path).read_text())

    df = pd.DataFrame({'a':[1,2], 'b':[3,4]})

    dfschema.validate_df(df, schema, summary=True)
    ```

    ### Alternative
    Equivalent to using `dfschema.DfSchema` class (which is recommended):

    Args:
        df (pd.DataFrame): A dataframe to validate
        schema (dict): schema as a dictionary to validate against
        summary (bool): if `False`, raise exception on first violation (faster), otherwise will collect all violations and raise summary exception (slower)

    """

    # Build a DfSchema from the dictionary, then delegate validation to it.
    df_schema = DfSchema.from_dict(schema)
    df_schema.validate_df(df=df, summary=summary)