Adding new __repr__ for pyspark StructField such that the error logs explicitly show metadata differences #77

Open
henrytomsf wants to merge 1 commit into MrPowers:main from henrytomsf:feature/explicit-metadata-in-print

henrytomsf commented Oct 11, 2023

Description

Helps address #76. Adds a new class, StructFieldPrettyPrint, that gives a better representation of the StructField type by showing its name, data type, nullability, and metadata. Currently, PySpark's StructField.__repr__ (docs) only returns:

return "StructField(%s,%s,%s)" % (self.name, self.dataType,
                                          str(self.nullable).lower())

This is not ideal when users want to compare all attributes, including metadata, because the metadata never appears in the error message.
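To make the problem concrete, here is a minimal sketch that reproduces the quoted three-field format string in a hypothetical helper, old_structfield_repr, and applies it to two fields that differ only in their metadata. SimpleNamespace objects stand in for real StructFields so the snippet runs without a Spark installation; they are an assumption of this sketch, not PySpark types.

```python
from types import SimpleNamespace


def old_structfield_repr(field):
    # Hypothetical helper mirroring the older PySpark format quoted above:
    # name, dataType, and nullability are printed, metadata is not.
    return "StructField(%s,%s,%s)" % (
        field.name,
        field.dataType,
        str(field.nullable).lower(),
    )


# Stand-ins (not real StructFields) that differ only in their metadata.
without_meta = SimpleNamespace(
    name="test_name", dataType="StringType()", nullable=True, metadata={}
)
with_meta = SimpleNamespace(
    name="test_name",
    dataType="StringType()",
    nullable=True,
    metadata={"description": "test description"},
)

print(without_meta == with_meta)  # False: the fields are not equal
print(old_structfield_repr(without_meta) == old_structfield_repr(with_meta))  # True
```

The two fields compare as unequal, yet both render as `StructField(test_name,StringType(),true)`, so an error message built from this format gives no hint about what differs.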

StructFieldPrettyPrint wraps a StructField (stored as self.structfield) and provides a more explicit __repr__ that is used in place of PySpark's:

return "StructField(%s, %s, %s, %s)" % (
            f"'{self.structfield.name}'",
            self.structfield.dataType,
            str(self.structfield.nullable).lower(),
            str(self.structfield.metadata)
        ) 
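The snippet above shows only the return statement. A minimal sketch of how the surrounding wrapper class might look is below. It assumes the class takes the field in its constructor and stores it as self.structfield, as the snippet implies; the constructor signature is an assumption, and the actual class in the PR may differ. A SimpleNamespace stands in for pyspark.sql.types.StructField so the example runs without Spark.

```python
from types import SimpleNamespace


class StructFieldPrettyPrint:
    """Wraps a StructField-like object and renders all four of its attributes."""

    def __init__(self, structfield):
        # Assumed constructor: keep a reference to the wrapped field.
        self.structfield = structfield

    def __repr__(self):
        # Same format as the PR: quoted name, lowercased nullability,
        # and the metadata dict, so metadata differences are visible.
        return "StructField(%s, %s, %s, %s)" % (
            f"'{self.structfield.name}'",
            self.structfield.dataType,
            str(self.structfield.nullable).lower(),
            str(self.structfield.metadata),
        )


# Stand-in for a real StructField so the sketch runs without a Spark install.
field = SimpleNamespace(
    name="test_name",
    dataType="StringType()",
    nullable=True,
    metadata={"description": "test description"},
)
print(repr(StructFieldPrettyPrint(field)))
# StructField('test_name', StringType(), true, {'description': 'test description'})
```

Because the wrapper only reads name, dataType, nullable, and metadata, a real StructField can be passed in the same way.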

Type of change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How has this been tested?

  • Passes the existing test suite (pytest tests/)

…clearly show the name, type, nullability, and metadata.

henrytomsf commented Oct 11, 2023

I'm not sure how you want to handle the README image assets that would need to change for the documentation. I assume there's some styling we should adhere to, so I left that update out.

With this change, a failing schema comparison now looks like this:

E           chispa.schema_comparer.SchemasNotEqualError: 
E           +-------------------------------------------+----------------------------------------------------------------------------+
E           |                  schema1                  |                                  schema2                                   |
E           +-------------------------------------------+----------------------------------------------------------------------------+
E           |    StructField('test_age', LongType(), True, {})    |                    StructField('test_age', LongType(), True, {})                     |
E           | StructField('test_name', StringType(), true, {}) | StructField('test_name', StringType(), true, {'description': 'test description'}) |
E           +-------------------------------------------+----------------------------------------------------------------------------+
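For illustration, here is a hypothetical sketch (not chispa's actual schema_comparer code) of how a comparer could pair up the fields of two schemas by position, render each side with the wrapper, and flag rows whose fields are not equal. The helper name schema_diff_rows and the SimpleNamespace stand-ins are assumptions; the wrapper is repeated so the block runs on its own.

```python
from types import SimpleNamespace


class StructFieldPrettyPrint:
    # Same idea as the PR's wrapper: print all four StructField attributes.
    def __init__(self, structfield):
        self.structfield = structfield

    def __repr__(self):
        return "StructField('%s', %s, %s, %s)" % (
            self.structfield.name,
            self.structfield.dataType,
            str(self.structfield.nullable).lower(),
            self.structfield.metadata,
        )


def schema_diff_rows(fields1, fields2):
    """Hypothetical helper: pair fields by position and flag unequal pairs."""
    rows = []
    # zip() stops at the shorter schema; a real comparer would also report extra fields.
    for f1, f2 in zip(fields1, fields2):
        marker = "  " if f1 == f2 else "!="
        left = repr(StructFieldPrettyPrint(f1))
        right = repr(StructFieldPrettyPrint(f2))
        rows.append(f"{marker} {left} | {right}")
    return rows


age = SimpleNamespace(name="test_age", dataType="LongType()", nullable=True, metadata={})
schema1 = [
    age,
    SimpleNamespace(name="test_name", dataType="StringType()", nullable=True, metadata={}),
]
schema2 = [
    age,
    SimpleNamespace(
        name="test_name",
        dataType="StringType()",
        nullable=True,
        metadata={"description": "test description"},
    ),
]

for row in schema_diff_rows(schema1, schema2):
    print(row)
```

The flagged row is only informative because both sides are rendered with all four attributes; with the older three-field format, that row would show identical text on both sides.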
