diff --git a/README.md b/README.md
index 0470fa7..6c78488 100644
--- a/README.md
+++ b/README.md
@@ -25,7 +25,7 @@ It can be used to:
## Quick Links
-- Please see [our latest talk from the Sillicon Valley ACM meetup](https://www.youtube.com/watch?v=Tnafo6JVoJs)
+- Please see [our latest talk from the Silicon Valley ACM meetup](https://www.youtube.com/watch?v=Tnafo6JVoJs)
- Join the [Discord Server](https://discord.gg/uVVsEAcfyF)
@@ -108,11 +108,11 @@ watcher.distances(model_1, model_2)
## PEFT / LORA models (experimental)
To analyze a PEFT / LORA fine-tuned model, specify the peft option.
- - peft = True: Forms the BA low rank matric and analyzes the delta layers, with 'lora_BA" tag in name
+ - peft = True: Forms the BA low rank matrix and analyzes the delta layers, with 'lora_BA' tag in name
```details = watcher.analyze(peft='peft_only')```
- - peft = 'with_base': Analyes the base_model, the delta, and the combined layer weight matrices.
+ - peft = 'with_base': Analyzes the base_model, the delta, and the combined layer weight matrices.
```details = watcher.analyze(peft=True)```
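+
+For example, a minimal sketch of analyzing a LORA fine-tune (the model ids below are placeholders, and loading through the HuggingFace `peft` package is just one way to obtain such a model):
+
+```python
+import weightwatcher as ww
+from transformers import AutoModelForCausalLM
+from peft import PeftModel
+
+# hypothetical ids; substitute your own base model and LORA adapter
+base_model = AutoModelForCausalLM.from_pretrained("base-model-id")
+peft_model = PeftModel.from_pretrained(base_model, "lora-adapter-id")
+
+# analyze the low rank BA delta layers (tagged 'lora_BA' in the details)
+watcher = ww.WeightWatcher(model=peft_model)
+details = watcher.analyze(peft=True)
+```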
@@ -150,7 +150,7 @@ The goal of the WeightWatcher project is find generalization metrics that most a
-[Our HTSR theory](https://jmlr.org/papers/volume22/20-410/20-410.pdf) says that well trained, well correlated layers should be signficantly different from the MP (Marchenko-Pastur) random bulk, and specifically to be heavy tailed. There are different layer metrics in WeightWatcher for this, including:
+[Our HTSR theory](https://jmlr.org/papers/volume22/20-410/20-410.pdf) says that well trained, well correlated layers should be significantly different from the MP (Marchenko-Pastur) random bulk and, specifically, should be heavy tailed. There are different layer metrics in WeightWatcher for this, including:
- `rand_distance` : the distance in distribution from the randomized layer
- `alpha` : the slope of the tail of the ESD, on a log-log scale
@@ -191,7 +191,7 @@ All of these attempt to measure how on-random and/or non-heavy-tailed the layer
#### Direct Correlation Metrics
-The random distance metric is a new, non-parameteric approach that appears to work well in early testing.
+The random distance metric is a new, non-parametric approach that appears to work well in early testing.
[See this recent blog post](https://calculatedcontent.com/2021/10/17/fantastic-measures-of-generalization-that-actually-work-part-1/)
- `rand_distance` :
Distance of layer ESD from the ideal RMT MP ESD
@@ -225,9 +225,9 @@ summary = watcher.get_summary()
The summary statistics can be used to gauge the test error of a series of pre/trained models, without needing access to training or test data.
-- average `alpha` can be used to compare one or more DNN models with different hyperparemeter settings **θ**, when depth is not a driving factor (i.e transformer models)
+- average `alpha` can be used to compare one or more DNN models with different hyperparameter settings **θ**, when depth is not a driving factor (e.g., transformer models)
- average `log_spectral_norm` is useful to compare models of different depths **L** at a coarse-grained level
-- average `alpha_weighted` and `log_alpha_norm` are suitable for DNNs of differing hyperparemeters **θ** and depths **L** simultaneously. (i.e CV models like VGG and ResNet)
+- average `alpha_weighted` and `log_alpha_norm` are suitable for DNNs of differing hyperparameters **θ** and depths **L** simultaneously (e.g., CV models like VGG and ResNet)
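+
+For example, a minimal sketch comparing two trained variants of the same architecture (`model_a` and `model_b` are placeholders, and the summary keys are assumed to match the metric names above):
+
+```python
+import weightwatcher as ww
+
+for name, model in [("model_a", model_a), ("model_b", model_b)]:
+    watcher = ww.WeightWatcher(model=model)
+    watcher.analyze()
+    summary = watcher.get_summary()
+    # per HTSR theory, the variant with the smaller average alpha is expected to generalize better
+    print(name, summary["alpha"], summary["log_spectral_norm"])
+```
+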
#### Predicting the Generalization Error
@@ -268,9 +268,9 @@ details = watcher.analyze(randomize=True, plot=True)
Fig (a) is well trained; Fig (b) may be over-fit.
-That orange spike on the far right is the tell-tale clue; it's caled a **Correlation Trap**.
+That orange spike on the far right is the tell-tale clue; it's called a **Correlation Trap**.
-A **Correlation Trap** is characterized by Fig (b); here the actual (green) and random (red) ESDs look almost identical, except for a small shelf of correlation (just right of 0). And random (red) ESD, the largest eigenvalue (orange) is far to the right of and seperated from the bulk of the ESD.
+A **Correlation Trap** is characterized by Fig (b); here the actual (green) and random (red) ESDs look almost identical, except for a small shelf of correlation (just right of 0). And in the random (red) ESD, the largest eigenvalue (orange) lies far to the right of, and is separated from, the bulk of the ESD.
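+
+A quick way to scan a model for this (a sketch only; `model` is a placeholder, and the `details` column names are assumed to match the metrics named in this README):
+
+```python
+import weightwatcher as ww
+
+watcher = ww.WeightWatcher(model=model)
+details = watcher.analyze(randomize=True, plot=True)
+
+# layers whose randomized ESD shows separated spikes are candidate Correlation Traps
+suspects = details[details["num_rand_spikes"] > 0]
+print(suspects[["layer_id", "num_rand_spikes", "rand_distance"]])
+```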

@@ -281,7 +281,7 @@ Moreover, the metric `num_rand_spikes` (in the `details` dataframe) contains the
The `SVDSharpness` transform can be used to remove Correlation Traps during training (after each epoch) or after training using
```python
-sharpemed_model = watcher.SVDSharpness(model=...)
+sharpened_model = watcher.SVDSharpness(model=...)
```
Sharpening a model is similar to clipping the layer weight matrices, but uses Random Matrix Theory to do this in a more principled way than simple clipping.
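+
+For example, a sketch of applying it once after training and re-checking the result (`model` is a placeholder):
+
+```python
+import weightwatcher as ww
+
+watcher = ww.WeightWatcher(model=model)
+sharpened_model = watcher.SVDSharpness(model=model)
+
+# re-analyze the sharpened model to confirm the Correlation Traps are gone
+watcher = ww.WeightWatcher(model=sharpened_model)
+details = watcher.analyze(randomize=True)
+```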
@@ -294,7 +294,7 @@ Sharpening a model is similar to clipping the layer weight matrices, but uses Ra
Note: This is experimental but we have seen some success here
-The WeightWatcher `alpha` metric may be used to detect when to apply early stopping. When the average `alpha` (summary statistic) drops below `2.0`, this indicates that the model may be over-trained and early stopping is necesary.
+The WeightWatcher `alpha` metric may be used to detect when to apply early stopping. When the average `alpha` (summary statistic) drops below `2.0`, this indicates that the model may be over-trained and early stopping is necessary.
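+
+A minimal sketch of such a check (the training loop, `train_one_epoch`, and `num_epochs` are placeholders from your own code; only the `analyze()` / `get_summary()` calls and the `2.0` threshold come from this README):
+
+```python
+import weightwatcher as ww
+
+for epoch in range(num_epochs):
+    model = train_one_epoch(model)
+
+    watcher = ww.WeightWatcher(model=model)
+    watcher.analyze()
+    avg_alpha = watcher.get_summary()["alpha"]   # assuming the summary key matches the metric name
+
+    if avg_alpha < 2.0:
+        print(f"epoch {epoch}: average alpha {avg_alpha:.2f} dropped below 2.0; stopping early")
+        break
+```
+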
Below is an example of this, showing training loss and test loss curves for a small Transformer model, trained from scratch, along with the average `alpha` summary statistic.
@@ -356,7 +356,7 @@ Setting max is useful for a quick debugging.
details = watcher.analyze(min_evals=50, max_evals=500)
```
-#### specify the Power Law fitting proceedure
+#### specify the Power Law fitting procedure
To replicate results using TPL or E_TPL fits, use:
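+
+A sketch of what this could look like, assuming the fitting procedure is selected with a `fit` keyword to `analyze()` (that keyword name is an assumption, not something this section confirms):
+
+```python
+# assumed keyword: fit selects the Power Law fitting procedure
+details = watcher.analyze(fit='E_TPL')   # or fit='TPL'
+```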
@@ -394,7 +394,7 @@ ww.layer#.esd4.png
**Note:** additional plots will be saved when the `randomize` option is used
-#### fit ESDs to a Marchenko-Pastur (MP) distrbution
+#### fit ESDs to a Marchenko-Pastur (MP) distribution
The `mp_fit` option tells WW to fit each layer ESD, treated as a Random Matrix, to a Marchenko-Pastur (MP) distribution, as described in our papers on HT-SR.
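+
+For example, a one-line sketch passing the `mp_fit` option to `analyze()` like the other options:
+
+```python
+details = watcher.analyze(mp_fit=True)
+```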
@@ -435,15 +435,15 @@ The new distances method reports the distances between two models, such as the n
details = watcher.distances(initial_model, trained_model)
```
-### Compatability
+### Compatibility
---
-#### compatability with version 0.2.x
+#### compatibility with version 0.2.x
The new 0.4.x version of WeightWatcher treats each layer as a single, unified set of eigenvalues.
In contrast, the 0.2.x versions split the Conv2D layers into n slices, one for each receptive field.
-The `pool=False` option provides results which are back-compatable with the 0.2.x version of WeightWatcher,
+The `pool=False` option provides results which are back-compatible with the 0.2.x version of WeightWatcher,
(which used to be called `ww2x=True`), with details provided for each slice of each layer.
Otherwise, the eigenvalues from each slice of the Conv2D layer are pooled into one ESD.
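+
+For example, a short sketch of the back-compatible call:
+
+```python
+# one row per Conv2D slice, as in the 0.2.x releases
+details = watcher.analyze(pool=False)
+```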
@@ -476,7 +476,7 @@ Note: the current version requires both tensorflow and torch; if there is deman
-On using WeighWtatcher for the first time. I recommend selecting at least one trained model, and running `weightwatcher` with all analyze options enabled, including the plots. From this, look for:
+When using WeightWatcher for the first time, I recommend selecting at least one trained model and running `weightwatcher` with all analyze options enabled, including the plots. From this, look for:
- if the layer ESDs are well formed and heavy tailed
@@ -503,7 +503,7 @@ Publishing to the PyPI repository:
```sh
# 1. Check in the latest code with the correct revision number (__version__ in __init__.py)
-vi weightwatcher/__init__.py # Increse release number, remove -dev to revision number
+vi weightwatcher/__init__.py # Increase release number, remove -dev from the revision number
git commit
# 2. Check out latest version from the repo in a fresh directory
cd ~/temp/
@@ -600,7 +600,7 @@ and has been presented at Stanford, UC Berkeley, KDD, etc:
WeightWatcher has also been featured at local meetups and many popular podcasts
-#### Popular Popdcasts and Blogs
+#### Popular Podcasts and Blogs
- [This Week in ML](https://twimlai.com/meetups/implicit-self-regularization-in-deep-neural-networks/)
@@ -614,7 +614,7 @@ WeightWatcher has also been featured at local meetups and many popular podcasts
- [LightOn AI Meetup](https://www.youtube.com/watch?v=tciq7t3rj98)
-- [The Sillicon Valley ACM meetup](https://www.youtube.com/watch?v=Tnafo6JVoJs)
+- [The Silicon Valley ACM meetup](https://www.youtube.com/watch?v=Tnafo6JVoJs)
- [Applied AI Community](https://www.youtube.com/watch?v=xLZOf2IDLkc&feature=youtu.be)
diff --git a/weightwatcher/weightwatcher.py b/weightwatcher/weightwatcher.py
index 34aa497..c514113 100644
--- a/weightwatcher/weightwatcher.py
+++ b/weightwatcher/weightwatcher.py
@@ -2884,7 +2884,7 @@ def apply_FFT(self, ww_layer, params=None):
layer_id = ww_layer.layer_id
name = ww_layer.name
- if not ww_layer.skippe:
+ if not ww_layer.skipped:
logger.info("applying 2D FFT on to {} {} ".format(layer_id, name))
Wmats = ww_layer.Wmats