Hey, all! I'm dropping some notes and possible edits here as I read through the book per Clay's 2022-01-04 email. I'm very much enjoying reading through everybody's material—and I'm learning a bunch as well!
I'm working chapter by chapter, so I'll update this issue as I make additional headway. Also, immediately below are some notes on things we may want to standardize:
Standardization questions:
- Commas following sentence-heading adverbs and adverbial phrases:
- E.g., “Sometimes, John says…” vs. “Sometimes John says…”; "To complete the process do x..." vs. "To complete the process, do x..."
- I tend to vote for using a comma in these cases (admittedly, my tendencies are for insistent disambiguation)
- Style for code comments:
- Sentence case?
- Trailing punctuation?
- For offsetting material within a sentence: Do we want to use em dashes sans spaces or hyphens with spaces around them? I think we currently have a slight mix (e.g., “The car—a Prius—is red” vs. “The car - a Prius - is red”)
- I’d personally vote for the former in order to prevent potential ambiguities arising from the hyphen also being used in code as a minus sign/etc.
- “user base” vs. “userbase”
- "indexes" vs. "indices"
- "CSV" vs. "csv"
- Spaces after commas in code? We currently vary a bit
- E.g., `some_array = [1,2,3]` vs. `some_array = [1, 2, 3]`
- Bolding of package names
- Bold in all instances or just in some contexts?
- Period after "etc"/"etc."?
- Parentheses after function names when referenced in text? We currently vary a bit
- E.g., `pandas.DataFrame()` vs. `pandas.DataFrame`
- When indicating ranges, do we want to all use en dashes, hyphens, or hyphens with spaces?
- “From 2010–2013” / “From 2010-2013” / “From 2010 - 2013”
- Spaces in argument declarations?
- E.g., `arg = 1` vs. `arg=1`
- Periods for Latin abbreviations?
- “e.g.” vs. “eg”; “i.e.” vs. “ie”
- "dataset" vs. "data set"
- Code formatting for in-text uses of `TRUE`/`FALSE` (R) and `True`/`False` (Python)?
- Capitalization scheme for sub-subheadings?
- I.e., chapters are title case, and immediate subheadings are also title case; do we want sub-subheadings (x.y.z) to also all be title case, or should those be sentence case?
- When code chunks produce non-troubling warnings (e.g., when ggplot says "Removed N rows containing missing data..."), do we want to set `warning = F` in the code chunk to suppress those in the rendered book?
- I'll update these bullets as I encounter more standardization questions
Chapter-by-chapter notes and typographic suggestions:
Format:
- [chapter #].[subsection #]: "original/current language" —> "possible language"
- Description of/motivation for possible edit
Chapter 1: (Notes last updated: 2022-01-12)
- 1.2: “…preferred in most R style guides to distinguish it between assignment and setting the value…” —> “…preferred in most R style guides to distinguish between assignment and setting the value…”
- I’d argue that since "distinguish" isn't followed by "from" here (e.g., “…to distinguish x from y…"), we should drop “it”
- 1.2: “Python uses = for assignment while R can use…” —> “Python uses = for assignment, while R can use…”
- I think the “whereas” interpretation of “while” (established by the addition of a comma) might be preferable here
- 1.4 (code): “That’s how we use the install_github() below.” —> “That’s how we use install_github() below.”
- 1.4: “...when installing package updates you will be asked ‘Do you want to…’” —> “...when installing package updates you will be asked, ‘Do you want to…’”
- I'd argue that a comma before the quoted material is standard here
- 1.8: “The return statement takes an optional argument in it’s parenthesis that will…” —> “The return statement takes an optional argument in its parentheses that will…”
- There's currently a spare apostrophe in “its”
- I’d also suggest that since the preposition here is “in,” it'd be preferable to reference parentheses (vs. parenthesis)
- 1.8: “…have built-in error-checking that return messages…” —> “…have built-in error-checking that returns messages…”
- Suggesting the above given that "error-checking" is singular here
- 1.8: “At tuple is a data structure…” —> “A tuple is a data structure…”
- “At” —> “A”
- 1.8: “…the three columns using an anonymous function with lapply” —> “…the three columns using an anonymous function with lapply()”
- Adding parentheses to in-text `reference_to_function()`
Chapter 2: (Notes last updated: 2022-01-14)
- 2.1: “The of the most…” —> “Three of the most…”
- 2.1: “…be explicitly declared, they are indicated…” —> “…be explicitly declared; they are indicated…”
- Comma splice
- 2.1: “…for negative indexing, using an index of…” —> “…for negative indexing; using an index of…”
- Comma splice
- 2.1: “…declared using the `numpy.array()` function and the numpy package needs to…” —> “…declared using the `numpy.array()` function, and the numpy package needs to…”
- Comma before coordinating conjunction
- 2.1: “…cannot be carried out on lists, but can be carried out…” —> “…cannot be carried out on lists, but they can be carried out…”
- Need post-comma pronoun to prevent fragment here (or drop the comma and proceed w/o pronoun)
- 2.2: “…multiple vectors each of which…” —> “…multiple vectors, each of which…”
- I’d argue that a comma before the modifier phrase would be the least ambiguous form of the sentence, but others may well feel differently; just mentioning it as a possibility
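
As a quick reference for the 2.1 note on negative indexing, here is a minimal Python sketch of the behavior that passage describes. The list name and its contents are made up for illustration and are not taken from the chapter.

```python
# Hypothetical list, used only to illustrate negative indexing
letters = ["a", "b", "c", "d"]

last = letters[-1]            # -1 refers to the final element
second_to_last = letters[-2]  # -2 refers to the element before it
tail = letters[-2:]           # a slice can also start from a negative index

print(last, second_to_last, tail)
```

Negative indices count backward from the end of the sequence, so `letters[-1]` and `letters[len(letters) - 1]` refer to the same element.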
Chapter 3: (Notes last updated 2022-01-15)
- 3.0: “The examples below highlight one way that…”
- This sentence tripped me up a bit because it implies that we show one method across a whole set of examples, but I take it that the sentence is indicating that we show one method per example
- 3.0: “The data we use for demonstration is…”
- Since we treat “data” as plural throughout the book, I assume we want to here as well; i.e., “are” instead of "is" (or we could just switch to: "The data set we use...," in which case we could keep the verb as "is")
- 3.1: “They are useful for “rectangular" data where rows represent…”
- Currently, the clause headed by “where” is restrictive—but should it be? I.e., is there a form of rectangular data we could be referencing that wouldn’t be characterized by rows=observations/columns=variables? If not, then we may want to make that nonrestrictive by adding a comma before "where"
- 3.2: “Since this Excel file has only one sheet we do not need…” —> “Since this Excel file has only one sheet, we do not need…”
- Comma after adverbial dependent clause at head of sentence
- 3.4: “Because of their flexibility XML files…” —> “Because of their flexibility, XML files…”
- Comma after adverbial dependent clause at head of sentence
Chapter 4: (Notes last updated 2022-01-17)
- 4.1: The Python example of `mtcars.info()` where `verbose = False` seems to currently print the table describing variables, although the text indicates that setting `verbose = False` “excludes the table describing each column.” (I.e., the output when `verbose = False` seems to currently be the same as when `verbose = True`.)
- 4.2: “This function works on numpy array, pandas series, and pandas DataFrames” —> “This function works on numpy arrays, pandas series, and pandas DataFrames”
- Pluralize “arrays” for consistency with “series” and “DataFrames”
- 4.2: “Single indexing brackets work as well, but return a data frame…” —> “Single indexing brackets work as well, but they return a data frame…”
- Clause headed by "but" is subordinate; we could either drop the comma or add a pronoun
- 4.2: Where we have “single indexing brackets” and “double indexing brackets,” I’d argue for hyphens (e.g., “single-indexing brackets”) in order to unambiguously establish the compound modifier
- 4.3: “Column names can be chnaged using…” —> “Column names can be changed using…”
- Typo
- 4.3: At the end of the Python code example, there’s a spare, unfinished sentence: “You can…”
- 4.3: In the R code example, the text says, “We change the [column] name to ‘cylinder’”; however, the code currently changes the column name to “cylinders”
- 4.5: Just a note that there’s a remaining “to-do” listed in the actual text at the bottom of the 4.5 R code section
- 4.9: “The base R functions `sample()` and `runif()` can be combined to sample sizes or approximate proportions”
- This sentence tripped me up just a hair because it’s easy to read “sample sizes” as “[adjective] [noun]” (i.e., “a study’s N”) as opposed to “[verb] [noun]”; we may be able to disambiguate by using, say, “sample fixed sizes” instead of “sample sizes”
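
To make the 4.9 distinction between sampling a fixed size and sampling an approximate proportion concrete, here is a rough sketch. It is a Python standard-library analogue, not the chapter's R code, and it rests on my reading of the sentence: "fixed size" means drawing exactly n rows, while "approximate proportion" means keeping each row with some probability. The row indices and the 0.1 probability are hypothetical.

```python
import random

random.seed(0)  # fixed seed so that reruns give the same draws

rows = list(range(100))  # hypothetical row indices standing in for a data frame

# Fixed size: exactly 10 rows, drawn without replacement
fixed_size = random.sample(rows, k=10)

# Approximate proportion: keep each row with probability 0.1,
# so the number of rows kept varies around 10 rather than equaling it
approx_prop = [r for r in rows if random.random() < 0.1]

print(len(fixed_size))   # always 10
print(len(approx_prop))  # close to 10, but not guaranteed
```

If this matches what the chapter intends, wording such as “sample fixed sizes or approximate proportions” would describe both branches.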
Chapter 5: (Notes last updated 2022-01-18)
- 5.1: “To row bind data frames the column names must…” —> “To row bind data frames, the column names must…”
- Comma after sentence-heading dependent clause
- 5.3.1: “…and the desired names for output columns in the long data…”
- When I wrote this, I thought “output” would be a clarifying adjective; reading it now, I think “new” would be clearer
- 5.3.2: Comment in R code next to `names_glue` argument is cut off; my fault, will update
- 5.4: Will drop unnecessary “#x” and “#y” code comments from introduction section indicating data frames used in join examples
- 5.4: “wherever possible” might be clearer than “where possible” in left/right merge descriptions
- 5.4.3: “…for which a match can be on the merge criterion…” —> “…for which a match can be found on the merge criterion…”
- I omitted a word
- 5.4.3: R code comment would probably be clearer as: “with its default arguments, `merge()` executes an inner join”
Chapter 6: (Notes last updated 2022-01-20)
- 6.1: “…determining frequencies per group (or values based on…” —> “…determining frequencies per group (or determining values based on…”
- As I reread this, I realize it would have been clearer had I repeated "determining" (as above) so as to eliminate the possibility of interpreting it as “frequencies per X (or frequencies per Y)”
- 6.1: “The `groupby()`, also in pandas…” —> “Alternatively, the `groupby()` function, also in pandas…”
- I omitted the word “function”
- 6.2: Will remove parentheses from around “and the variable to be summarized” in R section on group summaries, as that material's not an aside
- 6.2: “…are returned if no…” —> “…are returned even if no…”
- 6.2: Probably switch order of the R code chunks showing (a) the `drop = F` argument and (b) formula-notation aggregation
- 6.2: “A benefit of `summarize()` is that it allows a user to…” —> “`summarize()` makes it easy to…”
- Will revise for the sake of concision
- 6.3: Probably revise definition of centering to indicate that the process isn’t exclusively around 0 (e.g., centering around group means)
- 6.3: Will remove unnecessary parentheses from around “without scaling it” and “while also centering it”
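
For the 6.3 note that centering isn't exclusively around 0, a small sketch may help show the difference between centering on the overall mean and centering on group means. This uses only the Python standard library; the records and variable names are hypothetical and not from the chapter.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (group, value) records
records = [("a", 1.0), ("a", 3.0), ("b", 10.0), ("b", 14.0)]

# Collect the values belonging to each group, then compute each group's mean
by_group = defaultdict(list)
for group, value in records:
    by_group[group].append(value)
group_means = {group: mean(values) for group, values in by_group.items()}

# Centering on the overall mean: the full set of values then averages 0
grand_mean = mean(value for _, value in records)
grand_centered = [value - grand_mean for _, value in records]

# Centering on group means: each group's values then average 0
group_centered = [value - group_means[group] for group, value in records]

print(grand_centered)
print(group_centered)
```

Both results are "centered," but only the first is centered on a single value for the whole variable, which is why the definition may be worth broadening.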
Chapter 7: (Notes last updated 2022-01-26)
- 7.0: “For the R sections below, we discuss how to generate plots using base R and ggplot2.” —> “For the R sections below, we show how to make each plot with base R and with ggplot2.”
- Improving clarity of the sentence I'd initially added here
- 7.1: “The Python plotting library Matplotlib…” —> “The Python plotting library **Matplotlib**…”
- At least as of now, we’ve been bolding library/package names; do we want to stick with this convention?
- 7.1: “…show a histogram of the bill length from the dataset…” —> “…show a histogram of bill lengths from the penguins dataset…”
- Clarity; pluralize as appropriate
- 7.1: “We specified 30 bins each of which is light blue with a black outline of linewidth 1.” —> “We specified 30 bins, each of which is light blue with a black outline of `linewidth = 1`.”
- Comma after “bins” in order to establish “each of…” as an unambiguous modifier; code format for the `linewidth` argument
- 7.1: “The `hist()` defaults to…” —> “The `hist()` function defaults to…”
- Add “function”
- 7.1: “…defaults to no outline which can…” —> “…defaults to no outline, which can…”
- Comma in advance of “…which can…” since that clause is nonrestrictive
- 7.1: “For example, if a particular bin spans the range of 1 to 3, the bin will include the value 1 but will exclude the value 2…” —> “For example, if a particular bin spans the range of 1 to 3, the bin will include the value 1 but will exclude the value 3…”
- Switch “2” to “3” in description of which value is excluded at the upper bound of the histogram bin
- 7.1: In the Python code for histograms, I think it’d be useful to include a brief in-code comment explaining the no-argument `plot.clf()` function
- 7.1: “Initialize a plot with ggplot(), and then add layers thereto, specifying aesthetic properties along the way.” —> “Initialize a plot with ggplot(), and then add layers thereto.”
- I think the “specifying…” phrase I’d initially added is unnecessary and doesn’t add any clarifying value
- 7.2: “One thing to note here is that we…”
- This sentence currently gets broken over two lines; adding an extra break before “One thing…” would resolve this
- 7.2: “…we generated the same bar plot containing the same information with way less effort.” —> “we generated the same bar plot as we first made with way less effort.”
- I think that “…same bar plot containing the same information…” rings a bit redundant to my ear (i.e., “same plot” == “same info”); perhaps we could simplify the language along the lines proposed above?
- 7.4: “Adding `methhod = 'jitter'` to the set of arguments…” —> “Adding `method = 'jitter'` to the arguments…”
- Fixing my typo + simplifying language
- 7.5: “boxplots, and a user…” —> “boxplots. A user…”
- Improving reading rhythm
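
To double-check the 7.1 correction about which value a bin spanning 1 to 3 excludes, here is a minimal sketch of the half-open convention the corrected sentence describes (lower edge included, upper edge excluded). It is a standalone illustration, not the plotting library's implementation; `bin_index` and the edge values are hypothetical.

```python
from bisect import bisect_right

def bin_index(value, edges):
    """Return the index of the half-open bin [edges[i], edges[i + 1])
    that contains value, or None if value falls outside every bin."""
    if value < edges[0] or value >= edges[-1]:
        return None
    return bisect_right(edges, value) - 1

edges = [1, 3, 5]  # hypothetical edges: two bins, [1, 3) and [3, 5)

print(bin_index(1, edges))  # 1 sits on the lower edge of the first bin and is included
print(bin_index(2, edges))  # 2 is inside the first bin
print(bin_index(3, edges))  # 3 is excluded from the first bin and opens the second
```

Under this convention the value 2 is still inside the 1-to-3 bin, which is why “3” is the right number in the corrected sentence. Some tools make an exception for the final bin (NumPy's histogram, for example, includes the rightmost edge), so the sketch above applies to interior bins.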
(I'll add notes for additional chapters as I make my way through them. I hope everyone's 2022 is off to a fine start!)