JGG January book comments and suggested revisions #35

@jacob-gg

Hey, all! I'm dropping some notes and possible edits here as I read through the book per Clay's 2022-01-04 email. I'm very much enjoying reading through everybody's material—and I'm learning a bunch as well!

I'm working chapter by chapter, so I'll update this issue as I make additional headway. Also, immediately below are some notes on things we may want to standardize:


Standardization questions:

  • Commas following sentence-heading adverbs and adverbial phrases; e.g.:
    • “Sometimes, John says…” vs. “Sometimes John says…”; "To complete the process do x..." vs. "To complete the process, do x..."
    • I tend to vote for using a comma in these cases (admittedly, my tendencies are for insistent disambiguation)
  • Style for code comments:
    • Sentence case?
    • Trailing punctuation?
  • For offsetting material within a sentence: Do we want to use em dashes sans spaces or hyphens with spaces around them? I think we currently have a slight mix (e.g., “The car—a Prius—is red” vs. “The car - a Prius - is red”)
    • I’d personally vote for the former in order to prevent potential ambiguities arising from the hyphen also being used in code as a minus sign/etc.
  • “user base” vs. “userbase”
  • "indexes" vs. "indices"
  • "CSV" vs. "csv"
  • Spaces after commas in code? We currently vary a bit
    • E.g., some_array = [1,2,3] vs. some_array = [1, 2, 3]
  • Bolding of package names
    • Bold in all instances or just in some contexts?
  • Period after "etc"/"etc."?
  • Parentheses after function names when referenced in text? We currently vary a bit
    • pandas.DataFrame() vs. pandas.DataFrame
  • When indicating ranges, do we want to all use en dashes, hyphens, or hyphens with spaces?
    • “From 2010–2013” / “From 2010-2013” / “From 2010 - 2013”
  • Spaces in argument declarations?
    • arg = 1 vs. arg=1
  • Periods for Latin abbreviations?
    • e.g. vs eg; i.e. vs. ie
  • "dataset" vs. "data set"
  • Code formatting for in-text uses of TRUE/FALSE (R) and True/False (Python)?
  • Capitalization scheme for sub-subheadings?
    • I.e., chapters are title case, and immediate subheadings are also title case; do we want sub-subheadings (x.y.z) to also all be title case, or should those be sentence case?
  • When code chunks produce non-troubling warnings (e.g., when ggplot says "Removed N rows containing missing data..."), do we want to set warning = F in the code chunk to suppress those in the rendered book?
  • I'll update these bullets as I encounter more standardization questions
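
Before deciding on some of the questions above, it may help to tally how often each variant currently appears. The following is a minimal Python sketch, not part of the book's tooling; the function names, the variant list, and the sample text are all hypothetical, and the comma check is naive (it would also flag commas inside string literals):

```python
import re

# Hypothetical variant pairs taken from the standardization questions above.
VARIANTS = {
    "dataset": r"\bdataset\b",
    "data set": r"\bdata set\b",
    "indexes": r"\bindexes\b",
    "indices": r"\bindices\b",
}


def count_variants(text):
    """Count case-insensitive occurrences of each competing spelling."""
    return {
        label: len(re.findall(pattern, text, flags=re.IGNORECASE))
        for label, pattern in VARIANTS.items()
    }


def has_unspaced_comma(code_line):
    """Return True if any comma is immediately followed by a non-space character."""
    return re.search(r",\S", code_line) is not None


# Made-up sample text; in practice this would be a chapter's source file.
sample_text = "The dataset is small. This data set has two indexes; those indices differ."
counts = count_variants(sample_text)
```

Running `count_variants()` over each chapter file and `has_unspaced_comma()` over each code line would show which convention is already dominant, which might make the vote easier.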

Chapter-by-chapter notes and typographic suggestions:

Format:

  • [chapter #].[subsection #]: "original/current language" —> "possible language"
    • Description of/motivation for possible edit

Chapter 1: (Notes last updated: 2022-01-12)

  • 1.2: “…preferred in most R style guides to distinguish it between assignment and setting the value…” —> “…preferred in most R style guides to distinguish between assignment and setting the value…”
    • I’d argue that since "distinguish" isn't followed by "from" here (e.g., “…to distinguish x from y…"), we should drop “it”
  • 1.2: “Python uses = for assignment while R can use…” —> “Python uses = for assignment, while R can use…”
    • I think the “whereas” interpretation of “while” (established by the addition of a comma) might be preferable here
  • 1.4 (code): “That’s how we use the install_github() below.” —> “That’s how we use install_github() below.”
  • 1.4: “...when installing package updates you will be asked ‘Do you want to…’” —> “...when installing package updates you will be asked, ‘Do you want to…’”
    • I'd argue that a comma before the quoted material is standard here
  • 1.8: “The return statement takes an optional argument in it’s parenthesis that will…” —> “The return statement takes an optional argument in its parentheses that will…”
    • There's currently a spare apostrophe in “its”
    • I’d also suggest that since the preposition here is “in,” it'd be preferable to reference parentheses (vs. parenthesis)
  • 1.8: “…have built-in error-checking that return messages…” —> “…have built-in error-checking that returns messages…”
    • Suggesting the above given that "error-checking" is singular here
  • 1.8: “At tuple is a data structure…” —> “A tuple is a data structure…”
    • “At” —> “A”
  • 1.8: “…the three columns using an anonymous function with lapply” —> “…the three columns using an anonymous function with lapply()”
    • Adding parentheses to in-text reference_to_function()

Chapter 2: (Notes last updated: 2022-01-14)

  • 2.1: “The of the most…” —> “Three of the most…”
  • 2.1: “…be explicitly declared, they are indicated…” —> “…be explicitly declared; they are indicated…”
    • Comma splice
  • 2.1: “…for negative indexing, using an index of…” —> “…for negative indexing; using an index of…”
    • Comma splice
  • 2.1: “…declared using the numpy.array() function and the numpy package needs to…” —> “…declared using the numpy.array() function, and the numpy package needs to…”
    • Comma before coord. conj.
  • 2.1: “…cannot be carried out on lists, but can be carried out…” —> “…cannot be carried out on lists, but they can be carried out…”
    • Need post-comma pronoun to prevent fragment here (or drop the comma and proceed w/o pronoun)
  • 2.2: “…multiple vectors each of which…” —> “…multiple vectors, each of which…”
    • I’d argue that a comma before the modifier phrase would be the least ambiguous form of the sentence, but others may well feel differently; just mentioning it as a possibility

Chapter 3: (Notes last updated: 2022-01-15)

  • 3.0: “The examples below highlight one way that…”
    • This sentence tripped me up a bit because it implies that we show one method across a whole set of examples, but I take it that the sentence is indicating that we show one method per example
  • 3.0: “The data we use for demonstration is…”
    • Since we treat “data” as plural throughout the book, I assume we want to here as well; i.e., “are” instead of "is" (or we could just switch to: "The data set we use...," in which case we could keep the verb as "is")
  • 3.1: “They are useful for “rectangular" data where rows represent…”
    • Currently, the clause headed by “where” is restrictive—but should it be? I.e., is there a form of rectangular data we could be referencing that wouldn’t be characterized by rows=observations/columns=variables? If not, then we may want to make that nonrestrictive by adding a comma before "where"
  • 3.2: “Since this Excel file has only one sheet we do not need…” —> “Since this Excel file has only one sheet, we do not need…”
    • Comma after adverbial dependent clause at head of sentence
  • 3.4: “Because of their flexibility XML files…” —> “Because of their flexibility, XML files…”
    • Comma after adverbial dependent clause at head of sentence

Chapter 4: (Notes last updated: 2022-01-17)

  • 4.1: The Python example of mtcars.info() where verbose = False seems to currently print the table describing variables, although the text indicates that setting verbose = False “excludes the table describing each column.” (I.e., the output when verbose = False seems to currently be the same as when verbose = True.)
  • 4.2: “This function works on numpy array, pandas series, and pandas DataFrames” —> “This function works on numpy arrays, pandas series, and pandas DataFrames”
    • Pluralize “arrays” for consistency with “series” and “DataFrames”
  • 4.2: “Single indexing brackets work as well, but return a data frame…” —> “Single indexing brackets work as well, but they return a data frame…”
    • Clause headed by "but" is subordinate; we could either drop the comma or add a pronoun
  • 4.2: Where we have “single indexing brackets” and “double indexing brackets,” I’d argue for hyphens (e.g., “single-indexing brackets”) in order to unambiguously establish the compound modifier
  • 4.3: “Column names can be chnaged using…” —> “Column names can be changed using…”
    • Typo
  • 4.3: At the end of the Python code example, there’s a spare, unfinished sentence: “You can…”
  • 4.3: In the R code example, the text says, “We change the [column] name to ‘cylinder’”; however, the code currently changes the column name to “cylinders”
  • 4.5: Just a note that there’s a remaining “to-do” listed in the actual text at the bottom of the 4.5 R code section
  • 4.9: “The base R functions sample() and runif() can be combined to sample sizes or approximate proportions”
    • This sentence tripped me up just a hair because it’s easy to read “sample sizes” as “[adjective] [noun]” (i.e., “a study’s N”) as opposed to “[verb] [noun]”; we may be able to disambiguate by using, say, “sample fixed sizes” instead of “sample sizes”
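
To make the two readings in the 4.9 note concrete, here is a rough Python analogue of the distinction (not the book's R code; `rows`, the seed, and the 0.3 proportion are made up for illustration). One approach draws a fixed number of rows; the other keeps each row with some probability, which yields only an approximate proportion:

```python
import random

rng = random.Random(42)  # seeded only so the sketch is repeatable
rows = list(range(100))  # stand-in for the row indices of a data frame

# Fixed size: exactly 10 rows, drawn without replacement.
fixed_size_sample = rng.sample(rows, k=10)

# Approximate proportion: keep each row when a uniform draw falls below 0.3,
# so the number of rows kept is near 30 but varies from run to run.
approx_proportion_sample = [row for row in rows if rng.random() < 0.3]
```

The first result always has the requested length, while the second does not, which is the contrast that wording such as “sample fixed sizes or approximate proportions” would be trying to capture.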

Chapter 5: (Notes last updated: 2022-01-18)

  • 5.1: “To row bind data frames the column names must…” —> “To row bind data frames, the column names must…”
    • Comma after sentence-heading dependent clause
  • 5.3.1: “…and the desired names for output columns in the long data…”
    • When I wrote this, I thought “output” would be a clarifying adjective; reading it now, I think “new” would be clearer
  • 5.3.2: Comment in R code next to names_glue argument is cut off—my fault; will update
  • 5.4: Will drop unnecessary “#x” and “#y” code comments from introduction section indicating data frames used in join examples
  • 5.4: “wherever possible” might be clearer than “where possible” in left/right merge descriptions
  • 5.4.3: “…for which a match can be on the merge criterion…” —> “…for which a match can be found on the merge criterion…”
    • I omitted a word
  • 5.4.3: R code comment would probably be clearer as: “with its default arguments, merge() executes an inner join”
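
For the 5.4 and 5.4.3 wording, the behavior being described can be sketched in plain Python: an inner join keeps only rows for which a match can be found on the merge criterion, while a left join keeps every left-hand row and fills in right-hand values wherever possible. This illustrates the concept only; it is not the book's merge() or pandas code, and the records and the `id` key are hypothetical:

```python
# Hypothetical records; "id" plays the role of the merge criterion.
left = [{"id": 1, "x": "a"}, {"id": 2, "x": "b"}, {"id": 3, "x": "c"}]
right = [{"id": 2, "y": "B"}, {"id": 3, "y": "C"}, {"id": 4, "y": "D"}]


def inner_join(left_rows, right_rows, key):
    """Keep only the row pairs whose key values match in both inputs."""
    joined = []
    for l_row in left_rows:
        for r_row in right_rows:
            if l_row[key] == r_row[key]:
                joined.append({**l_row, **r_row})
    return joined


def left_join(left_rows, right_rows, key):
    """Keep every left row, filling in right-hand values wherever possible."""
    right_columns = {col for row in right_rows for col in row if col != key}
    joined = []
    for l_row in left_rows:
        matches = [r for r in right_rows if r[key] == l_row[key]]
        if not matches:
            # No match: keep the left row and mark the right-hand columns missing.
            joined.append({**l_row, **{col: None for col in right_columns}})
        for r_row in matches:
            joined.append({**l_row, **r_row})
    return joined


inner_result = inner_join(left, right, "id")
left_result = left_join(left, right, "id")
```

Here `inner_result` contains only the rows with `id` 2 and 3, whereas `left_result` also keeps `id` 1 with `y` set to `None`.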

Chapter 6: (Notes last updated: 2022-01-20)

  • 6.1: “…determining frequencies per group (or values based on…” —> “…determining frequencies per group (or determining values based on…”
    • As I reread this, I realize it would have been clearer had I repeated "determining" (as above) so as to eliminate the possibility of interpreting it as “frequencies per X (or frequencies per Y)”
  • 6.1: “The groupby(), also in pandas…” —> “Alternatively, the groupby() function, also in pandas…”
    • I omitted the word “function”
  • 6.2: Will remove parentheses from around “and the variable to be summarized” in R section on group summaries, as that material's not an aside
  • 6.2: “…are returned if no…” —> “…are returned even if no…”
  • 6.2: Probably switch order of the R code chunks showing (a) the drop = F argument and (b) formula-notation aggregation
  • 6.2: “A benefit of summarize() is that it allows a user to…” —> “summarize() makes it easy to…”
    • Will revise for the sake of concision
  • 6.3: Probably revise definition of centering to indicate that the process isn’t exclusively around 0 (e.g., centering around group means)
  • 6.3: Will remove unnecessary parentheses from around “without scaling it” and “while also centering it”
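
On the 6.3 note about centering: the difference between subtracting the overall mean and subtracting each group's mean can be shown with a small Python sketch. The values and group labels are made up, and this is only one way to express the idea, not the book's code:

```python
from statistics import mean

# Hypothetical values and group labels, chosen so the means are whole numbers.
values = [1.0, 3.0, 5.0, 7.0]
groups = ["a", "a", "b", "b"]

# Grand-mean centering: subtract the overall mean from every value.
grand_mean = mean(values)
grand_centered = [v - grand_mean for v in values]

# Group-mean centering: subtract each observation's own group mean instead.
group_means = {
    g: mean(v for v, label in zip(values, groups) if label == g)
    for g in set(groups)
}
group_centered = [v - group_means[g] for v, g in zip(values, groups)]
```

The same value ends up in a different place depending on the reference point: 5.0 becomes 1.0 under grand-mean centering but -1.0 under group-mean centering, which is why a revised definition might mention the reference value explicitly.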

Chapter 7: (Notes last updated: 2022-01-26)

  • 7.0: “For the R sections below, we discuss how to generate plots using base R and ggplot2.” —> “For the R sections below, we show how to make each plot with base R and with ggplot2.”
    • Improving clarity of the sentence I'd initially added here
  • 7.1: “The Python plotting library Matplotlib…” —> “The Python plotting library **Matplotlib**…”
    • At least as of now, we’ve been bolding library/package names; do we want to stick with this convention?
  • 7.1: “…show a histogram of the bill length from the dataset…” —> “…show a histogram of bill lengths from the penguins dataset…”
    • Clarity; pluralize as appropriate
  • 7.1: “We specified 30 bins each of which is light blue with a black outline of linewidth 1.” —> “We specified 30 bins, each of which is light blue with a black outline of linewidth = 1.”
    • Comma after “bins” in order to establish “each of…” as an unambiguous modifier; code format for the linewidth argument
  • 7.1: “The hist() defaults to…” —> “The hist() function defaults to…”
    • Add “function”
  • 7.1: “…defaults to no outline which can…” —> “…defaults to no outline, which can…”
    • Comma in advance of “…which can…” since that clause is nonrestrictive
  • 7.1: “For example, if a particular bin spans the range of 1 to 3, the bin will include the value 1 but will exclude the value 2…” —> “For example, if a particular bin spans the range of 1 to 3, the bin will include the value 1 but will exclude the value 3…”
    • Switch “2” to “3” in description of which value is excluded at the upper bound of the histogram
  • 7.1: In the Python code for histograms, I think it’d be useful to include a brief in-code comment explaining the no-argument plot.clf() function
  • 7.1: “Initialize a plot with ggplot(), and then add layers thereto, specifying aesthetic properties along the way.” --> “Initialize a plot with ggplot(), and then add layers thereto.”
    • I think the “specifying…” phrase I’d initially added is unnecessary and doesn’t add any clarifying value
  • 7.2: “One thing to note here is that we…”
    • This sentence currently gets broken over two lines; adding an extra break before “One thing…” would resolve this
  • 7.2: “…we generated the same bar plot containing the same information with way less effort.” —> “we generated the same bar plot as we first made with way less effort.”
    • I think that “…same bar plot containing the same information…” rings as a bit redundant in my ear (i.e., “same plot” == “same info”); perhaps we could simplify the language, similar to what's proposed above?
  • 7.4: “Adding methhod = 'jitter’ to the set of arguments…” —> “Adding method = 'jitter’ to the arguments…”
    • Fixing my typo + simplifying language
  • 7.5: “boxplots, and a user…” —> “boxplots. A user…”
    • Improving reading rhythm
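
The bin-boundary correction in 7.1 (a bin spanning 1 to 3 includes 1 but excludes 3) follows the usual half-open convention. Here is a minimal Python sketch of that convention, assuming made-up bin edges and a hypothetical `bin_index()` helper; it does not reproduce any plotting library's internals. NumPy's histogram documentation, for instance, describes the last bin as closed on both ends, which this sketch does not do:

```python
from bisect import bisect_right

# Hypothetical bin edges: bins span 1 to 3, 3 to 5, and 5 to 7.
edges = [1, 3, 5, 7]


def bin_index(value, bin_edges):
    """Return the index of the half-open bin [left, right) containing value.

    Returns None when value lies outside every bin. Under this convention
    the final upper edge is excluded too; some libraries close the last bin.
    """
    if value < bin_edges[0] or value >= bin_edges[-1]:
        return None
    return bisect_right(bin_edges, value) - 1
```

With these edges, `bin_index(1, edges)` and `bin_index(2, edges)` both land in bin 0, while `bin_index(3, edges)` lands in bin 1, matching the corrected sentence.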

(I'll add notes for additional chapters as I make my way through them. I hope everyone's 2022 is off to a fine start!)
