Skip to content

#6 - Hop Dataset Import#31

Open
krzysztofkozyra021 wants to merge 11 commits intomainfrom
hop-dataset-import
Open

#6 - Hop Dataset Import#31
krzysztofkozyra021 wants to merge 11 commits intomainfrom
hop-dataset-import

Conversation

@krzysztofkozyra021
Copy link
Collaborator

This should close #6

…e fields

- add alt_name, descriptors, lineage columns
- add thiols (low/medium/high)
- add agronomic: yield_min/max, maturity, wilt_disease, downy_mildew, powdery_mildew, aphid
- rename aroma_miscellaneous -> aroma_misc
- change substitutes from flat array to {brewhouse, dryhopping} JSON object
- update Hop model fillable, casts and docblock accordingly
- MapHopData: extract alt_name, descriptors, lineage, thiols, all agronomic fields
- MapHopData: store substitutes as {brewhouse, dryhopping} nested structure
- UpsertHop: add validation rules for all new fields (thiols: in:low,medium,high)
- UpsertHop: buildAttributes stores substitutes with brewhouse/dryhopping keys preserved
- ImportHopVarietyJobTest: assert aroma_misc (renamed from aroma_miscellaneous)
- ImportHopVarietyJobTest: assert substitutes as nested brewhouse/dryhopping structure
- HopFactory: add all missing fields (alt_name, descriptors, lineage, thiols, agronomic)
- HopFactory: fix substitutes to use correct nested {brewhouse, dryhopping} format
- HopFactory: use countryCode() instead of country() for realistic 2-letter codes
Storage::fake() creates real files in this directory during tests
and does not clean them up automatically after each run.
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request implements the hop dataset import functionality requested in issue #6. It adds an Artisan command hops:import that loads hop variety data from JSON/JSON5 files stored in the local disk, processes them asynchronously via queued jobs, and stores them in the database with proper validation. The implementation includes comprehensive test coverage for both success and error scenarios.

Changes:

  • Added hops:import Artisan command to scan and dispatch import jobs for JSON/JSON5 files
  • Implemented queued job to parse, validate, and upsert individual hop varieties
  • Created actions for data mapping (JSON structure to internal format) and database upsertion
  • Extended database schema and model to support additional hop attributes (lineage, thiols, agronomic data, disease resistance)
  • Renamed Range value object to RangeOrNumber for clarity
  • Added JSON5 parser helper to support unquoted keys in data files

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
app/Console/Commands/HopsImportCommand.php New command to scan storage directory and dispatch import jobs
app/Jobs/ImportHopVarietyJob.php Queued job to process individual hop variety files with error logging
app/Actions/MapHopData.php Maps JSON structure to internal data format
app/Actions/UpsertHop.php Validates and upserts hop data with comprehensive validation rules
app/Helpers/Json5Parser.php Simple JSON5 parser to handle unquoted keys
app/Models/Hop.php Extended model with new fields and proper PHPDoc annotations
app/ValueObjects/RangeOrNumber.php Renamed from Range for better semantic clarity
database/migrations/2026_02_18_224936_create_hops_table.php Added new columns for hop attributes and renamed aroma_miscellaneous to aroma_misc
database/factories/HopFactory.php Updated factory to generate data for new fields
tests/Helpers/HopFixture.php Test fixtures providing sample JSON5 hop data
tests/Feature/Jobs/ImportHopVarietyJobTest.php Comprehensive job tests including error scenarios
tests/Feature/Console/Commands/HopsImportCommandTest.php Command tests verifying job dispatch and file filtering
tests/Feature/RangeOrNumberTest.php Updated tests for renamed class
.gitignore Added pattern to ignore test disk storage
storage/framework/testing/.gitignore Removed (replaced by root gitignore entry)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +18 to +19
$folder = $this->argument("folder");

Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The folder argument is passed directly to Storage::files() without validation. While Laravel's Storage facade should handle path traversal attempts, it's a good practice to explicitly validate that the folder path doesn't contain suspicious characters like '..' to prevent potential path traversal attacks. Consider adding validation such as checking for '..' or validating against a whitelist of allowed characters.

Suggested change
$folder = $this->argument("folder");
$folder = trim((string) $this->argument("folder"));
// Basic validation to prevent path traversal and disallow suspicious characters
if (
$folder === "" ||
strpos($folder, "..") !== false ||
!preg_match('#^[A-Za-z0-9_\-/]+$#', $folder)
) {
$this->error("Invalid folder argument. Please provide a relative folder name using only letters, numbers, '/', '_', and '-' characters.");
return;
}

Copilot uses AI. Check for mistakes.
Comment on lines +7 to +14
class Json5Parser
{
public function parse(string $content): ?array
{
$json = preg_replace('/(?<=[{,\n])\s*([a-zA-Z_]\w*)\s*:/m', '"$1":', $content);

$data = json_decode($json, true);

Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regular expression used for JSON5 parsing is too simplistic and may fail on valid JSON5 syntax. It only handles unquoted keys but doesn't support other JSON5 features like comments, trailing commas, or multi-line strings. Additionally, the pattern could incorrectly match keys that appear after array/object values. Consider using a dedicated JSON5 parsing library or document the specific subset of JSON5 that is supported.

Suggested change
class Json5Parser
{
public function parse(string $content): ?array
{
$json = preg_replace('/(?<=[{,\n])\s*([a-zA-Z_]\w*)\s*:/m', '"$1":', $content);
$data = json_decode($json, true);
use ColinODell\Json5\Json5Decoder;
class Json5Parser
{
public function parse(string $content): ?array
{
try {
$data = Json5Decoder::decode($content, true);
} catch (\Throwable $e) {
return null;
}

Copilot uses AI. Check for mistakes.
Comment on lines +131 to +133
$attributes[$field] = ($min !== null && $max !== null)
? RangeOrNumber::fromRange((float)$min, (float)$max)
: null;
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The validation rules allow min and max values to be independently nullable, but the RangeOrNumber creation logic at lines 131-133 only creates a range when both min and max are non-null. This means if only one value is present, it will result in a null range, which may silently discard valid data. Consider adding a validation rule to ensure both min and max are present together, or handle the case where only one is provided.

Suggested change
$attributes[$field] = ($min !== null && $max !== null)
? RangeOrNumber::fromRange((float)$min, (float)$max)
: null;
if ($min === null && $max === null) {
$attributes[$field] = null;
} elseif ($min !== null && $max !== null) {
$attributes[$field] = RangeOrNumber::fromRange((float)$min, (float)$max);
} elseif ($min !== null) {
// Only a minimum provided; treat it as a single-value range.
$attributes[$field] = RangeOrNumber::fromRange((float)$min, (float)$min);
} else {
// Only a maximum provided; treat it as a single-value range.
$attributes[$field] = RangeOrNumber::fromRange((float)$max, (float)$max);
}

Copilot uses AI. Check for mistakes.
Comment on lines +79 to +80
"bitterness" => $this->faker->optional()->randomElement(Bitterness::values()),
"aromaticity" => $this->faker->optional()->randomElement(Aromaticity::values()),
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 'bitterness' and 'aromaticity' fields have been changed to use 'optional()' which means they might not be present in the factory-generated data. However, this creates inconsistency with other nullable fields in the factory. Consider whether these fields should always have values in test data for consistency, or if this change aligns with the domain logic where some hops may not have these classifications.

Suggested change
"bitterness" => $this->faker->optional()->randomElement(Bitterness::values()),
"aromaticity" => $this->faker->optional()->randomElement(Aromaticity::values()),
"bitterness" => $this->faker->randomElement(Bitterness::values()),
"aromaticity" => $this->faker->randomElement(Aromaticity::values()),

Copilot uses AI. Check for mistakes.
Comment on lines +86 to +87
$attributes = [
"slug" => Str::slug($validated["name"]) . "-hop",
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The slug generation uses Str::slug($validated["name"]) . "-hop" which always appends "-hop" to the slug. This could lead to redundant slugs like "cascade-hop-hop" if a hop's name already contains "hop". Consider either checking if the name already ends with "hop" before appending, or removing "hop" from the name before slugification.

Suggested change
$attributes = [
"slug" => Str::slug($validated["name"]) . "-hop",
$baseSlug = Str::slug($validated["name"]);
if (!Str::endsWith($baseSlug, "-hop")) {
$baseSlug .= "-hop";
}
$attributes = [
"slug" => $baseSlug,

Copilot uses AI. Check for mistakes.
Comment on lines +91 to +106
"descriptors" => $validated["descriptors"] ?? [],
"lineage" => $validated["lineage"] ?? [],
"thiols" => $validated["thiols"] ?? null,
"aroma_citrusy" => $validated["aroma_citrusy"] ?? null,
"aroma_fruity" => $validated["aroma_fruity"] ?? null,
"aroma_floral" => $validated["aroma_floral"] ?? null,
"aroma_herbal" => $validated["aroma_herbal"] ?? null,
"aroma_spicy" => $validated["aroma_spicy"] ?? null,
"aroma_resinous" => $validated["aroma_resinous"] ?? null,
"aroma_sugarlike" => $validated["aroma_sugarlike"] ?? null,
"aroma_misc" => $validated["aroma_misc"] ?? null,
"aroma_descriptors" => $validated["aroma_descriptors"] ?? [],
"substitutes" => [
"brewhouse" => $validated["substitutes"]["brewhouse"] ?? [],
"dryhopping" => $validated["substitutes"]["dryhopping"] ?? [],
],
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The buildAttributes method always sets empty arrays for descriptors, lineage, aroma_descriptors, and substitutes even when they are nullable fields. This prevents distinguishing between "no data provided" (null) and "empty list provided" ([]). Consider using null when the validated data doesn't contain these keys to maintain semantic correctness.

Copilot uses AI. Check for mistakes.
ImportHopVarietyJob::dispatch($file);
}

$this->info("All import jobs have been dispatched to queue, run `php artisan queue:work` to process them.");
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The command message at line 40 assumes jobs are being queued asynchronously and tells users to run 'php artisan queue:work'. However, if the queue connection is set to 'sync', the jobs will already be processed and this message could be confusing. Consider checking the queue connection and adjusting the message accordingly, or clarifying that the message only applies when using async queues.

Copilot uses AI. Check for mistakes.
Comment on lines +20 to +25
$files = collect(Storage::disk("local")->files($folder))
->filter(fn(string $file): bool => in_array(
pathinfo($file, PATHINFO_EXTENSION),
["json", "json5"],
true,
));
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The command doesn't handle the case where the specified folder doesn't exist. Storage::files() may throw an exception or return an unexpected result if the folder is missing. Consider adding a check for folder existence and providing a helpful error message to the user if the folder doesn't exist.

Copilot uses AI. Check for mistakes.
Comment on lines +8 to +10
{
public function parse(string $content): ?array
{
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JSON5 parser uses preg_replace with user-controlled content which could potentially be exploited if very large files are processed, leading to ReDoS (Regular Expression Denial of Service). Consider adding a file size limit check before parsing, or using a more efficient parsing approach.

Suggested change
{
public function parse(string $content): ?array
{
{
private const MAX_CONTENT_LENGTH = 1048576; // 1 MB limit to mitigate potential ReDoS
public function parse(string $content): ?array
{
if (strlen($content) > self::MAX_CONTENT_LENGTH) {
return null;
}

Copilot uses AI. Check for mistakes.
Comment on lines +7 to +8
class Json5Parser
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it came from requiremnts? Or its some kind of warranty before fancy JSON data?


public function handle(Json5Parser $parser, MapHopData $mapper, UpsertHop $upsert): void
{
$content = Storage::disk("local")->get($this->filePath);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now u are fixed on local storage but maybe we can use s3 or smth like that. Let's make that more flexible. The same with uploading and savng files

use Illuminate\Filesystem\FilesystemManager;

$filePath = new FilesystemManager()->readStream($path);

Comment on lines +48 to +54
$table->integer("yield_min")->nullable();
$table->integer("yield_max")->nullable();
$table->string("maturity")->nullable();
$table->string("wilt_disease")->nullable();
$table->string("downy_mildew")->nullable();
$table->string("powdery_mildew")->nullable();
$table->string("aphid")->nullable();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are on early stage of the project and its does not matter but you should create new one migration file to add new fileds to the table next time.

Comment on lines +29 to +33
Bus::assertDispatched(ImportHopVarietyJob::class, fn($job) => $job->filePath === "hops_data/citra.json5");

Bus::assertDispatched(ImportHopVarietyJob::class, fn($job) => $job->filePath === "hops_data/mosaic.json5");

Bus::assertNotDispatched(ImportHopVarietyJob::class, fn($job) => $job->filePath === "hops_data/not-a-hop.txt");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Bus::assertDispatched(ImportHopVarietyJob::class, fn($job) => $job->filePath === "hops_data/citra.json5");
Bus::assertDispatched(ImportHopVarietyJob::class, fn($job) => $job->filePath === "hops_data/mosaic.json5");
Bus::assertNotDispatched(ImportHopVarietyJob::class, fn($job) => $job->filePath === "hops_data/not-a-hop.txt");
Bus::assertDispatched(ImportHopVarietyJob::class, fn($job) => $job->filePath === "hops_data/citra.json5");
Bus::assertDispatched(ImportHopVarietyJob::class, fn($job) => $job->filePath === "hops_data/mosaic.json5");
Bus::assertNotDispatched(ImportHopVarietyJob::class, fn($job) => $job->filePath === "hops_data/not-a-hop.txt");

Comment on lines +29 to +33
Bus::assertDispatched(ImportHopVarietyJob::class, fn($job) => $job->filePath === "hops_data/citra.json5");

Bus::assertDispatched(ImportHopVarietyJob::class, fn($job) => $job->filePath === "hops_data/mosaic.json5");

Bus::assertNotDispatched(ImportHopVarietyJob::class, fn($job) => $job->filePath === "hops_data/not-a-hop.txt");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type hints

$this->artisan("hops:import", ["folder" => "custom_folder"])
->assertExitCode(0);

Bus::assertDispatched(ImportHopVarietyJob::class, fn($job) => $job->filePath === "custom_folder/galaxy.json5");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type hints

Storage::fake("local");
Log::shouldReceive("warning")
->once()
->withArgs(fn(string $msg) => Str::contains($msg, "File not found"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return type


Log::shouldReceive("warning")
->once()
->withArgs(fn(string $msg) => Str::contains($msg, "Failed to parse"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here


Log::shouldReceive("warning")
->once()
->withArgs(fn(string $msg) => Str::contains($msg, "Validation failed"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Comment on lines +21 to +36
"alpha_acid_min" => $data["ingredients"]["alphas"]["min"] ?? null,
"alpha_acid_max" => $data["ingredients"]["alphas"]["max"] ?? null,
"beta_acid_min" => $data["ingredients"]["betas"]["min"] ?? null,
"beta_acid_max" => $data["ingredients"]["betas"]["max"] ?? null,
"cohumulone_min" => $data["ingredients"]["cohumulones"]["min"] ?? null,
"cohumulone_max" => $data["ingredients"]["cohumulones"]["max"] ?? null,
"total_oil_min" => $data["ingredients"]["oils"]["min"] ?? null,
"total_oil_max" => $data["ingredients"]["oils"]["max"] ?? null,
"polyphenol_min" => $data["ingredients"]["polyphenols"]["min"] ?? null,
"polyphenol_max" => $data["ingredients"]["polyphenols"]["max"] ?? null,
"xanthohumol_min" => $data["ingredients"]["xanthohumols"]["min"] ?? null,
"xanthohumol_max" => $data["ingredients"]["xanthohumols"]["max"] ?? null,
"farnesene_min" => $data["ingredients"]["farnesenes"]["min"] ?? null,
"farnesene_max" => $data["ingredients"]["farnesenes"]["max"] ?? null,
"linalool_min" => $data["ingredients"]["linalool"]["min"] ?? null,
"linalool_max" => $data["ingredients"]["linalool"]["max"] ?? null,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use your VO RangeOrNumber?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement Hop Dataset Import Command

3 participants