
Handle sparse data #20

@ravwojdyla

There is a production case like this:

  case class TrainingExample(indices: List[Int],
                             data: List[Float],
                             label: Float,
                             weight: Float)

  object TestFeatureSpec {
    val featuresType: TensorFlowType[TrainingExample] = TensorFlowType[TrainingExample]
  }
...

  def convertToTrainingExample(sv: Seq[SparseVector[Float]]): TrainingExample = {
    val labelData = sv(0).data
    val label = labelData.head
    val weight = labelData.length match {
      case 2 => labelData(1)
      case _ => defaultWeight
    }
    TrainingExample(
      sv(1).index.toList,
      sv(1).data.toList,
      label,
      weight
    )
  }

...

    val features = extracted
      .featureValues[SparseVector[Float]]
      .map(sv => (sampler.getPartition(), convertToTrainingExample(sv)))
      .map { case (partition, example) =>
        (partition, TestFeatureSpec.featuresType.toExample(example))
      }
...

I suspect the list-typed fields (indices, data) may be a problem for the type derivation. Can we handle this case?
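
To make the question concrete, here is a minimal sketch of the encoding I would expect for the sparse fields. TensorFlow's Example proto stores every feature as a list (bytes_list, float_list or int64_list), so indices and data could in principle map to an int64 list and a float list of equal length. The Feature types and the SparseEncoding.toFeatures helper below are hypothetical stand-ins written for illustration only. They are not part of TensorFlowType, and whether its derivation accepts List fields is exactly what this issue asks.

```scala
// Hypothetical stand-ins that mirror two of the feature kinds in
// TensorFlow's Example proto. They are NOT types from this library.
sealed trait Feature
final case class Int64ListFeature(values: List[Long]) extends Feature
final case class FloatListFeature(values: List[Float]) extends Feature

// Same shape as the case class in the issue description.
final case class TrainingExample(indices: List[Int],
                                 data: List[Float],
                                 label: Float,
                                 weight: Float)

object SparseEncoding {
  // Encode each field as a named feature. The sparse vector becomes two
  // parallel lists; scalar fields become single-element lists.
  def toFeatures(e: TrainingExample): Map[String, Feature] = {
    require(e.indices.length == e.data.length,
      "indices and data must have the same length")
    Map(
      "indices" -> Int64ListFeature(e.indices.map(_.toLong)),
      "data"    -> FloatListFeature(e.data),
      "label"   -> FloatListFeature(List(e.label)),
      "weight"  -> FloatListFeature(List(e.weight))
    )
  }
}
```

On the reading side, a pair of parallel index/value features like this could presumably be recombined with tf.io.SparseFeature, but that depends on how the training job parses the records.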


Labels: enhancement, help wanted, question
