cc.factorie.util

namejuggler

package namejuggler

Type Members

  1. trait CanonicalPersonName extends PersonName

    Allows declaring a record canonical, ensuring that it carries no explicit derivations.

  2. class CanonicalPersonNameWithDerivations extends CanonicalPersonName with PersonNameWithDerivations

    Derive all derivable fields solely from provided canonical fields.

  3. class InferredCanonicalPersonName extends CanonicalPersonName

    Infer any empty canonical fields, if possible, from provided derived fields.

    The approach is to generate full names from the derived fields, and then parse those full names back to canonical fields. On the one hand, that risks losing information. On the other hand, this is generally used upstream of a merge where the "correct" derived fields will take priority anyway. Also, this may help clean up mistagged data.

    In cases of single-value fields, nonempty explicit data overrides implicit data resulting from full-name parsing. Thus, e.g., an explicit Mr. overrides an implicit Dr. Should there be precedence rules?

    Set-valued fields are just merged.

    Note the relationship with PersonName.merge(): what we really want is to (a) derive canonical fields only from derived fields, and then (b) merge those with the existing canonical fields.

  4. sealed class NameComponentFormat extends AnyRef

  5. case class NonemptyString(s: String) extends Ordered[NonemptyString] with Product with Serializable

  6. class OptionMergeException[T] extends Exception

  7. class OptionNonemptyString extends AnyRef

  8. trait PersonName extends AnyRef

    An attempt to represent names as a set of canonical atomic fields.

    Being fully comprehensive and accurate about this is not possible, given the many cultural variations and ambiguities. Still, this should cover most of the cases we care about regarding authorship of journal articles.

    A name is not a fixed thing; it is a probabilistic cloud of strings, all denoting the same person. Here we don't cover the case that a person changes names completely; in that case there are two disjoint clouds of strings, so that should be modeled by allowing a Person to have multiple PersonNames.

    Here we try to model different representations of "the same name". Variations may include: omitting some components; using initials for some components; reordering; etc. The most "different" case to model is that of married names vs. maiden names. Since one or both of these may appear, but the other name components are not affected, we consider this a case of multiple surnames within one name.

    Subclasses propagate name fragments around the various representations, in an attempt to provide some reasonable value for each field.

    Here we want to take multiple name variants as input and coordinate them into a single record. For instance, if we assert that Amanda Jones and A. Jones-Archer are the same person, then we should later recognize Amanda Archer as a valid variant.

  9. case class PersonNameFormat(withPrefixes: Boolean, givenFormat: NameComponentFormat, surFormat: NameComponentFormat, inverted: Boolean, invertedSeparator: String = ",", withSuffixes: Boolean, initialTerminator: String = ".", initialSeparator: String = ".", degreeAbbreviator: String = ".", degreeSeparator: String = ", ", allCaps: Boolean = false) extends Product with Serializable

    A name format specification, for use both in formatting outputs and for forming expectations when parsing inputs.

  10. trait PersonNameWithDerivations extends PersonName

  11. class RichString extends AnyRef
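
The combination rule described under InferredCanonicalPersonName (a nonempty explicit single-valued field overrides the inferred one, while set-valued fields are merged) can be sketched roughly as follows. This is only an illustration under stated assumptions: CanonicalFields, InferenceSketch, and all field names are hypothetical stand-ins and are not taken from the namejuggler API.

```scala
// Hypothetical, simplified stand-in for the canonical fields of a name record.
// Field names are illustrative assumptions, not the library's actual members.
final case class CanonicalFields(
  prefix: Option[String] = None,      // single-valued, e.g. "Mr."
  givenNames: Seq[String] = Nil,      // ordered; treated as single-valued here
  surNames: Set[String] = Set.empty,  // set-valued
  degrees: Set[String] = Set.empty    // set-valued
)

object InferenceSketch {
  // Single-valued field: a nonempty explicit value wins,
  // otherwise fall back to the value inferred from full-name parsing.
  def preferExplicit[T](explicit: Option[T], inferred: Option[T]): Option[T] =
    explicit.orElse(inferred)

  def combine(explicit: CanonicalFields, inferred: CanonicalFields): CanonicalFields =
    CanonicalFields(
      prefix = preferExplicit(explicit.prefix, inferred.prefix),
      givenNames =
        if (explicit.givenNames.nonEmpty) explicit.givenNames else inferred.givenNames,
      // Set-valued fields are simply unioned.
      surNames = explicit.surNames ++ inferred.surNames,
      degrees = explicit.degrees ++ inferred.degrees
    )
}
```

For example, combining an explicit record CanonicalFields(prefix = Some("Mr."), surNames = Set("Jones")) with an inferred record CanonicalFields(prefix = Some("Dr."), givenNames = Seq("Alex"), surNames = Set("Archer")) keeps the explicit "Mr.", fills in the given name from the inferred record, and yields both surnames. The sketch does not address the open question of precedence rules among prefixes.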

Value Members

  1. object AllInitials extends NameComponentFormat with Product with Serializable

  2. object AllNames extends NameComponentFormat with Product with Serializable

  3. object AllNamesCaps extends NameComponentFormat with Product with Serializable

  4. object Ambiguous extends NameComponentFormat with Product with Serializable

  5. object FirstInitial extends NameComponentFormat with Product with Serializable

  6. object InferredCanonicalPersonName

  7. object NameCliquer

  8. object NameJuggler

  9. object Omit extends NameComponentFormat with Product with Serializable

  10. object OneName extends NameComponentFormat with Product with Serializable

  11. object OneNameCaps extends NameComponentFormat with Product with Serializable

  12. object OptionUtils

    Shamelessly yoinked from edu.umass.cs.iesl.scalacommons

  13. object PersonName

  14. object PersonNameFormat extends Serializable

  15. object PersonNameParser

    This could be a CRF...

  16. object PersonNameWithDerivations

  17. object RichString

  18. object SeqUtils

    Shamelessly yoinked from edu.umass.cs.iesl.scalacommons

  19. object StringUtils

    Shamelessly yoinked from edu.umass.cs.iesl.scalacommons
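
To illustrate how a format specification in the spirit of PersonNameFormat and the NameComponentFormat objects listed above might drive output formatting, here is a minimal sketch. Everything in it is a hypothetical reduction: ComponentFormat, FullNames, Initials, FormatSketch, and FormatterSketch are invented for the example, only a few of the documented parameters are modeled, and the default chosen for initialSeparator is an assumption of the sketch rather than the library's default.

```scala
// Hypothetical, reduced stand-ins; the real NameComponentFormat has more cases
// and the real PersonNameFormat has more parameters.
sealed trait ComponentFormat
case object FullNames extends ComponentFormat
case object Initials extends ComponentFormat

final case class FormatSketch(
  givenFormat: ComponentFormat,
  inverted: Boolean,
  invertedSeparator: String = ",",
  initialTerminator: String = ".",
  initialSeparator: String = " " // assumption made for this sketch
)

object FormatterSketch {
  // Render one surname plus given names according to the format.
  def render(givenNames: Seq[String], surname: String, f: FormatSketch): String = {
    val givenPart = f.givenFormat match {
      case FullNames =>
        givenNames.mkString(" ")
      case Initials =>
        givenNames
          .filter(_.nonEmpty)
          .map(n => n.take(1).toUpperCase + f.initialTerminator)
          .mkString(f.initialSeparator)
    }
    // Inverted order puts the surname first, followed by the separator.
    if (f.inverted) s"$surname${f.invertedSeparator} $givenPart"
    else s"$givenPart $surname"
  }
}
```

With these definitions, FormatterSketch.render(Seq("Amanda", "Jane"), "Jones", FormatSketch(Initials, inverted = true)) produces "Jones, A. J.", and the same names with FormatSketch(FullNames, inverted = false) produce "Amanda Jane Jones". Prefixes, suffixes, degrees, capitalization, and empty given names are deliberately left out.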
