
Specs handling text direction

r12a edited this page Aug 3, 2016 · 25 revisions

Activity Streams

Notes

  • this is JSON
  • it essentially handles a message sent with metadata, rather than a list of items
  • structured objects, but some leaf objects contain multiple natural language strings
  • name property has no markup; not clear whether it can have multiple paragraphs (ie. lines)
  • summary and content properties do support HTML markup
  • includes a mechanism for localised text: *Map equivalents of properties (eg. nameMap, summaryMap) hold the localised strings
  • an object can contain several natural language strings, which may have different base directions
  • a significant number of natural language strings are expected to be created by users typing text

Current solution proposed by Social WG:

for the name property (no markup allowed), add control codes at the start and end of the value to set the overall base direction (U+202B RLE and U+202C PDF in the example below), and inline control codes for inline changes

{
  "@context": {
    "@value": "http://www.w3.org/ns/activitystreams",
    "@language": "he"
  },
  "name": "\u202Bפעילות הבינאום, W3C\u202C",
  "type": "Note",
  "summary": "<span dir=\"rtl\">פעילות הבינאום, W3C</span>"
}
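The control-code approach for name can be sketched as follows. This is a minimal illustration, not part of the Social WG proposal: the helper name `wrap_with_embedding` is hypothetical, and it assumes the producer already knows the intended base direction of the string.

```python
import json

RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING
LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def wrap_with_embedding(text: str, direction: str) -> str:
    """Hypothetical helper: bracket a plain-text value with embedding controls."""
    if direction == "rtl":
        return f"{RLE}{text}{PDF}"
    if direction == "ltr":
        return f"{LRE}{text}{PDF}"
    raise ValueError(f"unsupported direction: {direction!r}")


note = {
    "type": "Note",
    "name": wrap_with_embedding("פעילות הבינאום, W3C", "rtl"),
}

# ensure_ascii=True (the default) writes the invisible controls as \u202b / \u202c
# escapes, which keeps them visible to anyone reading the serialized JSON.
serialized = json.dumps(note)
```

Even with a helper like this, something has to decide the direction for every string, which is the burden listed under Problems.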

for summary and content properties, use markup with dir attributes to establish overall base direction and inline changes

{
  "@context": {
    "@value": "http://www.w3.org/ns/activitystreams",
    "@language": "he"
  },
  "name": "\u202Bפעילות הבינאום, W3C\u202C",
  "type": "Note",
  "summaryMap": {
    "he": "<span dir=\"rtl\">פעילות הבינאום, W3C</span>",
    "en": "'<span dir=\"rtl\">نشاط التدويل, W3C</span>' is how you say 'i18n Activity, W3C' in Arabic.",
    "ar": "<span dir=\"rtl\">نشاط التدويل، W3C</span>"
  }
}
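The markup approach for summary and content can be sketched in the same way. The helper `span_with_dir` is hypothetical and assumes plain-text input that should be HTML-escaped; real summary values may already contain markup, which this sketch does not handle.

```python
import html
import json

ALLOWED_DIRECTIONS = {"ltr", "rtl", "auto"}


def span_with_dir(text: str, direction: str) -> str:
    """Hypothetical helper: wrap escaped plain text in a span carrying dir."""
    if direction not in ALLOWED_DIRECTIONS:
        raise ValueError(f"unsupported direction: {direction!r}")
    return f'<span dir="{direction}">{html.escape(text)}</span>'


summary_map = {
    "he": span_with_dir("פעילות הבינאום, W3C", "rtl"),
    "ar": span_with_dir("نشاط التدويل، W3C", "rtl"),
}

# The serializer escapes the attribute quotes as \" so the markup
# survives inside a JSON string value.
serialized = json.dumps(
    {"type": "Note", "summaryMap": summary_map}, ensure_ascii=False
)
```

Letting a JSON library serialize the object avoids hand-escaping the quotes around the dir value.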

Problems

  • can't expect Arabic/Hebrew/Divehi/Urdu/Persian/etc users to add control characters or markup for default direction for every natural language string
  • if name has multiple lines, or summary/content have multiple paragraphs, each line/paragraph needs to be annotated with directional information
  • users are expected to use different approaches for name vs summary/content, which is confusing (and must be done correctly, eg. no control codes before <p>, no control codes inside inline markup, etc.)
  • all the usual problems with control codes (eg. difficult to use, may not be available on keyboard, even harder to edit, etc.)
  • if direction information has been established by a user manually setting the direction of a form field, or by direction of form field being inherited from higher in page, this information should be automatically captured and applied to the strings, rather than requiring information to be added to each string by the user

Alternative suggestions

These suggestions are not proposals or recommendations, and they have not been discussed in the wider i18n WG. They are simply ideas to stimulate discussion, to see whether we can find a way to meet the essential requirement of enabling the necessary rendering of right-to-left scripts.

  • specify that the default is LTR
  • use one property per object to establish the base direction for RTL text
  • user only needs to revert to control codes/markup for exceptional text
  • if the property value is auto, the consumer does first-strong (FS) analysis, which may reduce the need for user intervention even further
  • setting a property is possibly more helpful when dealing with input from HTML forms, etc, where the direction information is carried separately from the text (dirname)
{
  "@context": {
    "@value": "http://www.w3.org/ns/activitystreams",
    "@language": "he"
  },
  "direction": "rtl",
  "name": "פעילות הבינאום, W3C",
  "type": "Note",
  "summary": "פעילות הבינאום, W3C"
}
{
  "@context": {
    "@value": "http://www.w3.org/ns/activitystreams",
    "@language": "he"
  },
  "direction": "rtl",
  "nameMap": {
    "he": "פעילות הבינאום, W3C",
    "ar": "نشاط التدويل، W3C",
    "es": "\u2066Actividad de internacionalización, W3C\u2069",
    "en": "\u2066'\u2067نشاط التدويل, W3C\u2069' is how you say 'i18n Activity, W3C' in Arabic.\u2069"
  },
  "type": "Note",
  "summaryMap": {
    "he": "פעילות הבינאום, W3C",
    "ar": "نشاط التدويل، W3C",
    "es": "<span dir=\"ltr\">Actividad de internacionalización, W3C</span>",
    "en": "<span dir=\"ltr\">'<span dir=\"rtl\">نشاط التدويل, W3C</span>' is how you say 'i18n Activity, W3C' in Arabic.</span>"
  }
}
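The point about HTML forms and dirname can be sketched as below. This is only an illustration under stated assumptions: the form is taken to contain something like `<input name="name" dirname="name.dir">`, so the browser submits the field's direction as a separate `name.dir` value; the function `note_from_form` and the field names are hypothetical, and `direction` is the property suggested on this page, not an existing Activity Streams property.

```python
from urllib.parse import parse_qs, urlencode

ALLOWED_DIRECTIONS = {"ltr", "rtl"}


def note_from_form(form_body: str, field: str = "name") -> dict:
    """Build a Note whose object-level direction comes from the dirname value."""
    fields = parse_qs(form_body, keep_blank_values=True)
    text = fields.get(field, [""])[0]
    # Assumption: the form declared dirname="<field>.dir".
    direction = fields.get(f"{field}.dir", ["ltr"])[0]
    if direction not in ALLOWED_DIRECTIONS:
        direction = "ltr"  # fall back to the suggested LTR default
    return {"type": "Note", "direction": direction, "name": text}


# Simulate what a browser would submit for an RTL form field.
submitted = urlencode({"name": "פעילות הבינאום, W3C", "name.dir": "rtl"})
note = note_from_form(submitted)
```

The user types nothing extra here: the direction travels alongside the text and the string itself stays free of control codes.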

The following assumes that first-strong heuristics are used to determine the overall base direction (equivalent to direction: auto).

{
  "@context": {
    "@value": "http://www.w3.org/ns/activitystreams",
    "@language": "he"
  },
  "nameMap": {
    "he": "פעילות הבינאום, W3C",
    "ar": "نشاط التدويل، W3C",
    "es": "Actividad de internacionalización, W3C",
    "en": "\u200E'\u2067نشاط التدويل, W3C\u2069' is how you say 'i18n Activity, W3C' in Arabic."
  },
  "type": "Note",
  "summaryMap": {
    "he": "פעילות הבינאום, W3C",
    "ar": "نشاط التدويل، W3C",
    "es": "Actividad de internacionalización, W3C",
    "en": "&lrm;'<span dir=\"rtl\">نشاط التدويل, W3C</span>' is how you say 'i18n Activity, W3C' in Arabic."
  }
}
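A first-strong check of the kind assumed above might look like this for plain-text values. It is a simplified sketch: it treats the whole string as one paragraph, skips text inside directional isolates as the Unicode Bidirectional Algorithm does when picking a paragraph direction, and does not parse HTML markup. The function name and the `default` parameter are illustrative.

```python
import unicodedata

ISOLATE_INITIATORS = {"\u2066", "\u2067", "\u2068"}  # LRI, RLI, FSI
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def first_strong_direction(text: str, default: str = "ltr") -> str:
    """Return 'ltr' or 'rtl' from the first strong character outside isolates."""
    depth = 0
    for ch in text:
        if ch in ISOLATE_INITIATORS:
            depth += 1
        elif ch == PDI:
            if depth > 0:
                depth -= 1
        elif depth == 0:
            bidi_class = unicodedata.bidirectional(ch)
            if bidi_class == "L":
                return "ltr"
            if bidi_class in ("R", "AL"):
                return "rtl"
    return default  # no strong character found


name_map = {
    "he": "פעילות הבינאום, W3C",
    "ar": "نشاط التدويل، W3C",
    "es": "Actividad de internacionalización, W3C",
    "en": "\u200E'\u2067نشاط التدويل, W3C\u2069' is how you say "
          "'i18n Activity, W3C' in Arabic.",
}
directions = {lang: first_strong_direction(s) for lang, s in name_map.items()}
```

In the English string, the leading U+200E (LRM) is itself a strong left-to-right character, so detection settles on ltr before the quoted Arabic is reached.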

Web Annotation

Notes

  • this is JSON
  • basic object sent as single item
  • structured objects
  • each structure has only one natural language string
  • text property allows markup
  • direction may be specified for a string body, or for a target body whose text is to be found elsewhere

Current solution

  • textDirection property indicates base direction
  • textDirection values can be rtl, ltr or auto
{
  "@context": "http://www.w3.org/ns/anno.jsonld",
  "id": "http://example.org/anno5",
  "type": "Annotation",
  "body": {
    "type": "TextualBody",
    "text": "<p>פעילות הבינאום, W3C</p><p>HTML היא שפת סימון.</p>",
    "format": "text/html",
    "language": "he",
    "textDirection": "rtl"
  },
  "target": "http://example.org/photo1"
}
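A consumer could apply the textDirection value roughly as follows. This is a sketch under assumptions: the body keys (`text`, `language`, `textDirection`) follow the example on this page, the HTML in `text` is trusted, and omitting the dir attribute when textDirection is absent is a choice made here, not something the notes above specify. The function name is hypothetical.

```python
import html

ALLOWED_DIRECTIONS = {"ltr", "rtl", "auto"}


def render_textual_body(body: dict) -> str:
    """Wrap an HTML body in a div that carries its language and base direction."""
    attrs = []
    language = body.get("language")
    if language:
        attrs.append(f'lang="{html.escape(language)}"')
    direction = body.get("textDirection")
    if direction is not None:
        if direction not in ALLOWED_DIRECTIONS:
            raise ValueError(f"unsupported textDirection: {direction!r}")
        attrs.append(f'dir="{direction}"')
    attr_text = "".join(f" {attr}" for attr in attrs)
    # Assumption: body["text"] is already sanitized HTML (format text/html).
    return f"<div{attr_text}>{body['text']}</div>"


body = {
    "type": "TextualBody",
    "text": "<p>פעילות הבינאום, W3C</p><p>HTML היא שפת סימון.</p>",
    "format": "text/html",
    "language": "he",
    "textDirection": "rtl",
}
rendered = render_textual_body(body)
```

Because the direction is set once on the wrapper, both paragraphs inherit it and neither needs its own dir attribute or control codes.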