Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add multibyte character support #398

Merged
merged 16 commits into from
Sep 21, 2021
Merged

Conversation

alex-barstow
Copy link
Contributor

@alex-barstow alex-barstow commented Aug 31, 2021

Description

This adds multibyte character support to the 708 implementation via a new encoding option that can be passed to the CaptionStream constructor. The intent is to permit legacy multi-byte encodings for Chinese, Japanese, Korean, and others. The value of the option is used to create a TextDecoder instance (assuming the current browser supports it, only IE doesn't), so the value can be any "valid [encoding] label".

I'm waiting to commit the test I wrote until I have an open-source content sample to use

lib/m2ts/caption-stream.js Outdated Show resolved Hide resolved
lib/m2ts/caption-stream.js Outdated Show resolved Hide resolved
lib/m2ts/caption-stream.js Show resolved Hide resolved
alex-barstow and others added 5 commits September 3, 2021 12:29
Co-authored-by: Gary Katsevman <git@gkatsev.com>
Co-authored-by: Gary Katsevman <git@gkatsev.com>
});
} else {
try {
this.textDecoder_ = new TextDecoder(encoding);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to throw if the encoding was incorrect. We may want to concat the caught error onto our warning below.

Copy link
Contributor

@brandonocasey brandonocasey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use a test for:

  • passing the encoding around
  • Multi-byte character examples (if we can pull generic words out of our sample for use)
  • Not having a TextDecoder
  • An invalid encoding

// Use the TextDecoder if one was created for this service
if (service.textDecoder_ && !isExtended) {
if (isMultiByte) {
charCodeArray = [currentByte, nextByte];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should subarray here instead to prevent the creation of a new Uint8Array below.

// for multi-byte (can they be more than two characters?)
charCodeArray = packetData.subarray(i, i + 1)

// for single
charCodeArray = packetData.subarray(i)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call

Copy link
Contributor Author

@alex-barstow alex-barstow Sep 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brandonocasey Actually, packetData isn't a typed array so we do need to create a Uint8Array here. And to address your other question, from the spec:

When a decoder encounters the P16 code, it uses the succeeding two bytes to
address characters in a 16-bit code set.

Copy link
Contributor

@brandonocasey brandonocasey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved pending the tests pass

Comment on lines 2797 to 2825
QUnit.test('Creates TextDecoder only if valid encoding value is provided', function(assert) {
var secondCea708Stream;

cea708Stream = new m2ts.Cea708Stream({
captionServices: {
SERVICE1: {
encoding: 'euc-kr'
}
}
});

cc708Korean.forEach(cea708Stream.push, cea708Stream);
cea708Stream.flushDisplayed(4721138662, cea708Stream.services[1]);

assert.ok(cea708Stream.services[1].textDecoder_, 'TextDecoder created successfully when encoding is valid');

secondCea708Stream = new m2ts.Cea708Stream({
captionServices: {
SERVICE1: {
encoding: 'invalid'
}
}
});

cc708Korean.forEach(secondCea708Stream.push, secondCea708Stream);
secondCea708Stream.flushDisplayed(4721138662, secondCea708Stream.services[1]);

assert.notOk(secondCea708Stream.services[1].textDecoder_, 'TextDecoder not created when encoding is invalid');
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test needs to check for IE11 or if TextDecoder is present.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a check

Copy link
Member

@gkatsev gkatsev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified locally. Can see captions displayed properly instead of random characters!

@alex-barstow alex-barstow merged commit 0849e0a into main Sep 21, 2021
@alex-barstow alex-barstow deleted the multibyte-708-support branch September 21, 2021 15:23
@alex-barstow alex-barstow restored the multibyte-708-support branch September 21, 2021 15:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants