UnicodeChart.AsciiDigits appears to be defined too broadly

Jun 27, 2012 at 11:55 PM

In the file Source\TestApiCore\Code\Text\UnicodeRangeDatabase.cs the following line defines the range for UnicodeChart.AsciiDigits:

symbolsAndPunctuation[14] = new Group(new UnicodeRange(0x0000, 0x007F), "Numbers and Digits", "ASCII Digits", "latin", UnicodeChart.AsciiDigits);

I think this range (0x00 - 0x7F) is too broad. The ASCII chart from unicode.org (http://unicode.org/charts/PDF/U0000.pdf) shows that the digits occupy only 0x30 - 0x39, which matches my experience with ASCII digits.
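As a quick sanity check that does not depend on TestApi, a minimal sketch like the one below lists which code points in a range .NET itself classifies as decimal digits. The class and method names (AsciiDigitCheck, DigitsInRange) are hypothetical and only for illustration; the only framework call used is char.IsDigit.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; it is not part of TestApi.
public static class AsciiDigitCheck
{
    // Returns every code point in [first, last] that char.IsDigit reports
    // as a decimal digit. Intended for ranges within the Basic Multilingual
    // Plane, since each code point is cast to a single char.
    public static List<int> DigitsInRange(int first, int last)
    {
        var digits = new List<int>();
        for (int codePoint = first; codePoint <= last; codePoint++)
        {
            if (char.IsDigit((char)codePoint))
            {
                digits.Add(codePoint);
            }
        }
        return digits;
    }
}
```

Calling AsciiDigitCheck.DigitsInRange(0x0000, 0x007F) returns ten code points, 0x30 through 0x39, so the remaining 118 code points covered by the current AsciiDigits range are not digits.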

Jun 29, 2012 at 1:47 AM

Agreed. Using the range 0x0000 - 0x007F was by design. UnicodeRangeDatabase initializes Scripts and Symbols and Punctuation by following http://www.unicode.org/charts/, and these are further divided into Group and SubGroup. The UnicodeRange used is the range taken directly from the linked PDF file (the same applies to FullwidthAsciiDigits). We did not refine the ranges further because of time constraints, particularly since some of them are not one contiguous range but several separate ranges. For NumberProperty, the range is only used for validation purposes. For a more precise result, properties.UnicodeRanges.Add(new UnicodeRange(0x30, 0x39)) can be used for now. We were hoping to refine the ranges after the first release. I thought this was mentioned in the design doc.

Jun 29, 2012 at 3:45 AM

Thanks for the insight. It could very well be in the design doc, but until now I didn't even know one existed. If it had been called "readme" or "release notes", I'm sure I would have read it. Sorry for the oversight.

Can you speculate on when the release might be? The work here is surprisingly high-quality and comprehensive. I'd vote for a release sooner rather than later.