1 | #!/usr/bin/perl -w␊ |
2 | ␊ |
3 | # Po4a::Sgml.pm␊ |
4 | #␊ |
5 | # extract and translate translatable strings from an sgml based document.␊ |
6 | #␊ |
7 | # This code is an adapted version of sgmlspl (SGML postprocessor for the␊ |
8 | # SGMLS and NSGMLS parsers) which was:␊ |
9 | #␊ |
10 | # Copyright (c) 1995 by David Megginson <dmeggins@aix1.uottawa.ca>␊ |
11 | #␊ |
12 | # The adaptation for po4a was done by Denis Barbier <barbier@linuxfr.org>,␊ |
13 | # Martin Quinson (mquinson#debian.org) and others.␊ |
14 | #␊ |
15 | # This program is free software; you can redistribute it and/or modify␊ |
16 | # it under the terms of the GNU General Public License as published by␊ |
17 | # the Free Software Foundation; either version 2 of the License, or␊ |
18 | # (at your option) any later version.␊ |
19 | #␊ |
20 | # This program is distributed in the hope that it will be useful,␊ |
21 | # but WITHOUT ANY WARRANTY; without even the implied warranty of␊ |
22 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the␊ |
23 | # GNU General Public License for more details.␊ |
24 | #␊ |
25 | # You should have received a copy of the GNU General Public License␊ |
26 | # along with this program; if not, write to the Free Software␊ |
27 | # Foundation, Inc.,␊ |
28 | # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA␊ |
29 | #␊ |
30 | ########################################################################␊ |
31 | ␊ |
32 | =encoding UTF-8␊ |
33 | ␊ |
34 | =head1 NAME␊ |
35 | ␊ |
36 | Locale::Po4a::Sgml - convert SGML documents from/to PO files␊ |
37 | ␊ |
38 | =head1 DESCRIPTION␊ |
39 | ␊ |
40 | The po4a (PO for anything) project goal is to ease translations (and more␊ |
41 | interestingly, the maintenance of translations) using gettext tools on␊ |
42 | areas where they were not expected like documentation.␊ |
43 | ␊ |
44 | Locale::Po4a::Sgml is a module to help the translation of documentation in␊ |
45 | the SGML format into other [human] languages.␊ |
46 | ␊ |
47 | This module uses B<nsgmls> to parse the SGML files. Make sure it is␊ |
48 | installed.␊ |
49 | Also make sure that the DTD of the SGML files are installed in the system.␊ |
50 | ␊ |
51 | =head1 OPTIONS ACCEPTED BY THIS MODULE␊ |
52 | ␊ |
53 | =over 4␊ |
54 | ␊ |
55 | =item B<debug>␊ |
56 | ␊ |
57 | Space separated list of keywords indicating which part you want to debug. Possible values are: tag, generic, entities and refs.␊ |
58 | ␊ |
59 | =item B<verbose>␊ |
60 | ␊ |
61 | Give more information about what's going on.␊ |
62 | ␊ |
63 | =item B<translate>␊ |
64 | ␊ |
65 | Space separated list of extra tags (beside the DTD provided ones) whose␊ |
66 | content should form an extra msgid.␊ |
67 | ␊ |
68 | =item B<section>␊ |
69 | ␊ |
70 | Space separated list of extra tags (beside the DTD provided ones)␊ |
71 | containing other tags, some of them being of category B<translate>.␊ |
72 | ␊ |
73 | =item B<indent>␊ |
74 | ␊ |
75 | Space separated list of tags which increase the indentation level.␊ |
76 | ␊ |
77 | =item B<verbatim>␊ |
78 | ␊ |
79 | The layout within those tags should not be changed. The paragraph won't get␊ |
80 | wrapped, and no extra indentation space or new line will be added for␊ |
81 | cosmetic purpose.␊ |
82 | ␊ |
83 | =item B<empty>␊ |
84 | ␊ |
85 | Tags not needing to be closed.␊ |
86 | ␊ |
87 | =item B<ignore>␊ |
88 | ␊ |
89 | Tags ignored and considered as plain char data by po4a. That is to say that␊ |
90 | they can be part of an msgid. For example, E<lt>bE<gt> is a good candidate␊ |
91 | for this category since putting it in the translate section would create␊ |
92 | msgids not being whole sentences, which is bad.␊ |
93 | ␊ |
94 | =item B<attributes>␊ |
95 | ␊ |
96 | A space separated list of attributes that need to be translated. You can␊ |
97 | specify the attributes by their name (for example, "lang"), but you can also␊ |
98 | prefix it with a tag hierarchy, to specify that this attribute will only be␊ |
99 | translated when it is into the specified tag. For example:␊ |
100 | E<lt>bbbE<gt>E<lt>aaaE<gt>lang specifies that the lang attribute will only be␊ |
101 | translated if it is in an E<lt>aaaE<gt> tag, which is in a E<lt>bbbE<gt> tag.␊ |
102 | The tag names are actually regular expressions so you can also write things␊ |
103 | like E<lt>aaa|bbbbE<gt>lang to only translate lang attributes that are in␊ |
104 | an E<lt>aaaE<gt> or a E<lt>bbbE<gt> tag.␊ |
105 | ␊ |
106 | =item B<qualify>␊ |
107 | ␊ |
108 | A space separated list of attributes for which the translation must be␊ |
109 | qualified by the attribute name. Note that this setting automatically adds the␊ |
110 | given attribute into the 'attributes' list too.␊ |
111 | ␊ |
112 | =item B<force>␊ |
113 | ␊ |
114 | Proceed even if the DTD is unknown or if nsgmls finds errors in the input␊ |
115 | file.␊ |
116 | ␊ |
117 | =item B<include-all>␊ |
118 | ␊ |
119 | By default, msgids containing only one entity (like '&version;') are skipped␊ |
120 | for the translator comfort. Activating this option prevents this␊ |
121 | optimisation. It can be useful if the document contains a construction like␊ |
122 | "<title>Á</title>", even if I doubt such things to ever happen...␊ |
123 | ␊ |
124 | =item B<ignore-inclusion>␊ |
125 | ␊ |
126 | Space separated list of entities that won't be inlined.␊ |
127 | Use this option with caution: it may cause nsgmls (used internally) to add␊ |
128 | tags and render the output document invalid.␊ |
129 | ␊ |
130 | =back␊ |
131 | ␊ |
132 | =head1 STATUS OF THIS MODULE␊ |
133 | ␊ |
134 | The result is perfect. I.e., the generated documents are exactly the␊ |
135 | same. But there are still some problems:␊ |
136 | ␊ |
137 | =over 2␊ |
138 | ␊ |
139 | =item *␊ |
140 | ␊ |
141 | The error output of nsgmls is redirected to /dev/null, which is clearly␊ |
142 | bad. I don't know how to avoid that.␊ |
143 | ␊ |
144 | The problem is that I have to "protect" the conditional inclusions (i.e. the␊ |
145 | C<E<lt>! [ %foo [> and C<]]E<gt>> stuff) from nsgmls. Otherwise␊ |
146 | nsgmls eats them, and I don't know how to restore them in the final␊ |
147 | document. To prevent that, I rewrite them to C<{PO4A-beg-foo}> and␊ |
148 | C<{PO4A-end}>.␊ |
149 | ␊ |
150 | The problem with this is that the C<{PO4A-end}> and such I add are valid in␊ |
151 | the document (not in a E<lt>pE<gt> tag or so).␊ |
152 | ␊ |
153 | Everything works well with nsgmls's output redirected that way, but it will␊ |
154 | prevent us from detecting that the document is badly formatted.␊ |
155 | ␊ |
156 | =item *␊ |
157 | ␊ |
158 | It does work only with the DebianDoc and DocBook DTD. Adding support for a␊ |
159 | new DTD should be very easy. The mechanism is the same for every DTD, you just␊ |
160 | have to give a list of the existing tags and some of their characteristics.␊ |
161 | ␊ |
162 | I agree, this needs some more documentation, but it is still considered as␊ |
163 | beta, and I hate to document stuff which may/will change.␊ |
164 | ␊ |
165 | =item *␊ |
166 | ␊ |
167 | Warning, support for DTDs is quite experimental. I did not read any␊ |
168 | reference manual to find the definition of every tag. I did add tag␊ |
169 | definition to the module 'till it works for some documents I found on the␊ |
170 | net. If your document use more tags than mine, it won't work. But as I said␊ |
171 | above, fixing that should be quite easy.␊ |
172 | ␊ |
173 | I did test DocBook against the SAG (System Administrator Guide) only, but␊ |
174 | this document is quite big, and should use most of the DocBook␊ |
175 | specificities.␊ |
176 | ␊ |
177 | For DebianDoc, I tested some of the manuals from the DDP, but not all yet.␊ |
178 | ␊ |
179 | =item *␊ |
180 | ␊ |
181 | In case of file inclusion, string reference of messages in PO files␊ |
182 | (i.e. lines like C<#: en/titletoc.sgml:9460>) will be wrong.␊ |
183 | ␊ |
184 | This is because I preprocess the file to protect the conditional inclusion␊ |
185 | (i.e. the C<E<lt>! [ %foo [> and C<]]E<gt>> stuff) and some entities (like␊ |
186 | &version;) from nsgmls because I want them verbatim to the generated␊ |
187 | document. For that, I make a temp copy of the input file and do all the␊ |
188 | changes I want to this before passing it to nsgmls for parsing.␊ |
189 | ␊ |
190 | So that it works, I replace the entities asking for a file inclusion by the␊ |
191 | content of the given file (so that I can protect what needs to be in a subfile␊ |
192 | also). But nothing is done so far to correct the references (i.e., filename␊ |
193 | and line number) afterward. I'm not sure what the best thing to do is.␊ |
194 | ␊ |
195 | =back␊ |
196 | ␊ |
197 | =cut␊ |
198 | ␊ |
199 | package Locale::Po4a::Sgml;␊ |
200 | ␊ |
201 | use 5.006;␊ |
202 | use strict;␊ |
203 | use warnings;␊ |
204 | ␊ |
205 | ␊ |
206 | require Exporter;␊ |
207 | use vars qw(@ISA @EXPORT);␊ |
208 | @ISA = qw(Locale::Po4a::TransTractor);␊ |
209 | @EXPORT = qw();␊ |
210 | ␊ |
211 | use Locale::Po4a::TransTractor;␊ |
212 | use Locale::Po4a::Common;␊ |
213 | ␊ |
214 | eval qq{use SGMLS};␊ |
215 | if ($@) {␊ |
216 | die wrap_mod("po4a::sgml", dgettext("po4a","The needed module SGMLS.pm was not found and needs to be installed. It can be found on the CPAN, in package libsgmls-perl on debian, etc."));␊ |
217 | }␊ |
218 | ␊ |
219 | use File::Temp;␊ |
220 | ␊ |
221 | my %debug=('tag' => 0,␊ |
222 | ␉ 'generic' => 0,␊ |
223 | ␉ 'entities' => 0,␊ |
224 | ␉ 'refs' => 0,␊ |
225 | ␉ 'nsgmls' => 0);␊ |
226 | ␊ |
227 | my $xmlprolog = undef; # the '<?xml ... ?>' line if existing␊ |
228 | ␊ |
229 | sub initialize {␊ |
230 | my $self = shift;␊ |
231 | my %options = @_;␊ |
232 | ␊ |
233 | $self->{options}{'translate'}='';␊ |
234 | $self->{options}{'section'}='';␊ |
235 | $self->{options}{'indent'}='';␊ |
236 | $self->{options}{'empty'}='';␊ |
237 | $self->{options}{'verbatim'}='';␊ |
238 | $self->{options}{'ignore'}='';␊ |
239 | $self->{options}{'ignore-inclusion'}='';␊ |
240 | ␊ |
241 | $self->{options}{'include-all'}='';␊ |
242 | ␊ |
243 | $self->{options}{'force'}='';␊ |
244 | ␊ |
245 | $self->{options}{'verbose'}='';␊ |
246 | $self->{options}{'debug'}='';␊ |
247 | ␊ |
248 | foreach my $opt (keys %options) {␊ |
249 | ␉if ($options{$opt}) {␊ |
250 | ␉ die wrap_mod("po4a::sgml", dgettext ("po4a", "Unknown option: %s"), $opt) unless exists $self->{options}{$opt};␊ |
251 | ␉ $self->{options}{$opt} = $options{$opt};␊ |
252 | ␉}␊ |
253 | }␊ |
254 | if ($options{'debug'}) {␊ |
255 | ␉foreach (split /\s+/, $options{'debug'}) {␊ |
256 | ␉ $debug{$_} = 1;␊ |
257 | ␉}␊ |
258 | }␊ |
259 | }␊ |
260 | ␊ |
261 | sub read {␊ |
262 | my ($self,$filename)=@_;␊ |
263 | ␊ |
264 | push @{$self->{DOCPOD}{infile}}, $filename;␊ |
265 | $self->Locale::Po4a::TransTractor::read($filename);␊ |
266 | }␊ |
267 | ␊ |
268 | sub parse {␊ |
269 | my $self=shift;␊ |
270 | map {$self->parse_file($_)} @{$self->{DOCPOD}{infile}};␊ |
271 | }␊ |
272 | ␊ |
273 | #␊ |
274 | # Filter out some uninteresting strings for translation␊ |
275 | #␊ |
276 | sub translate {␊ |
277 | my ($self)=(shift);␊ |
278 | my ($string,$ref,$type)=(shift,shift,shift);␊ |
279 | my (%options)=@_;␊ |
280 | ␊ |
281 | # don't translate entries composed of one entity␊ |
282 | if ( (($string =~ /^&[^;]*;$/) || ($options{'wrap'} && $string =~ /^\s*&[^;]*;\s*$/))␊ |
283 | ␉ && !($self->{options}{'include-all'}) ){␊ |
284 | ␉warn wrap_mod("po4a::sgml", dgettext("po4a", "msgid skipped to help translators (contains only an entity)"), $string)␊ |
285 | ␉ unless $self->verbose() <= 0;␊ |
286 | ␉return $string.($options{'wrap'}?"\n":"");␊ |
287 | }␊ |
288 | # don't translate entries composed of tags only␊ |
289 | if ( $string =~ /^(((<[^>]*>)|\s)*)$/␊ |
290 | ␉ && !($self->{options}{'include-all'}) ) {␊ |
291 | ␉warn wrap_mod("po4a::sgml", dgettext("po4a", "msgid skipped to help translators (contains only tags)"), $string)␊ |
292 | ␉ unless $self->verbose() <= 0;␊ |
293 | ␉return $string.($options{'wrap'}?"\n":"");␊ |
294 | }␊ |
295 | ␊ |
296 | # don't translate entries composed of marked section tags only␊ |
297 | if ( ($string =~ /^(?:<!\s*\[\s*[^\[]+\s*\[|\]\s*]\s*>|\s)*$/)␊ |
298 | && !($self->{options}{'include-all'})) {␊ |
299 | warn wrap_mod("po4a::sgml", dgettext("po4a", "msgid skipped to ".␊ |
300 | "help translators (contains only opening or closing ".␊ |
301 | "tags of marked sections)"), $string)␊ |
302 | unless $self->verbose() <= 0;␊ |
303 | return $string.($options{'wrap'}?"\n":"");␊ |
304 | }␊ |
305 | ␊ |
306 | $string = $self->SUPER::translate($string,$ref,$type,%options);␊ |
307 | ␊ |
308 | $string = $self->post_trans($string,$ref,$type);␊ |
309 | ␊ |
310 | return $string;␊ |
311 | }␊ |
312 | ␊ |
313 | sub post_trans {␊ |
314 | my ($self,$str,$ref,$type)=@_;␊ |
315 | ␊ |
316 | # Change ascii non-breaking space to an ␊ |
317 | my $nbs_out = "\xA0";␊ |
318 | my $enc_length = Encode::from_to($nbs_out, "latin1",␊ |
319 | $self->get_out_charset);␊ |
320 | $str =~ s/\Q$nbs_out/ /g if defined $enc_length;␊ |
321 | ␊ |
322 | return $str;␊ |
323 | }␊ |
324 | ␊ |
325 | #␊ |
326 | # Make sure our cruft is removed from the file␊ |
327 | #␊ |
328 | sub pushline {␊ |
329 | my ($self,$line)=@_;␊ |
330 | $line =~ s/{PO4A-amp}/&/g;␊ |
331 | $self->SUPER::pushline($line);␊ |
332 | }␊ |
333 | ␊ |
334 | sub set_tags_kind {␊ |
335 | my $self=shift;␊ |
336 | my (%kinds)=@_;␊ |
337 | ␊ |
338 | foreach (qw(translate empty section verbatim ignore attributes qualify)) {␊ |
339 | ␉$self->{SGML}->{k}{$_} = $self->{options}{$_} ? $self->{options}{$_}.' ' : '';␊ |
340 | # Remove the default behavior for the tags defined with the␊ |
341 | # options.␊ |
342 | foreach my $k (keys %kinds) {␊ |
343 | foreach my $t (split(" ", $self->{SGML}->{k}{$_})) {␊ |
344 | $kinds{$k} =~ s/\b$t\b//;␊ |
345 | }␊ |
346 | }␊ |
347 | }␊ |
348 | ␊ |
349 | foreach (keys %kinds) {␊ |
350 | ␉die "po4a::sgml: internal error: set_tags_kind called with unrecognized arg $_"␊ |
351 | ␉ if ($_ !~ /^(translate|empty|verbatim|ignore|indent|attributes|qualify)$/);␊ |
352 | ␊ |
353 | ␉$self->{SGML}->{k}{$_} .= $kinds{$_};␊ |
354 | }␊ |
355 | }␊ |
356 | ␊ |
357 | #␊ |
358 | # Do the actual work, using the SGMLS package and settings done elsewhere.␊ |
359 | #␊ |
360 | sub parse_file {␊ |
361 | my ($self,$mastername)=@_;␊ |
362 | my ($prolog);␊ |
363 | ␊ |
364 | # Rewrite the file to:␊ |
365 | # - protect optional inclusion marker (i.e. "<![ %str [" and "]]>")␊ |
366 | # - protect entities from expansion (ie "&release;")␊ |
367 | my $origfile="";␊ |
368 | my $i=0;␊ |
369 | while (defined(@{$self->{TT}{doc_in}}) && $i < @{$self->{TT}{doc_in}}) {␊ |
370 | $origfile .= ${$self->{TT}{doc_in}}[$i];␊ |
371 | $i+=2;␊ |
372 | }␊ |
373 | ␊ |
374 | unless ($self->{options}{'force'}) {␊ |
375 | # Detect if we can find the DTD␊ |
376 | my ($tmpfh,$tmpfile)=File::Temp::tempfile("po4a-XXXX",␊ |
377 | SUFFIX => ".sgml",␊ |
378 | DIR => "/tmp",␊ |
379 | UNLINK => 0);␊ |
380 | print $tmpfh $origfile;␊ |
381 | close $tmpfh␊ |
382 | or die wrap_mod("po4a::sgml",␊ |
383 | dgettext("po4a", "Can't close tempfile: %s"), $!);␊ |
384 | if (system("nsgmls -p < $tmpfile")) {␊ |
385 | unlink ($tmpfile);␊ |
386 | die wrap_mod("po4a::sgml",␊ |
387 | dgettext("po4a", "Error while running nsgmls -p. ".␊ |
388 | "Please check if nsgmls and the ".␊ |
389 | "DTD are installed."));␊ |
390 | }␊ |
391 | unlink ($tmpfile);␊ |
392 | }␊ |
393 | # Detect the XML pre-prolog␊ |
394 | if ($origfile =~ s/^(\s*<\?xml[^?]*\?>)//) {␊ |
395 | ␉warn wrap_mod("po4a::sgml", dgettext("po4a",␊ |
396 | ␉␉"Trying to handle a XML document as a SGML one. ".␊ |
397 | ␉␉"Feel lucky if it works, help us implementing a proper XML backend if it does not."), $mastername)␊ |
398 | ␉ unless $self->verbose() <= 0;␊ |
399 | ␉$xmlprolog=$1;␊ |
400 | }␊ |
401 | # Get the prolog␊ |
402 | {␊ |
403 | ␉$prolog=$origfile;␊ |
404 | ␉my $lvl; # number of '<' seen without matching '>'␊ |
405 | ␉my $pos = 0; # where in the document (in chars) while detecting prolog boundaries␊ |
406 | ␊ |
407 | ␉unless ($prolog =~ s/^(.*<!DOCTYPE).*$/$1/is) {␊ |
408 | ␉ die wrap_mod("po4a::sgml", dgettext("po4a",␊ |
409 | ␉␉"This file is not a master SGML document (no DOCTYPE). ".␊ |
410 | ␉␉"It may be a file to be included by another one, in which case it should not be passed to po4a directly. Text from included files is extracted/translated when handling the master file including them."));␊ |
411 | ␉}␊ |
412 | ␉$pos += length($prolog);␊ |
413 | ␉$lvl=1;␊ |
414 | ␉while ($lvl != 0) {␊ |
415 | ␉ # Eat comments in the prolog, since there may be some '>' or '<' in them.␊ |
416 | ␉ if ($origfile =~ m/^.{$pos}?(<!--.*?-->)/s) {␊ |
417 | ␉␉print "Found a comment in the prolog: $1\n" if ($debug{'generic'});␊ |
418 | ␉␉$pos += length($1);␊ |
419 | ␉␉# take care of the line numbers␊ |
420 | ␉␉my @a = split(/\n/,$1);␊ |
421 | ␉␉shift @a; # nb line - 1␊ |
422 | ␉␉while (defined(shift @a)) {␊ |
423 | ␉␉ $prolog .= "\n";␊ |
424 | ␉␉}␊ |
425 | ␉␉next;␊ |
426 | ␉ }␊ |
427 | ␉ # Search the closing '>'␊ |
428 | ␉ my ($c)=substr($origfile,$pos,1);␊ |
429 | ␉ $lvl++ if ($c eq '<');␊ |
430 | ␉ $lvl-- if ($c eq '>');␊ |
431 | ␉ $prolog = "$prolog$c";␊ |
432 | ␉ $pos++;␊ |
433 | ␉}␊ |
434 | }␊ |
435 | ␊ |
436 | # Add the definition of new tags that will be used for the␊ |
437 | # conditionnal inclusions␊ |
438 | if ($origfile =~ /^.*<!DOCTYPE[^[>]*\[/is) {␊ |
439 | $origfile =~ s/^(.*<!DOCTYPE[^[>]*\[)/$1 <!ELEMENT PO4ABEG - o empty> <!ATTLIST PO4ABEG name CDATA #REQUIRED> <!ELEMENT PO4AEND - o empty>/is;␊ |
440 | }␊ |
441 | ␊ |
442 | print STDERR "PROLOG=$prolog\n------------\n" if ($debug{'generic'});␊ |
443 | ␊ |
444 | # Configure the tags for this dtd␊ |
445 | if ($prolog =~ /debiandoc/i) {␊ |
446 | ␉$self->set_tags_kind("translate" => "author version abstract title".␊ |
447 | ␉␉␉ "date copyrightsummary heading p ".␊ |
448 | ␉␉␉ "example tag title",␊ |
449 | ␉␉␉ "empty" => "date ref manref url toc",␊ |
450 | ␉␉␉ "verbatim" => "example",␊ |
451 | ␉␉␉ "ignore" => "package prgn file tt em var ".␊ |
452 | ␉␉␉␉␉ "name email footnote po4aend po4abeg ".␊ |
453 | ␉␉␉ "strong ftpsite ftppath qref",␊ |
454 | ␉␉␉ "indent" => "appendix ".␊ |
455 | ␉ "book ".␊ |
456 | ␉ "chapt copyright ".␊ |
457 | ␉␉␉ "debiandoc ".␊ |
458 | ␉␉␉ "enumlist ".␊ |
459 | ␉␉␉ "item ".␊ |
460 | ␉␉␉ "list ".␊ |
461 | ␉ "sect sect1 sect2 sect3 sect4 ".␊ |
462 | ␉␉␉ "tag taglist titlepag toc");␊ |
463 | ␊ |
464 | } elsif ($prolog =~ /docbook/i) {␊ |
465 | ␉$self->set_tags_kind("translate" => "abbrev appendixinfo artheader attribution ".␊ |
466 | ␉ "biblioentry biblioset ".␊ |
467 | ␉ "chapterinfo collab collabname confdates confgroup conftitle ".␊ |
468 | ␉ "date ".␊ |
469 | ␉ "edition editor entry example ".␊ |
470 | ␉ "figure ".␊ |
471 | ␉ "glosssee glossseealso glossterm ".␊ |
472 | ␉ "holder ".␊ |
473 | ␉ "member msgaud msglevel msgorig ".␊ |
474 | ␉ "orgdiv orgname othername ".␊ |
475 | ␉ "pagenums para phrase pubdate publishername primary ".␊ |
476 | ␉ "refclass refdescriptor refentrytitle refmiscinfo refname refpurpose releaseinfo remark revnumber revremark ".␊ |
477 | ␉ "screeninfo seg secondary see seealso segtitle simpara substeps subtitle synopfragmentref synopsis ".␊ |
478 | ␉ "term tertiary title titleabbrev ".␊ |
479 | ␉ "contrib epigraph",␊ |
480 | ␉ "empty" => "audiodata colspec graphic imagedata textdata sbr spanspec videodata xref",␊ |
481 | ␉ "indent" => "abstract answer appendix article articleinfo audioobject author authorgroup ".␊ |
482 | ␉ "bibliodiv bibliography blockquote blockinfo book bookinfo bridgehead ".␊ |
483 | ␉ "callout calloutlist caption caution chapter copyright ".␊ |
484 | ␉ "dedication docinfo ".␊ |
485 | ␉ "entry ".␊ |
486 | ␉ "formalpara ".␊ |
487 | ␉ "glossary glossdef glossdiv glossentry glosslist group ".␊ |
488 | ␉ "imageobject important index indexterm informaltable itemizedlist ".␊ |
489 | ␉ "keyword keywordset ".␊ |
490 | ␉ "legalnotice listitem lot ".␊ |
491 | ␉ "mediaobject msg msgentry msginfo msgexplan msgmain msgrel msgsub msgtext ".␊ |
492 | ␉ "note ".␊ |
493 | ␉ "objectinfo orderedlist ".␊ |
494 | ␉ "part partintro preface procedure publisher ".␊ |
495 | ␉ "qandadiv qandaentry qandaset question ".␊ |
496 | ␉ "reference refentry refentryinfo refmeta refnamediv refsect1 refsect1info refsect2 refsect2info refsect3 refsect3info refsection refsectioninfo refsynopsisdiv refsynopsisdivinfo revision revdescription row ".␊ |
497 | ␉ "screenshot sect1 sect1info sect2 sect2info sect3 sect3info sect4 sect4info sect5 sect5info section sectioninfo seglistitem segmentedlist set setindex setinfo shortcut simplelist simplemsgentry simplesect step synopfragment ".␊ |
498 | ␉ "table tbody textobject tgroup thead tip toc ".␊ |
499 | ␉ "variablelist varlistentry videoobject ".␊ |
500 | ␉ "warning",␊ |
501 | ␉ "verbatim" => "address cmdsynopsis holder literallayout programlisting ".␊ |
502 | ␉ "refentrytitle refname refpurpose screen term title",␊ |
503 | ␉ "ignore" => "acronym action affiliation anchor application arg author authorinitials ".␊ |
504 | ␉ "city citation citerefentry citetitle classname co command computeroutput constant corpauthor country ".␊ |
505 | ␉ "database po4abeg po4aend ".␊ |
506 | ␉ "email emphasis envar errorcode errorname errortext errortype exceptionname ".␊ |
507 | ␉ "filename firstname firstterm footnote footnoteref foreignphrase function ".␊ |
508 | ␉ "glossterm guibutton guiicon guilabel guimenu guimenuitem guisubmenu ".␊ |
509 | ␉ "hardware ".␊ |
510 | ␉ "indexterm informalexample inlineequation inlinegraphic inlinemediaobject interface interfacename isbn ".␊ |
511 | ␉ "keycap keycode keycombo keysym ".␊ |
512 | ␉ "link lineannotation literal ".␊ |
513 | ␉ "manvolnum markup medialabel menuchoice methodname modespec mousebutton ".␊ |
514 | ␉ "nonterminal ".␊ |
515 | ␉ "olink ooclass ooexception oointerface option optional othercredit ".␊ |
516 | ␉ "parameter personname phrase productname productnumber prompt property pubsnumber ".␊ |
517 | ␉ "quote ".␊ |
518 | ␉ "remark replaceable returnvalue revhistory ".␊ |
519 | ␉ "sgmltag sidebar structfield structname subscript superscript surname symbol systemitem ".␊ |
520 | ␉ "token trademark type ".␊ |
521 | ␉ "ulink userinput ".␊ |
522 | ␉ "varname volumenum ".␊ |
523 | ␉ "wordasword ".␊ |
524 | ␉ "xref ".␊ |
525 | ␉ "year",␊ |
526 | ␉ "attributes" =>"<(article|book)>lang");␊ |
527 | ␊ |
528 | } else {␊ |
529 | ␉if ($self->{options}{'force'}) {␊ |
530 | ␉ warn wrap_mod("po4a::sgml", dgettext("po4a", "DTD of this file is unknown, but proceeding as requested."));␊ |
531 | ␉ $self->set_tags_kind();␊ |
532 | ␉} else {␊ |
533 | ␉ die wrap_mod("po4a::sgml", dgettext("po4a",␊ |
534 | ␉␉"DTD of this file is unknown. (supported: DebianDoc, DocBook). The prolog follows:")."\n$prolog");␊ |
535 | ␉}␊ |
536 | }␊ |
537 | ␊ |
538 | # Hash of the file entities that won't be included␊ |
539 | my %ignored_inclusion = ();␊ |
540 | foreach (split / /,$self->{options}{'ignore-inclusion'}) {␊ |
541 | $ignored_inclusion{$_} = 1;␊ |
542 | }␊ |
543 | ␊ |
544 | # Prepare the reference indirection stuff␊ |
545 | my @refs;␊ |
546 | my $length = ($origfile =~ tr/\n/\n/);␊ |
547 | print "XX Prepare reference indirection stuff\n" if $debug{'refs'};␊ |
548 | for (my $i=1; $i<=$length; $i++) {␊ |
549 | ␉push @refs,"$mastername:$i";␊ |
550 | ␉print "$mastername:$i\n" if $debug{'refs'};␊ |
551 | }␊ |
552 | ␊ |
553 | # protect the conditional inclusions in the file␊ |
554 | $origfile =~ s/<!\[\s*IGNORE\s*\[/{PO4A-beg-IGNORE}/g; # cond. incl. starts␊ |
555 | $origfile =~ s/<!\[\s*CDATA\s*\[/{PO4A-beg-CDATA}/g; # cond. incl. starts␊ |
556 | $origfile =~ s/<!\[\s*RCDATA\s*\[/{PO4A-beg-RCDATA}/g; # cond. incl. starts␊ |
557 | $origfile =~ s/<!\[\s*([^\[\s]+)\s*\[/<po4abeg name="$1">/g; # cond. incl. starts␊ |
558 | $origfile =~ s/\]\]>/<po4aend>/g; # cond. incl. end␊ |
559 | ␊ |
560 | # Remove <![ IGNORE [ sections␊ |
561 | # FIXME: we don't support included PO4A-beg-␊ |
562 | my $tmp1 = $origfile;␊ |
563 | while ($tmp1 =~ m/^(.*?)({PO4A-beg-\s*IGNORE\s*}(?:.+?)<po4aend>)(.*)$/s)␊ |
564 | {␊ |
565 | my ($begin,$ignored,$end) = ($1, $2, $3);␊ |
566 | my @begin = split(/\n/, $begin);␊ |
567 | my @ignored = split(/\n/, $ignored);␊ |
568 | my $pre = scalar @begin;␊ |
569 | my $len = (scalar @ignored) -1;␊ |
570 | $pre++ if ($begin =~ /\n$/s);␊ |
571 | $len++ if ($end =~ /^\n/s);␊ |
572 | # remove the references of the ignored lines␊ |
573 | splice @refs, $pre+1, $len-1;␊ |
574 | # remove the lines␊ |
575 | $tmp1 = $begin.$end;␊ |
576 | }␊ |
577 | $origfile = $tmp1;␊ |
578 | # The <, >, and & in a CDATA must be escaped because they do not␊ |
579 | # correspond to tags or entities delimiters.␊ |
580 | $tmp1 = $origfile;␊ |
581 | $origfile = "";␊ |
582 | while ($tmp1 =~ m/^(.*?{PO4A-beg-\s*(?:CDATA|RCDATA)\s*})(.+?)(<po4aend>.*)$/s) {␊ |
583 | my ($begin, $tmp) = ($1, $2);␊ |
584 | $tmp1 = $3;␊ |
585 | $tmp =~ s/</{PO4A-lt}/gs;␊ |
586 | $tmp =~ s/>/{PO4A-gt}/gs;␊ |
587 | $tmp =~ s/&/{PO4A-amp}/gs;␊ |
588 | $origfile .= $begin.$tmp;␊ |
589 | }␊ |
590 | $origfile .= $tmp1;␊ |
591 | ␊ |
592 | # Deal with the %entities; in the prolog. God damn it, this code is gross!␊ |
593 | # Try hard not to change the number of lines to not fuck up the references␊ |
594 | my %prologentincl;␊ |
595 | my $moretodo=1;␊ |
596 | PROLOGENTITY: while ($moretodo) { # non trivial loop to deal with recursive inclusion␊ |
597 | ␉$moretodo = 0;␊ |
598 | ␉# Unprotect not yet defined inclusions␊ |
599 | ␉$prolog =~ s/{PO4A-percent}/%/sg;␊ |
600 | print STDERR "prolog=>>>>$prolog<<<<\n"␊ |
601 | ␉ if ($debug{'entities'});␊ |
602 | ␉while ($prolog =~ /(.*?)<!ENTITY\s*%\s*(\S*)\s+SYSTEM\s*"([^>"]*)"\s*>(.*)$/is) { #})"{ (Stupid editor)␊ |
603 | ␉ print STDERR "Seen the definition entity of prolog inclusion '$2' (=$3)\n"␊ |
604 | ␉ if ($debug{'entities'});␊ |
605 | ␉ # Preload the content of the entity.␊ |
606 | ␉ my $key = $2;␊ |
607 | ␉ my $filename=$3;␊ |
608 | ␉ my $origfilename = $filename;␊ |
609 | ␉ my ($begin, $end) = ($1, $4);␊ |
610 | ␉ if ($filename !~ m%^/% && $mastername =~ m%/%) {␊ |
611 | ␉ my $dir=$mastername;␊ |
612 | ␉ $dir =~ s%/[^/]*$%%;␊ |
613 | ␉ $filename="$dir/$filename";␊ |
614 | ␉ # origfile also needs to be fixed otherwise nsgmls won't␊ |
615 | ␉ # find the file.␊ |
616 | ␉ $origfile =~ s/(<!ENTITY\s*%\s*\Q$key\E\s+SYSTEM\s*")\Q$origfilename\E("\s*>)/$1$filename$2/gsi;␊ |
617 | ␉ }␊ |
618 | ␉ if (defined $ignored_inclusion{$key} or !-e $filename) {␊ |
619 | ␉␉# We won't expand this entity.␊ |
620 | ␉␉# And we avoid nsgmls to do so.␊ |
621 | ␉␉$prolog = "$begin<!--{PO4A-ent-beg-$key}$filename".␊ |
622 | ␉␉ "{PO4A-ent-end}-->$end";␊ |
623 | ␉ } else {␊ |
624 | ␉ $prolog = $begin.$end;␊ |
625 | ␉ (-e $filename && open IN,"<$filename") ||␊ |
626 | ␉ die wrap_mod("po4a::sgml",␊ |
627 | ␉ dgettext("po4a",␊ |
628 | ␉ "Can't open %s (content of entity %s%s;): %s"),␊ |
629 | ␉ $filename, '%', $key, $!);␊ |
630 | ␉ local $/ = undef;␊ |
631 | ␉ $prologentincl{$key} = <IN>;␊ |
632 | ␉ close IN;␊ |
633 | ␉ print STDERR "Content of \%$key; is $filename (".␊ |
634 | ␉ ($prologentincl{$key} =~ tr/\n/\n/).␊ |
635 | ␉ " lines long)\n"␊ |
636 | ␉ if ($debug{'entities'});␊ |
637 | ␉ print STDERR "content: ".$prologentincl{$key}."\n"␊ |
638 | ␉ if ($debug{'entities'});␊ |
639 | ␉ $moretodo = 1;␊ |
640 | ␉ next PROLOGENTITY;␊ |
641 | ␉ }␊ |
642 | ␉}␊ |
643 | ␉while ($prolog =~ /(.*?)<!ENTITY\s*%\s*(\S*)\s*"([^>"]*)"\s*>(.*)$/is) { #})"{ (Stupid editor)␊ |
644 | ␉ print STDERR "Seen the definition entity of prolog definition '$2' (=$3)\n"␊ |
645 | ␉ if ($debug{'entities'});␊ |
646 | ␉ # Preload the content of the entity.␊ |
647 | ␉ my $key = $2;␊ |
648 | ␉ $prolog = $1.$4;␊ |
649 | ␉ $prologentincl{$key} = $3;␊ |
650 | ␉ print STDERR "content: ".$prologentincl{$key}."\n"␊ |
651 | ␉ if ($debug{'entities'});␊ |
652 | ␉ $moretodo = 1;␊ |
653 | ␉ next PROLOGENTITY;␊ |
654 | ␉}␊ |
655 | while ($prolog =~ /^(.*?)%([^;\s]*);(.*)$/s) {␊ |
656 | ␉ my ($pre,$ent,$post) = ($1,$2,$3);␊ |
657 | ␉ # Yeah, right, the content of the entity can be defined in a not yet loaded entity␊ |
658 | ␉ # It's easy to build a weird case where all that shit collapses poorly. But why the␊ |
659 | ␉ # hell are you using those strange constructs in your document, damn it?␊ |
660 | ␉ print STDERR "Seen prolog inclusion $ent\n" if ($debug{'entities'});␊ |
661 | ␉ if (defined ($prologentincl{$ent})) {␊ |
662 | ␉␉$prolog = $pre.$prologentincl{$ent}.$post;␊ |
663 | ␉␉print STDERR "Change \%$ent; to its content in the prolog\n"␊ |
664 | ␉␉ if $debug{'entities'};␊ |
665 | ␉␉$moretodo = 1;␊ |
666 | ␉ } else {␊ |
667 | ␉␉# AAAARGH stupid document using %bla; and having then defined in another inclusion!␊ |
668 | ␉␉# Protect it for this pass, and unprotect it on next one␊ |
669 | ␉␉print STDERR "entity $ent not defined yet ?!\n"␊ |
670 | ␉␉ if $debug{'entities'};␊ |
671 | ␉␉$prolog = "$pre".'{PO4A-percent}'."$ent;$post";␊ |
672 | ␉ }␊ |
673 | ␉}␊ |
674 | }␊ |
675 | $prolog =~ s/<!--{PO4A-ent-beg-(.*?)}(.*?){PO4A-ent-end}-->/<!ENTITY % $1 SYSTEM "$2">/g;␊ |
676 | # Unprotect undefined inclusions, and die of them␊ |
677 | $prolog =~ s/{PO4A-percent}/%/sg;␊ |
678 | if ($prolog =~ /%([^;\s]*);/) {␊ |
679 | die wrap_mod("po4a::sgml",␊ |
680 | dgettext("po4a",␊ |
681 | "unrecognized prolog inclusion entity: %%%s;"),␊ |
682 | $1)␊ |
683 | unless ($ignored_inclusion{$1});␊ |
684 | }␊ |
685 | # Protect &entities; (all but the ones asking for a file inclusion)␊ |
686 | # search the file inclusion entities␊ |
687 | my %entincl;␊ |
688 | my $searchprolog=$prolog;␊ |
689 | while ($searchprolog =~ /(.*?)<!ENTITY\s+(\S*)\s+SYSTEM\s*"([^>"]*)"\s*>(.*)$/is) { #})"{␊ |
690 | ␉print STDERR "Seen the entity of inclusion $2 (=$3)\n"␊ |
691 | ␉ if ($debug{'entities'});␊ |
692 | ␉my $key = $2;␊ |
693 | ␉my $filename = $3;␊ |
694 | ␉my $origfilename = $filename;␊ |
695 | ␉$searchprolog = $4;␊ |
696 | ␉if ($filename !~ m%^/% && $mastername =~ m%/%) {␊ |
697 | ␉ my $dir=$mastername;␊ |
698 | ␉ $dir =~ s%/[^/]*$%%;␊ |
699 | ␉ $filename="$dir/$filename";␊ |
700 | ␉ # origfile also needs to be fixed otherwise nsgmls won't find␊ |
701 | ␉ # the file.␊ |
702 | ␉ $origfile =~ s/(<!ENTITY\s+$key\s+SYSTEM\s*")\Q$origfilename\E("\s*>)/$1$filename$2/gsi;␊ |
703 | ␉}␊ |
704 | ␉if ((not defined $ignored_inclusion{$2}) and (-e $filename)) {␊ |
705 | ␉ $entincl{$key}{'filename'}=$filename;␊ |
706 | ␉ # Preload the content of the entity␊ |
707 | ␉ (-e $filename && open IN,"<$filename") ||␊ |
708 | ␉␉die wrap_mod("po4a::sgml",␊ |
709 | ␉␉ dgettext("po4a",␊ |
710 | ␉␉ "Can't open %s (content of entity %s%s;): %s"),␊ |
711 | ␉␉ $filename, '&', $key, $!);␊ |
712 | ␉ local $/ = undef;␊ |
713 | ␉ $entincl{$key}{'content'} = <IN>;␊ |
714 | ␉ close IN;␊ |
715 | ␉ $entincl{$key}{'length'} = ($entincl{$key}{'content'} =~ tr/\n/\n/);␊ |
716 | ␉ print STDERR "read $filename (content of \&$key;, $entincl{$key}{'length'} lines long)\n"␊ |
717 | ␉␉if ($debug{'entities'});␊ |
718 | ␉}␊ |
719 | }␊ |
720 | ␊ |
721 | # Change the entities including files in the document␊ |
722 | my $dosubstitution = 1;␊ |
723 | while ($dosubstitution) {␊ |
724 | $dosubstitution = 0;␊ |
725 | foreach my $key (keys %entincl) {␊ |
726 | # The external entity can be referenced as &key; or &key␊ |
727 | # In the second case, we must differentiate &key and &key2␊ |
728 | while ($origfile =~/^(.*?)&$key(;.*$|[^-_:.A-Za-z0-9].*$|$)/s) {␊ |
729 | # Since we will include a new file, we␊ |
730 | #Â must do a new round of substitutions.␊ |
731 | $dosubstitution = 1;␊ |
732 | my ($begin,$end)=($1,$2);␊ |
733 | $end = "" unless (defined $end);␊ |
734 | $end =~ s/^;//s;␊ |
735 | ␊ |
736 | if ($begin =~ m/.*<!--(.*?)$/s and $1 !~ m/-->/s) {␊ |
737 | # This entity is commented. Just remove it.␊ |
738 | $origfile = $begin.$end;␊ |
739 | next;␊ |
740 | }␊ |
741 | ␊ |
742 | # add the refs␊ |
743 | my $len = $entincl{$key}{'length'}; # number added by the inclusion␊ |
744 | my $pre = ($begin =~ tr/\n/\n/); # number of \n␊ |
745 | my $post = ($end =~ tr/\n/\n/);␊ |
746 | print "XX Add a ref. pre=$pre; len=$len; post=$post\n"␊ |
747 | if $debug{'refs'};␊ |
748 | # Keep a reference of inclusion position in main file␊ |
749 | my $main = $refs[$pre];␊ |
750 | ␊ |
751 | # Remove the references for the lines after the inclusion␊ |
752 | # point.␊ |
753 | my @endrefs = splice @refs, $pre+1;␊ |
754 | ␊ |
755 | # Add the references of the added lines␊ |
756 | my $i;␊ |
757 | for ($i=0; $i<$len; $i++) {␊ |
758 | $refs[$i+$pre] = "$main $entincl{$key}{'filename'}:".($i+1);␊ |
759 | }␊ |
760 | ␊ |
761 | if ($begin !~ m/\n[ \t]*$/s) {␊ |
762 | if ($entincl{$key}{'content'} =~ m/^[ \t]*\n/s) {␊ |
763 | # There is nothing in the first line of the␊ |
764 | # included file, and something on the line before␊ |
765 | # the inclusion The line reference will be more␊ |
766 | # informative like this:␊ |
767 | $refs[$pre] = $main;␊ |
768 | }␊ |
769 | }␊ |
770 | if ($end !~ s/^[ \t]*\n//s) {␊ |
771 | if ($entincl{$key}{'content'} =~ m/\n[ \t]*$/s) {␊ |
772 | # There is something on the line after the␊ |
773 | # inclusion, and there is an end of line at the␊ |
774 | # end of the included file. We must add the line␊ |
775 | # reference of the remainder on the line:␊ |
776 | push @refs, $main;␊ |
777 | }␊ |
778 | }␊ |
779 | # Append the references removed earlier (lines after the␊ |
780 | # inclusion point).␊ |
781 | push @refs, @endrefs;␊ |
782 | ␊ |
783 | # Do the substitution␊ |
784 | $origfile = "$begin".$entincl{$key}{'content'}."$end";␊ |
785 | print STDERR "substitute $key\n" if ($debug{'entities'});␊ |
786 | }␊ |
787 | }␊ |
788 | }␊ |
789 | $origfile=~s/\G(.*?)&([A-Za-z_:][-_:.A-Za-z0-9]*|#[0-9]+|#x[0-9a-fA-F]+)\b/$1\{PO4A-amp\}$2/gs;␊ |
790 | if (defined($xmlprolog) && length($xmlprolog)) {␊ |
791 | $origfile=~s/\/>/\{PO4A-close\}>/gs;␊ |
792 | }␊ |
793 | ␊ |
794 | if ($debug{'refs'}) {␊ |
795 | ␉print "XX Resulting shifts\n";␊ |
796 | ␉for (my $i=0; $i<scalar @refs; $i++) {␊ |
797 | ␉ print "$mastername:".($i+1)." -> $refs[$i]\n";␊ |
798 | ␉}␊ |
799 | }␊ |
800 | ␊ |
801 | my ($tmpfh,$tmpfile)=File::Temp::tempfile("po4a-XXXX",␊ |
802 | SUFFIX => ".sgml",␊ |
803 | DIR => "/tmp",␊ |
804 | UNLINK => 0);␊ |
805 | print $tmpfh $origfile;␊ |
806 | close $tmpfh or die wrap_mod("po4a::sgml", dgettext("po4a", "Can't close tempfile: %s"), $!);␊ |
807 | ␊ |
808 | my $cmd="nsgmls -l -E 0 -wno-valid < $tmpfile".␊ |
809 | ($debug{'nsgmls'}?"":" 2>/dev/null")." |";␊ |
810 | print STDERR "CMD=$cmd\n" if ($debug{'generic'} or $debug{'nsgmls'});␊ |
811 | ␊ |
812 | open (IN,$cmd) || die wrap_mod("po4a::sgml", dgettext("po4a", "Can't run nsgmls: %s"), $!);␊ |
813 | ␊ |
814 | # The kind of tags␊ |
815 | my (%translate,%empty,%verbatim,%indent,%exist,%attribute,%qualify);␊ |
816 | foreach (split(/ /, ($self->{SGML}->{k}{'translate'}||'') )) {␊ |
817 | ␉$translate{uc $_} = 1;␊ |
818 | ␉$indent{uc $_} = 1;␊ |
819 | ␉$exist{uc $_} = 1;␊ |
820 | }␊ |
821 | foreach (split(/ /, ($self->{SGML}->{k}{'empty'}||'') )) {␊ |
822 | ␉$empty{uc $_} = 1;␊ |
823 | ␉$exist{uc $_} = 1;␊ |
824 | }␊ |
825 | foreach (split(/ /, ($self->{SGML}->{k}{'verbatim'}||'') )) {␊ |
826 | ␉$translate{uc $_} = 1;␊ |
827 | ␉$verbatim{uc $_} = 1;␊ |
828 | ␉$exist{uc $_} = 1;␊ |
829 | }␊ |
830 | foreach (split(/ /, ($self->{SGML}->{k}{'indent'}||'') )) {␊ |
831 | ␉$translate{uc $_} = 1;␊ |
832 | ␉$indent{uc $_} = 1;␊ |
833 | ␉$exist{uc $_} = 1;␊ |
834 | }␊ |
835 | foreach (split(/ /, ($self->{SGML}->{k}{'ignore'}) || '')) {␊ |
836 | ␉$exist{uc $_} = 1;␊ |
837 | }␊ |
838 | foreach (split(/ /, ($self->{SGML}->{k}{'attributes'} || ''))) {␊ |
839 | my ($attr, $tags);␊ |
840 | if (m/(^.*>)(\w+)/) {␊ |
841 | $attr=uc $2;␊ |
842 | $tags=$1;␊ |
843 | } else {␊ |
844 | $attr=uc $_;␊ |
845 | $tags=".*";␊ |
846 | }␊ |
847 | if (exists $attribute{$attr}) {␊ |
848 | $attribute{$attr}.="|$tags";␊ |
849 | } else {␊ |
850 | $attribute{$attr} = $tags;␊ |
851 | }␊ |
852 | }␊ |
853 | foreach (split(/ /, ($self->{SGML}->{k}{'qualify'}) || '')) {␊ |
854 | $qualify{uc $_} = 1;␊ |
855 | $attribute{uc $_} = '.*' unless exists $attribute{uc $_};␊ |
856 | }␊ |
857 | ␊ |
858 | ␊ |
859 | # What to do before parsing␊ |
860 | ␊ |
861 | # push the XML prolog if existing␊ |
862 | $self->pushline($xmlprolog."\n") if (defined($xmlprolog) && length($xmlprolog));␊ |
863 | ␊ |
864 | # Put the prolog into the file, allowing for entity definition translation␊ |
865 | # <!ENTITY myentity "definition_of_my_entity">␊ |
866 | # and push("<!ENTITY myentity \"".$self->translate("definition_of_my_entity")␊ |
867 | if ($prolog =~ m/(.*?\[)(.*)(\]>)/s) {␊ |
868 | ␉warn "Pre=~~$1~~;Post=~~$3~~\n" if ($debug{'entities'});␊ |
869 | ␉$self->pushline($1."\n") if (length($1));␊ |
870 | ␉$prolog=$2;␊ |
871 | ␉my ($post) = $3;␊ |
872 | ␉while ($prolog =~ m/^(.*?)<!ENTITY\s+(\S*)\s+"([^"]*)"\s*>(.*)$/is) { #" ){␊ |
873 | ␉ $self->pushline($1) if length($1);␊ |
874 | ␉ $self->pushline("<!ENTITY $2 \"".$self->translate($3,"","definition of entity \&$2;")."\">");␊ |
875 | ␉ warn "Seen text entity $2\n" if ($debug{'entities'});␊ |
876 | ␉ $prolog = $4;␊ |
877 | ␉}␊ |
878 | ␉$prolog .= $post;␊ |
879 | ␉$self->pushline($prolog."\n") if (length($prolog));␊ |
880 | } else {␊ |
881 | ␉warn "No entity declaration detected in ~~$prolog~~...\n" if ($debug{'entities'});␊ |
882 | ␉$self->pushline($prolog) if length($prolog);␊ |
883 | }␊ |
884 | ␊ |
885 | # The parse object.␊ |
886 | # Damn SGMLS. It makes me do crude things.␊ |
887 | no strict "subs";␊ |
888 | my $parse= new SGMLS(IN);␊ |
889 | use strict;␊ |
890 | ␊ |
891 | # Some values for the parsing␊ |
892 | my @open=(); # opened translation container tags␊ |
893 | my $verb=0; # can we wrap or not␊ |
894 | my $verb_last_ref;␊ |
895 | my $seenfootnote=0;␊ |
896 | my $indent=0; # indent level␊ |
897 | my $lastchar = ''; #␊ |
898 | my $buffer= ""; # what we will soon handle␊ |
899 | ␊ |
900 | # Keep a reference to the last line indicated by nsgmls␊ |
901 | my $line=0;␊ |
902 | # Unfortunately, nsgmls do not mention all the line changes. We have␊ |
903 | # to keep track of the number of lines seen in the "record ends".␊ |
904 | my $adds=0;␊ |
905 | # If the last line received contains only spaces, do not take it into␊ |
906 | # account for the line reference of the paragraph.␊ |
907 | my $empty_last_cdata=0;␊ |
908 | # run the appropriate handler for each event␊ |
909 | EVENT: while (my $event = $parse->next_event) {␊ |
910 | ␉# get the line reference to build po entries␊ |
911 | ␉if ($line != $parse->line) {␊ |
912 | ␉ # nsgmls informs us of that the line changed. Reset $adds and␊ |
913 | ␉ # $empty_last_cdata␊ |
914 | ␉ $adds = 0;␊ |
915 | ␉ $empty_last_cdata = 0;␊ |
916 | ␉ $line = $parse->line;␊ |
917 | ␉}␊ |
918 | ␉my $ref=$refs[$parse->line-1 + $adds - $empty_last_cdata];␊ |
919 | ␉# In verbatim mode, keep the current line reference.␊ |
920 | ␉if ($verb) {␊ |
921 | ␉ $ref=$refs[$parse->line-1];␊ |
922 | ␉}␊ |
923 | ␉my $type;␊ |
924 | ␊ |
925 | ␉if ($event->type eq 'start_element') {␊ |
926 | ␉ die wrap_ref_mod($ref, "po4a::sgml",␊ |
927 | ␉ dgettext("po4a", "Unknown tag %s"),␊ |
928 | ␉ $event->data->name)␊ |
929 | ␉␉unless $exist{$event->data->name};␊ |
930 | ␊ |
931 | ␉ $lastchar = ">";␊ |
932 | ␊ |
933 | ␉ # Which tag did we see?␊ |
934 | ␉ my $tag='';␊ |
935 | ␉ $tag .= '<'.lc($event->data->name());␊ |
936 | ␉ while (my ($attr, $val) = each %{$event->data->attributes()}) {␊ |
937 | ␉␉my $value = $val->value();␊ |
938 | #␉␉if ($val->type() eq 'IMPLIED') {␊ |
939 | #␉␉ $tag .= ' '.lc($attr).'="'.lc($attr).'"';␊ |
940 | #␉␉} els␊ |
941 | if ($val->type() eq 'CDATA' ||␊ |
942 | ␉␉ $val->type() eq 'IMPLIED') {␊ |
943 | ␉␉ if (defined $value && length($value)) {␊ |
944 | my $lattr=lc $attr;␊ |
945 | my $uattr=uc $attr;␊ |
946 | if (exists $attribute{$uattr}) {␊ |
947 | my $context="";␊ |
948 | foreach my $o (@open) {␊ |
949 | next if (!defined $o or $o =~ m%^</%);␊ |
950 | $o =~ s/ .*/>/;␊ |
951 | $context.=$o;␊ |
952 | }␊ |
953 | $context=join("", $context,␊ |
954 | "<", lc($event->data->name()), ">");␊ |
955 | if ($context =~ /^($attribute{$uattr})$/) {␊ |
956 | if ($qualify{$uattr}) {␊ |
957 | my $translated = $self->translate("$lattr=$value", $ref, "attribute $context$lattr");␊ |
958 | if ($translated =~ s/^$lattr=//) {␊ |
959 | $value=$translated;␊ |
960 | } else {␊ |
961 | die wrap_mod("po4a::sgml", dgettext("po4a", "bad translation '%s' for '%s' in '%s'"), $translated, $context.$lattr, $ref);␊ |
962 | }␊ |
963 | } else {␊ |
964 | $value = $self->translate($value, $ref, "attribute $context$lattr");␊ |
965 | }␊ |
966 | }␊ |
967 | }␊ |
968 | ␉␉␉if ($value =~ m/\"/) {␊ |
969 | ␉␉␉ $value = "'".$value."'";␊ |
970 | ␉␉␉} else {␊ |
971 | ␉␉␉ $value = '"'.$value.'"';␊ |
972 | ␉␉␉}␊ |
973 | ␉␉␉$tag .= " $lattr=$value";␊ |
974 | ␉␉ }␊ |
975 | ␉␉} elsif ($val->type() eq 'NOTATION') {␊ |
976 | ␉␉} else {␊ |
977 | ␉␉ $tag .= ' '.lc($attr).'="'.lc($value).'"'␊ |
978 | ␉␉␉if (defined $value && length($value));␊ |
979 | ␉␉}␊ |
980 | ␉ }␊ |
981 | ␉ $tag .= '>';␊ |
982 | ␊ |
983 | ␊ |
984 | ␉ # debug␊ |
985 | ␉ print STDERR "Seen $tag, open level=".(scalar @open).", verb=$verb\n"␊ |
986 | ␉␉if ($debug{'tag'});␊ |
987 | ␊ |
988 | ␉ if ($event->data->name() eq 'FOOTNOTE') {␊ |
989 | ␉␉# we want to put the <para> inside the <footnote> in the same msgid␊ |
990 | ␉␉$seenfootnote = 1;␊ |
991 | ␉ }␊ |
992 | ␊ |
993 | ␉ if ($seenfootnote) {␊ |
994 | ␉␉$buffer .= $tag;␊ |
995 | ␉␉next EVENT;␊ |
996 | ␉ }␊ |
997 | ␉ if ($translate{$event->data->name()}) {␊ |
998 | ␉␉# Build the type␊ |
999 | ␉␉if (scalar @open > 0) {␊ |
1000 | ␉␉ $type=$open[$#open] . $tag;␊ |
1001 | ␉␉} else {␊ |
1002 | ␉␉ $type=$tag;␊ |
1003 | ␉␉}␊ |
1004 | ␊ |
1005 | ␉␉# do the job␊ |
1006 | ␉␉if (@open > 0) {␊ |
1007 | ␉␉ $self->end_paragraph($buffer,$ref,$type,$verb,$indent,␊ |
1008 | ␉␉␉␉␉ @open);␊ |
1009 | ␉␉} else {␊ |
1010 | ␉␉ $self->pushline($buffer) if $buffer;␊ |
1011 | ␉␉}␊ |
1012 | ␉␉$buffer="";␊ |
1013 | ␉␉push @open,$tag;␊ |
1014 | ␉ } elsif ($indent{$event->data->name()}) {␊ |
1015 | ␉␉die wrap_ref_mod($ref, "po4a::sgml", dgettext("po4a",␊ |
1016 | ␉␉ "Closing tag for a translation container missing before %s"),$tag)␊ |
1017 | ␉␉ if (scalar @open);␊ |
1018 | ␉ }␊ |
1019 | ␊ |
1020 | ␉ if ($verbatim{$event->data->name()}) {␊ |
1021 | ␉␉$verb++;␊ |
1022 | ␉␉# Keep a reference to the line that openned the verbatim␊ |
1023 | ␉␉# section. This is needed to check if its data starts on␊ |
1024 | ␉␉# the same line.␊ |
1025 | ␉␉$verb_last_ref = $ref;␊ |
1026 | ␉ }␊ |
1027 | ␉ if ($verb) {␊ |
1028 | ␉␉# Tag in a verbatim section. Check if it appeared at␊ |
1029 | ␉␉# the same line than the previous data. If not, it␊ |
1030 | ␉␉# means that an end of line must be added to the␊ |
1031 | ␉␉# buffer.␊ |
1032 | ␉␉if ($ref ne $verb_last_ref) {␊ |
1033 | ␉␉ # FIXME: Does it work if $verb > 1␊ |
1034 | ␉␉ $buffer .= "\n";␊ |
1035 | ␉␉ $verb_last_ref = $ref;␊ |
1036 | ␉␉}␊ |
1037 | ␉ }␊ |
1038 | ␊ |
1039 | ␉ if ($indent{$event->data->name()}) {␊ |
1040 | ␉␉# push the indenting space only if not in verb before that tag␊ |
1041 | ␉␉# push trailing "\n" only if not in verbose afterward␊ |
1042 | ␉␉$self->pushline( ($verb>1?"": (" " x $indent)).$tag.($verb?"":"\n"));␊ |
1043 | ␉␉$indent ++ unless $empty{$event->data->name()} ;␊ |
1044 | ␉ } else {␊ |
1045 | ␉␉$tag =~ s/<po4abeg name="([^"]+)">/<![ $1 [/; #"; Stupid emacs␊ |
1046 | ␉␉$tag =~ s/<po4aend>/]]>/;␊ |
1047 | ␉␉$buffer .= $tag;␊ |
1048 | ␉ }␊ |
1049 | ␉} # end of type eq 'start_element'␊ |
1050 | ␊ |
1051 | ␉elsif ($event->type eq 'end_element') {␊ |
1052 | ␉ my $tag = ($empty{$event->data->name()}␊ |
1053 | ␉␉ ?␊ |
1054 | ␉␉ ''␊ |
1055 | ␉␉ :␊ |
1056 | ␉␉ '</'.lc($event->data->name()).'>');␊ |
1057 | ␊ |
1058 | ␉ if ($verb) {␊ |
1059 | ␉␉# Tag in a verbatim section. Check if it appeared at␊ |
1060 | ␉␉# the same line than the previous data. If not, it␊ |
1061 | ␉␉# means that an end of line must be added to the␊ |
1062 | ␉␉# buffer.␊ |
1063 | ␉␉if ($ref ne $verb_last_ref) {␊ |
1064 | ␉␉ # FIXME: Does it work if $verb > 1␊ |
1065 | ␉␉ $buffer .= "\n";␊ |
1066 | ␉␉ $verb_last_ref = $ref;␊ |
1067 | ␉␉}␊ |
1068 | ␉ }␊ |
1069 | ␉ print STDERR "Seen $tag, level=".(scalar @open).", verb=$verb\n"␊ |
1070 | ␉␉if ($debug{'tag'});␊ |
1071 | ␊ |
1072 | ␉ $lastchar = ">";␊ |
1073 | ␊ |
1074 | ␉ if ($event->data->name() eq 'FOOTNOTE') {␊ |
1075 | ␉␉# we want to put the <para> inside the <footnote> in the same msgid␊ |
1076 | ␉␉$seenfootnote = 0;␊ |
1077 | ␉ }␊ |
1078 | ␊ |
1079 | ␉ if ($seenfootnote) {␊ |
1080 | ␉␉$buffer .= $tag;␊ |
1081 | ␉␉next EVENT;␊ |
1082 | ␉ }␊ |
1083 | ␉ if ($translate{$event->data->name()}) {␊ |
1084 | ␉␉$type = $open[$#open] . $tag;␊ |
1085 | ␉␉$self->end_paragraph($buffer,$ref,$type,$verb,$indent,@open);␊ |
1086 | ␉␉$buffer = "";␊ |
1087 | ␉␉pop @open;␊ |
1088 | ␉␉if (@open > 0) {␊ |
1089 | ␉␉ pop @open;␊ |
1090 | ␉␉ push @open,$tag;␊ |
1091 | ␉␉}␊ |
1092 | ␉ } elsif ($indent{$event->data->name()}) {␊ |
1093 | ␉␉die wrap_ref_mod($ref, "po4a::sgml", dgettext("po4a",␊ |
1094 | "Closing tag for a translation container missing before %s"), $tag)␊ |
1095 | ␉␉ if (scalar @open);␊ |
1096 | ␉ }␊ |
1097 | ␊ |
1098 | ␉ unless ($event->data->name() =~ m/^(PO4ABEG|PO4AEND)$/si) {␊ |
1099 | ␉␉if ($indent{$event->data->name()}) {␊ |
1100 | ␉␉ $indent -- ;␊ |
1101 | ␉␉ # add indenting space only when not in verbatim␊ |
1102 | ␉␉ # add the tailing \n only if out of verbatim after that tag␊ |
1103 | ␉␉ $self->pushline(($verb?"":(" " x $indent)).$tag.($verb>1?"":"\n"));␊ |
1104 | ␉␉} else {␊ |
1105 | ␉␉ $buffer .= $tag;␊ |
1106 | ␉␉}␊ |
1107 | ␉␉$verb-- if $verbatim{$event->data->name()};␊ |
1108 | ␉ }␊ |
1109 | ␉} # end of type eq 'end_element'␊ |
1110 | ␊ |
1111 | ␉elsif ($event->type eq 'cdata') {␊ |
1112 | ␉ my $cdata = $event->data;␊ |
1113 | ␉ $empty_last_cdata=($cdata =~ m/^\s*$/);␊ |
1114 | ␉ $cdata =~ s/{PO4A-lt}/</g;␊ |
1115 | ␉ $cdata =~ s/{PO4A-gt}/>/g;␊ |
1116 | ␉ $cdata =~ s/{PO4A-amp}/&/g;␊ |
1117 | $cdata =~ s/{PO4A-end}/\]\]>/g;␊ |
1118 | $cdata =~ s/{PO4A-beg-([^\}]+)}/<!\[$1\[/g;␊ |
1119 | ␉ if ($verb) {␊ |
1120 | ␉␉# Check if this line of data appear on the same line␊ |
1121 | ␉␉# than the previous tag. If not, append an end of line␊ |
1122 | ␉␉# to the buffer.␊ |
1123 | ␉␉if ($ref ne $verb_last_ref) {␊ |
1124 | ␉␉ $buffer .= "\n";␊ |
1125 | ␉␉ $verb_last_ref = $ref;␊ |
1126 | ␉␉}␊ |
1127 | ␉ } else {␊ |
1128 | ␉␉$cdata =~ s/\\t/ /g;␊ |
1129 | ␉␉$cdata =~ s/\s+/ /g;␊ |
1130 | ␉␉$cdata =~ s/^\s//s if $lastchar eq ' ';␊ |
1131 | ␉ }␊ |
1132 | ␉ $lastchar = substr($cdata, -1, 1);␊ |
1133 | ␉ $buffer .= $cdata;␊ |
1134 | ␉ if (defined($xmlprolog) && length($xmlprolog)) {␊ |
1135 | ␉␉$buffer =~ s/>PO4A-close\}>/\/>/sg;␊ |
1136 | ␉␉$buffer =~ s/PO4A-close\}>//sg; # This should not be necessary␊ |
1137 | ␉ }␊ |
1138 | ␉} # end of type eq 'cdata'␊ |
1139 | ␊ |
1140 | ␉elsif ($event->type eq 'sdata') {␊ |
1141 | ␉ my $sdata = $event->data;␊ |
1142 | ␉ $sdata =~ s/^\[//;␊ |
1143 | ␉ $sdata =~ s/\s*\]$//;␊ |
1144 | ␉ $lastchar = substr($sdata, -1, 1);␊ |
1145 | ␉ $buffer .= '&'.$sdata.';';␊ |
1146 | ␉} # end of type eq 'sdata'␊ |
1147 | ␊ |
1148 | ␉elsif ($event->type eq 're') {␊ |
1149 | ␉ # End of record, the line reference shall be incremented.␊ |
1150 | ␉ $adds +=1;␊ |
1151 | ␉ if ($verb) {␊ |
1152 | ␉␉# Check if this line of data appear on the same line␊ |
1153 | ␉␉# than the previous tag. If not, append an end of line␊ |
1154 | ␉␉# to the buffer.␊ |
1155 | ␉␉if ($ref ne $verb_last_ref) {␊ |
1156 | ␉␉ $buffer .= "\n";␊ |
1157 | ␉␉ $verb_last_ref = $ref;␊ |
1158 | ␉␉}␊ |
1159 | ␉␉$buffer .= "\n";␊ |
1160 | ␉ } elsif ($lastchar ne ' ') {␊ |
1161 | ␉␉$buffer .= " ";␊ |
1162 | ␉ }␊ |
1163 | ␉ $lastchar = ' ';␊ |
1164 | ␉} #end of type eq 're'␊ |
1165 | ␊ |
1166 | ␉elsif ($event->type eq 'conforming') {␊ |
1167 | ␊ |
1168 | ␉}␊ |
1169 | ␊ |
1170 | ␉elsif ($event->type eq 'pi') {␊ |
1171 | ␉ my $pi = $event->data;␊ |
1172 | ␉ $buffer .= "<?$pi>";␊ |
1173 | ␉}␊ |
1174 | ␊ |
1175 | ␉else {␊ |
1176 | ␉ die wrap_ref_mod($refs[$parse->line], "po4a::sgml",␊ |
1177 | ␉ dgettext("po4a","Unknown SGML event type: %s"),␊ |
1178 | ␉ $event->type);␊ |
1179 | ␉}␊ |
1180 | }␊ |
1181 | ␊ |
1182 | # What to do after parsing␊ |
1183 | $self->pushline($buffer);␊ |
1184 | close(IN);␊ |
1185 | warn wrap_mod("po4a::sgml",␊ |
1186 | dgettext("po4a","Warning: nsgmls produced some errors. ".␊ |
1187 | "This is usually caused by po4a, which modifies the input ".␊ |
1188 | "and restores it afterwards, causing the input of nsgmls ".␊ |
1189 | "to be invalid. This is usually safe, but you may wish ".␊ |
1190 | "to verify the generated document with nsgmls -wno-valid. ".␊ |
1191 | "Continuing..."))␊ |
1192 | if ($? != 0 and $self->verbose() > 0);␊ |
1193 | unlink ($tmpfile) unless ($debug{'refs'} or $debug{'nsgmls'});␊ |
1194 | }␊ |
1195 | ␊ |
1196 | sub end_paragraph {␊ |
1197 | my ($self, $para,$ref, $type,$verb,$indent)=␊ |
1198 | ␉(shift,shift,shift,shift,shift,shift);␊ |
1199 | my (@open)=@_;␊ |
1200 | die "Internal error: no paragraph to end here!!"␊ |
1201 | ␉unless scalar @open;␊ |
1202 | ␊ |
1203 | return unless defined($para) && length($para);␊ |
1204 | ␊ |
1205 | if (($para =~ m/^\s*$/s) and (not $verb)) {␊ |
1206 | ␉# In non-verbatim environments, a paragraph with only spaces is␊ |
1207 | ␉# like an empty paragraph␊ |
1208 | ␉return;␊ |
1209 | }␊ |
1210 | ␊ |
1211 | # unprotect &entities;␊ |
1212 | $para =~ s/{PO4A-amp}/&/g;␊ |
1213 | # remove the name"\|\|" nsgmls added as attributes␊ |
1214 | $para =~ s/ name=\"\\\|\\\|\"//g;␊ |
1215 | $para =~ s/ moreinfo=\"none\"//g;␊ |
1216 | ␊ |
1217 | # Extract the leading and trailing spaces. They will be restored only␊ |
1218 | # in verbatim environments.␊ |
1219 | my ($leading_spaces, $trailing_spaces) = ("", "");␊ |
1220 | if ($verb) {␊ |
1221 | ␉# In the verbatim mode, we can ignore empty lines, but not the␊ |
1222 | ␉# leading spaces or tabulations. Otherwise, the PO will look␊ |
1223 | ␉# weird.␊ |
1224 | ␉if ($para =~ m/^(\s*\n)(.*?)(\s*)$/s) {␊ |
1225 | ␉ $leading_spaces = $1;␊ |
1226 | ␉ $para = $2;␊ |
1227 | ␉ $trailing_spaces = $3;␊ |
1228 | ␉}␊ |
1229 | } else {␊ |
1230 | ␉if ($para =~ m/^(\s*)(.*?)(\s*)$/s) {␊ |
1231 | ␉ $leading_spaces = $1;␊ |
1232 | ␉ $para = $2;␊ |
1233 | ␉ $trailing_spaces = $3;␊ |
1234 | ␉}␊ |
1235 | }␊ |
1236 | ␊ |
1237 | $para = $self->translate($para,$ref,$type,␊ |
1238 | 'wrap' => ! $verb,␊ |
1239 | 'wrapcol' => (75 - $indent));␊ |
1240 | ␊ |
1241 | if ($verb) {␊ |
1242 | ␉$para = $leading_spaces.$para.$trailing_spaces;␊ |
1243 | } else {␊ |
1244 | ␉$para =~ s/^\s+//s;␊ |
1245 | ␉my $toadd=" " x ($indent+1);␊ |
1246 | ␉$para =~ s/^/$toadd/mg;␊ |
1247 | ␉$para .= "\n";␊ |
1248 | }␊ |
1249 | ␊ |
1250 | $self->pushline( $para );␊ |
1251 | }␊ |
1252 | ␊ |
1253 | 1;␊ |
1254 | ␊ |
1255 | =head1 AUTHORS␊ |
1256 | ␊ |
1257 | This module is an adapted version of sgmlspl (SGML postprocessor for the␊ |
1258 | SGMLS and NSGMLS parsers) which was:␊ |
1259 | ␊ |
1260 | Copyright (c) 1995 by David Megginson <dmeggins@aix1.uottawa.ca>␊ |
1261 | ␊ |
1262 | The adaptation for po4a was done by:␊ |
1263 | ␊ |
1264 | Denis Barbier <barbier@linuxfr.org>␊ |
1265 | Martin Quinson (mquinson#debian.org)␊ |
1266 | ␊ |
1267 | =head1 COPYRIGHT AND LICENSE␊ |
1268 | ␊ |
1269 | Copyright (c) 1995 by David Megginson <dmeggins@aix1.uottawa.ca>␊ |
1270 | Copyright 2002, 2003, 2004, 2005 by SPI, inc.␊ |
1271 | ␊ |
1272 | This program is free software; you may redistribute it and/or modify it␊ |
1273 | under the terms of GPL (see the COPYING file).␊ |
1274 | |