Ask Slashdot: Best PDF Handling Library?

timothy posted about 3 months ago | from the when-vi-is-not-the-answer dept.

Open Source 132

New submitter Fotis Georgatos (3006465) writes I recently engaged in a conversation about handling PDF texts for a range of needs, such as creation, manipulation, merging, text extraction and searching, digital signing, etc. A couple of potential picks popped up (PDFBox, iText), given some Java experience of the other fellows. And then comes the reality of choosing software as a long-term knowledge investment! Ideally, we would like to combine these features:

  • open source, with a community following ; the kind of stuff Slashdotters would prefer
  • tidy software architecture; simple things should remain simple
  • an open API allowing usage across many languages (say: Python & Java)
  • clear licensing status, not estranging future commercial use
  • serious multilingual & font support
  • PDF-handling rich features, not limiting usage for invoicing, e-commerce, reports & data mining
  • digital signing should not go against other features

I'd like to poll the collective Slashdot crowd wisdom about whether, and which, PDF-related libraries they have written software with keep them happy for *all* the above reasons. And if not happy with all of that, what do they think is the best bet for learning one piece of software in the area, with great reusability across different circumstances and little need for extra hacks? I'd really like to hear the smoked-out war stories. It is easy to obtain a list of such libraries, yet tricky to understand whether people have obtained success with them!

Why? (0)

Anonymous Coward | about 3 months ago | (#47624673)

open source, with a community following ;

Why?
So if you find one that fits every other requirement but this one you will refuse to use it?

Re: Why? (2, Insightful)

Anonymous Coward | about 3 months ago | (#47624755)

To be fair the OP does say "ideally."

Re:Why? (1)

Stumbles (602007) | about 3 months ago | (#47624767)

Because those are his requirements and there is nothing wrong with that. But if it is a problem for you, don't use open source. See, your problem is solved.

Re:Why? (0)

Anonymous Coward | about 3 months ago | (#47624769)

Indeed. I'm curious why is not "closed source, with a strong industry support" an option?

Re:Why? (2)

Wycliffe (116160) | about 3 months ago | (#47625543)

Indeed. I'm curious why is not "closed source, with a strong industry support" an option?

Because both "open source" and "strong industry support" when put together like that pretty
much means that they don't want to get stuck holding the bag if the company goes out of business.
With "strong industry support" the odds of a company going out of business are minimized and
with "open source" even if it does go out of business then you can still continue to use the
software indefinitely while you look for a replacement.

Re:Why? (3, Interesting)

Rob Y. (110975) | about 3 months ago | (#47626179)

I'm using a non-free, but source-provided library called Clib-PDF. It's a pretty nice library with a pretty easy API, and even has PHP bindings (so it must've been a viable mainstream choice at one point). But somehow the company (or was it just a single guy) disappeared years ago. Luckily, we paid for and got the source, and I've been able to keep using it (and even fixing things in the source) without any ongoing support. So not quite open source, but not quite the disaster of discontinued closed source.

I suspect that the author of this library sold it to one of the commercial companies who proceeded to shut down a viable competitor. But who knows...

Re:Why? (0)

znrt (2424692) | about 3 months ago | (#47626323)

Indeed. I'm curious why is not "closed source, with a strong industry support" an option?

i guess because he knows what kind of crap *that* can be.

of course his full requirement list is ridiculous, nevertheless as a request to the community on a public forum. anything else? this dude is just looking for someone else wanting to do his fucking job, but he also wants a medal for it (read: a tap on the shoulder). likely he *is* in a "closed source, with a strong industry support" environment, so screw him.

Re:Why? (0)

Anonymous Coward | about 3 months ago | (#47624791)

Some people have principles.

Re:Why? (1, Insightful)

Anonymous Coward | about 3 months ago | (#47624897)

Yes ... and their principles are; I want somebody else to do a lot of hard work and I want the benefit for free. Oh, and I don't even want to research this myself, I want others to do the work ... for free.

Re:Why? (0)

Anonymous Coward | about 3 months ago | (#47625067)

Yes ... and their principles are

Followed by straw men.

Their principles are that the source should be available and that people should have certain freedoms so that no one is beholden to a specific company for updates or modifications, and they can release their changes to the code so that everyone else can benefit too.

Want to try your hand at writing another nonsensical troll post?

Re:Why? (1)

LesFerg (452838) | about 3 months ago | (#47627809)

Oh, and I don't even want to research this myself, I want others to do the work ... for free.

That's a bit unfair. I had a need to convert some output into PDF a while back and started looking thru the many proprietary and open source options, and there were a lot to choose from. It was awfully hard to determine which were quality and which weren't. Installing and trying to program a working solution against each API would have taken up a huge amount of time. There is nothing at all wrong with asking if others have been thru some of that process and found a favorite. I certainly would have liked to see some comparisons or good product reviews to help me decide.

Also, one thing I started to suspect; a number of the supposedly proprietary PDF tools and APIs that I investigated appeared to just be a GUI and some wrappings around a GPL PDF library, usually Ghostscript, with little or no attribution given. Once again, there is nothing wrong with asking for other people's experience with these tools and libraries; you don't want to find out you have breached GPL licensing AFTER you have created your solution, do you?

Re:Why? (1, Funny)

Bill, Shooter of Bul (629286) | about 3 months ago | (#47625209)

Well, he's dealing with PDFs, so he doesn't have *strong* principles.

Re:Why? (1)

nine-times (778537) | about 3 months ago | (#47625847)

PDF isn't so bad, is it? It's an open standard with open source implementations, with open/royalty-free licensing for the patents involved. Or am I missing something?

Re:Why? (0)

Anonymous Coward | about 3 months ago | (#47626023)

Which gets used incorrectly most of the time. Specially if you need to extract data from it

Re:Why? (1)

wonkey_monkey (2592601) | about 3 months ago | (#47626083)

Specially if you need to extract data from it

That's not what it's for, is it? Except by the method of reading, with your eyes.

Re:Why? (2)

NJRoadfan (1254248) | about 3 months ago | (#47627575)

Adobe pushes PDF as a method of data collection. People make fancy PDF forms and e-mail them out. Inside of the form is a button that says "when complete, click here to submit form" which attaches the filled out form to an e-mail and sends it back to the publisher. From there, folks somehow extract the fields from the file and dump it into a database, which seems like a messy and complicated process. Honestly a web form would be easier to implement in many cases.

Re:Why? (1)

burne (686114) | about 3 months ago | (#47625343)

Other people have politicians that require you to use open source if available.

Re:Why? (0)

Anonymous Coward | about 3 months ago | (#47625501)

Some people have principles.

at least as long as they're getting schooled...

Re:Why? (2)

Chrisq (894406) | about 3 months ago | (#47626223)

Some people have principles.

He could be working on an open source project

Re:Why? (2)

kelemvor4 (1980226) | about 3 months ago | (#47625355)

open source, with a community following ;

Why? So if you find one that fits every other requirement but this one you will refuse to use it?

Derp,

Probably because if there is no community following it there is not going to be much in the way of development going on.

Re:Why? (1)

Animats (122034) | about 3 months ago | (#47625727)

Probably because if there is no community following it there is not going to be much in the way of development going on.

Right. With mid-tier open source projects, there's a good chance they're either unfinished or abandonware. (Lower-tier open source projects are both.) There's only so much attention available.

Yeah! Why would anyone want it maintained? (5, Insightful)

Anonymous Coward | about 3 months ago | (#47626151)

open source, with a community following

Why?

Because maybe it's not his first project? Fine, let me ask you: how many times did you get burned by totally unmaintainable third-party dependencies, before you vowed "NEVER AGAIN will I get so utterly fucked over?"

Was your fifth project the one where you couldn't ever port to a new architecture or OS, or was it the one where the only company who had the source, went into bankruptcy and it took years for the liquidation to happen and you never really figured out where the assets are? No wait, your fifth project was the one where they just withdrew it from the market for "strategic reasons" and you never found out why and there was no replacement. Ah, then there was the race condition that you knew you could find if only you could read through the code, but the sole developer didn't even know what "race condition" means so he ignored your bug report. And the time the DRM server incorrectly said the API key had expired so you didn't get any sales that day. Then there was that time you had the source but weren't allowed to change some parts of it: I loved the comment "by reading this you are violating the License Agreement" followed by the base64 string of dynamically interpreted code. Of course you violated the agreement, and decoded it: finding a bug you weren't allowed to fix. And of course let's not forget the time the developer might have actually hypothetically allowed the code to be maintained or might have even done it himself, but he had lost it, the one and only copy in the entire world, which had been used to compile the code that literally tens of thousands of people were depending on. That one's a classic, almost right up there with the vendor who died, taking all his customers' hopes of maintenance with him to the grave.

Holy crap. I get why the public doesn't know to demand Free Software. Even smart people can be uninformed or lack expertise outside their areas. But developers, really? You have to be LITERALLY STUPID to not see "open source" as at least a major advantage, if not necessarily always the winner. Maybe it's not always a solid requirement, but if you don't always at least start your searches that way and try to get something that at least can be maintained, then yes, you're a moron.

"Oh no, I'm not a moron," you explain, "I just happen to think that some large projects aren't ever going to need maintenance, because surely it's simple enough that a good programmer will get everything right the first time." You're right: you're not a moron; you're an imbecile. Sorry about the mistake.

Re:Yeah! Why would anyone want it maintained? (2)

Dragon Bait (997809) | about 3 months ago | (#47627149)

Because maybe it's not his first project? Fine, let me ask you: how many times did you get burned by totally unmaintainable third-party dependencies, before you vowed "NEVER AGAIN will I get so utterly fucked over?"

This. Wish I hadn't run out of mod points -- and frankly I'm tired of some bottom of the barrel programmer whose attitude is "we can just rewrite everything every 5 years" getting promoted into management and then tying our code to whatever proprietary crap the next cute sales person brings.

Separate. Isolate. Defend. Treat every piece of third-party code that you don't have source for as an enemy whose only goal is to financially rape you. I don't care if that enemy goes by Oracle, Microsoft, or Joe's Discount Software.

ReportLab (4, Informative)

Anonymous Coward | about 3 months ago | (#47624691)

Python only, but I've used it successfully.

PDFLib (2, Insightful)

Anonymous Coward | about 3 months ago | (#47624737)

Well, there's PDFLib, which is hideously expensive and not open source, but if you're after a professional package for serious purposes that just works I can only recommend it.

Re:PDFLib (4, Informative)

Anonymous Coward | about 3 months ago | (#47624799)

PDFlib is cheap compared to licensing Adobe's libraries from DataLogics. (speaking as one who switched from the latter to the former).... A full source license for pdflib and tetlib were much less than Adobe/DataLogics non-source license... less than 1 FTE. Then again, your mileage may vary.

PDFLib happens to be the cleanest and best PDF code solution I've ever worked with.

Re:PDFLib (2)

Skuld-Chan (302449) | about 3 months ago | (#47626853)

You know what's funny is PDFLib is what Adobe calls the set of core tech libraries to generate PDF files inside their own apps. (source: used to work at Adobe - on Acrobat no less)

Re:PDFLib (0)

Anonymous Coward | about 3 months ago | (#47625097)

Can it tell me if a chunk of text will fit on the current page with the current font (what is the height of "this text" after wrapping it to X width)? It seems like every PDF library I've ever used has failed to be able to do this one simple thing. The best I've been able to do is to get character widths of each character individually, add them up until I reach the right margin, back up to the previous space (hoping that the library won't hyphenate), add one to the line counter, and continue. Close enough for government work but damn it's a HUGE pain.
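The manual approach described above (add up character widths, break at the last space, count lines) can be sketched in a few lines. This is only an illustration under stated assumptions: `char_width` is a hypothetical callback standing in for whatever font-metrics call a given PDF library exposes, and the function names are invented here, not taken from any library.

```python
from typing import Callable


def count_wrapped_lines(text: str, max_width: float,
                        char_width: Callable[[str], float]) -> int:
    """Greedy word wrap: return how many lines `text` needs at `max_width`.

    `char_width` is a hypothetical callback returning the advance width of
    one character in the current font and size.
    """
    space = char_width(" ")
    lines = 0
    used = 0.0  # width already consumed on the current line
    for word in text.split():
        width = sum(char_width(ch) for ch in word)
        if lines == 0:
            # The first word always opens the first line.
            lines, used = 1, width
        elif used + space + width <= max_width:
            # Word fits after a space on the current line.
            used += space + width
        else:
            # Word does not fit: break at the preceding space.
            lines, used = lines + 1, width
    return lines


def fits(text: str, max_width: float, line_height: float,
         remaining_height: float,
         char_width: Callable[[str], float]) -> bool:
    """True if the wrapped text fits in the vertical space left on the page."""
    needed = count_wrapped_lines(text, max_width, char_width) * line_height
    return needed <= remaining_height
```

A word wider than `max_width` simply gets a line of its own and overflows; the sketch does no hyphenation and ignores kerning, so a real layout engine may wrap slightly differently.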

Re:PDFLib (0)

Anonymous Coward | about 3 months ago | (#47627871)

Yes. There are various examples like this one: http://www.pdflib.com/en/pdflib-cookbook/text-output/starter-textflow/

If you want to control it yourself it is possible to define and add the textflow (telling it to not error if it overflows), query it for the size, then delete it if it overflows. You can then move elsewhere to try again. Not an expert so there may be some mistakes or simplifications there but it's basically what our system does when it wants to keep text together in a block.

Re:PDFLib (2)

DJ Jones (997846) | about 3 months ago | (#47625341)

TCPDF [tcpdf.org] Open-source PDF-reader built in PHP
FPDF [fpdf.org] Combine with TCPDF above to create a PDF-writer using PHP
Setasign [setasign.com] Not open-source but this company offers both free and paid libraries that combine with the libraries above to allow PDF encryption / decryption using PHP. The paid versions support more complex ciphers and I swear by them personally


Not sure if you meant desktop software or...

pdf.js (5, Informative)

AntiTuX (202333) | about 3 months ago | (#47624771)

pdf.js is great for parsing and manipulating pdfs.

I can't go into great detail as to how I've used it (Still under NDA), but its rendering and manipulation of pdfs is pretty darn good.

As for converting office formats to pdf, your best bet is to use office automation. It can be built to scale up, but it needs a lot of work to do so.

Flying Saucer (0)

Anonymous Coward | about 3 months ago | (#47625043)

Why deal with hard coding libraries for PDF rendering?
With Flying Saucer you can make XHTML and CSS 2.1 (with some CSS 3 features), and then you don't have to deal with the hard coding of the report.
Plus you can generate the report on the web right to your client.
I have been using Flying Saucer for over a year, and it has really made creating PDF reports a lot easier.
Before finding Flying Saucer I did research on JasperReport and Birt.
And Flying Saucer was the only one that gave a 98% rendering of TinyMCE HTML tags.
http://code.google.com/p/flying-saucer/

Re:Flying Saucer (0)

Anonymous Coward | about 3 months ago | (#47626289)

How well does it work over email? Most HTML clients suck at rendering even basic HTML, let alone handle CSS properly, and I see you've thrown an X in there. On the other hand I can just render a document or report to PDF and be confident that the recipient can actually view it in all its intended glory.

Re:pdf.js (2)

PhrostyMcByte (589271) | about 3 months ago | (#47625093)

Office Automation is problematic -- because it literally opens up a hidden window of your Office app and simulates clicking around the UI to do what you need, if something unexpected happens it can unhide the window to show the user a message. This might be good enough for a desktop app, but if you're running it on a server it'll just freeze up your process with no one there to click it.

For Office->PDF conversion of word docs, Aspose.Words has a fairly easy API and generally very accurate rendering. I highly recommend it.

Re:pdf.js (3, Informative)

Auction_God (1056590) | about 3 months ago | (#47625373)

No...it does not simulate clicking. It uses the underlying COM representation to perform its functions. That said, it does not work well in a multi-threaded environment, nor where you can't set up a user (e.g. restricted web server credentials). So you either have to impersonate or use COM+ configuration to run the office tool under a different user name. So if you're just starting out...do not use Office automation in a server environment unless you're willing to deal with these issues. Try Aspose as suggested instead.

Re:pdf.js (2)

Jaime2 (824950) | about 3 months ago | (#47625663)

I wouldn't recommend Office Automation on a server if there is any alternative. For beginners, there are too many gotchas and for advanced users, there are plenty of alternatives that will do what you want without too much difficulty. Office with .Net is especially problematic because the COM components run as out-of-process servers and due to .Net's garbage collection and COM interoperability, they are difficult to get to shut down properly.

I've found these tools useful (5, Informative)

jab (9153) | about 3 months ago | (#47624797)

I've found these tools useful, with an honorable mention to gnupdf. I've never used it personally, but the code looks pretty solid. That said, when I really needed to produce great multilingual PDF I pulled out the PDF spec, gritted my teeth, and generated it directly.

leptonica - turn images into PDF
tesseract - turn images into searchable PDF
qpdf - linearize PDF for random access over HTTP
jhove - basic validation
jhove-pdf-a - validation with better compatibility guarantees
pdftk - command line tool for splicing pages together or apart
ttx/FontTools - tool for modifying custom fonts
reportlab - python library, easy to use but works best with Latin scripts
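For a sense of what generating PDF directly involves, here is a minimal sketch that writes a one-page file by hand: five objects, a cross-reference table and a trailer. It assumes Latin-1 text and the built-in Helvetica font, so it does not cover the multilingual case mentioned above, which needs embedded fonts and is far more involved. The function name is made up for this example.

```python
def build_minimal_pdf(text: str) -> bytes:
    """Return the bytes of a one-page PDF that draws `text` in Helvetica.

    Assumption: `text` is encodable as Latin-1; anything else raises
    UnicodeEncodeError.
    """
    # Escape the characters that are special inside a PDF literal string.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    content = f"BT /F1 24 Tf 72 720 Td ({escaped}) Tj ET".encode("latin-1")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of this object, for the xref
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 is the free-list head
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)
```

Writing the result to disk is a single `open("hello.pdf", "wb").write(build_minimal_pdf("Hello"))`. The fiddly part is that every offset in the cross-reference table must match the byte position of its object exactly, which is why the sketch records offsets while it appends.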

Re:I've found these tools useful (0)

Anonymous Coward | about 3 months ago | (#47625063)

I'll 2nd pdftk. It can generate an FDF from a PDF with an embedded form, you can subsequently do form filling. You can also do stamping if you want a watermark or a tag line at the bottom of each page.

There are some related command line tools that work well with it, like pdfjam (re-sizing to A4 or LETTER) and pdfjoin to do merging.

One thing pdftk CANNOT do is Unicode during form filling. So you are stuck with ASCII and some control characters there.
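The form-filling and stamping workflow above could be scripted roughly like this. It is a sketch, not a tested pipeline: `generate_fdf`, `fill_form`, `stamp` and `flatten` are pdftk operations, while the helper names and all file names are placeholders chosen for this example.

```python
import subprocess
from typing import List


def generate_fdf_cmd(form_pdf: str, fdf_out: str) -> List[str]:
    """Command that dumps the form fields of `form_pdf` into an FDF file."""
    return ["pdftk", form_pdf, "generate_fdf", "output", fdf_out]


def fill_form_cmd(form_pdf: str, fdf_in: str, pdf_out: str,
                  flatten: bool = True) -> List[str]:
    """Command that merges FDF data into the form; flatten locks the fields."""
    cmd = ["pdftk", form_pdf, "fill_form", fdf_in, "output", pdf_out]
    if flatten:
        cmd.append("flatten")
    return cmd


def stamp_cmd(pdf_in: str, stamp_pdf: str, pdf_out: str) -> List[str]:
    """Command that overlays `stamp_pdf` on every page (watermark, tag line)."""
    return ["pdftk", pdf_in, "stamp", stamp_pdf, "output", pdf_out]


def run(cmd: List[str]) -> None:
    """Execute a command; raises CalledProcessError if pdftk fails."""
    subprocess.run(cmd, check=True)
```

A typical sequence would be `run(generate_fdf_cmd("form.pdf", "blank.fdf"))`, editing the field values in the FDF, then `run(fill_form_cmd("form.pdf", "data.fdf", "filled.pdf"))`. Passing arguments as a list avoids shell quoting problems with file names that contain spaces.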

Re:I've found these tools useful (0)

Anonymous Coward | about 3 months ago | (#47625177)

I've used pdftk, too, but I believe it is using itext as the underlying library. Also using itext would be ColdFusion.

Re:I've found these tools useful (1)

Daniel Hoffmann (2902427) | about 3 months ago | (#47625223)

You know any that can convert simple HTML with simple CSS into pdfs? Preferably with support for dataURL in image tags?

Re:I've found these tools useful (3, Interesting)

Anonymous Coward | about 3 months ago | (#47625335)

sudo apt-get install wkhtmltopdf
wkhtmltopdf www.google.com google.com.pdf

Re:I've found these tools useful (1)

Phoenix Rising (28955) | about 3 months ago | (#47625595)

Great tool for HTML conversion. Doesn't meet the OP's criteria, but it's the best open source HTML to PDF converter I've found.

Re:I've found these tools useful (2)

Qzukk (229616) | about 3 months ago | (#47625357)

I have no idea if it supports data: URIs but I've used HTMLDOC to turn html tables into PDF (since every PDF library I've ever used is absolutely shit at tables compared to HTML). It supports inline styles and <style type="text/css"> tags. It's not quite dead, but this year's update was the first since 2006 [msweet.org] .

Re:I've found these tools useful (3, Interesting)

kbg (241421) | about 3 months ago | (#47625815)

Yes the Flying Saucer [google.com] Java library. It is one of the best XHTML to PDF conversion tools.

Re:I've found these tools useful (1)

Daniel Hoffmann (2902427) | about 3 months ago | (#47626215)

This library uses iText, unfortunately my projects are closed source and the budget for libraries is zero.

Re:I've found these tools useful (1)

udittmer (89588) | about 3 months ago | (#47628271)

FlyingSaucer uses iText 2, which is licensed under the LGPL - that might maybe make it OK (and free) to use?

I've found these tools useful (0)

Anonymous Coward | about 3 months ago | (#47625677)

I'd avoid pdftk. It uses a fork of the iText library that is buggy and old and apparently no longer actively maintained. A secondary issue is that it has terrible error handling; when iText reports an error, pdftk just throws away the information about what the error really is. Qpdf can do everything I used to do with pdftk, and is much better engineered and supported.

Re:I've found these tools useful (1)

jab (9153) | about 3 months ago | (#47626453)

I forgot to mention PDF rendering engines. Neither supports the entire spec, nor do I blame them.

poppler - widely used
pdfium - high performance, recently open sourced

Re:I've found these tools useful (0)

Anonymous Coward | about 3 months ago | (#47627983)

RTL scripts, like Arabic in PDF? Are we there yet? It's not a niche thing in 2014 to want to use Arabic in a FLOSS application

IText (1)

Anonymous Coward | about 3 months ago | (#47624801)

You might try IText:
http://en.wikipedia.org/wiki/IText

pdftk uses it.

Re:IText (1)

mark-t (151149) | about 3 months ago | (#47624861)

I would have posted to make the same recommendation if someone else hadn't already mentioned it, so I'll just follow up with another recommendation for itext. It's a pretty easy to use library, and it's been around for a while, so it's pretty stable.

Re:IText (1)

Fotis Georgatos (3006465) | about 3 months ago | (#47625403)

Hi mark-t, OP here.

itext appears to miss two-three major targets:
  • software architecture/api may not be too tidy; this is word of others, I cannot verify it since I've never used it
  • unclear licensing scheme, not free of commercial clashes
  • due to the above, community has been fragmented in 3 (?) big fragments :-(

Re:IText (1)

Stalus (646102) | about 3 months ago | (#47626265)

The problem with iText is that it used to be MPL, but the maintainer got ticked off at commercial users several years ago and changed the license to AGPL. Apparently now they're relaxing the license for a fee, but they've changed their mind before - no guarantees that they won't change it again.

to the free as in "all you can steal" crowd (-1, Troll)

NemoinSpace (1118137) | about 3 months ago | (#47624823)

Just get over it and use Adobe. The thousands wasted by trying to save a buck just doesn't make sense. And the quality of your faith based software is not made up for by the quantity. I know this seems like flame bait to those of you that don't actually have the need for software that works, so don't take offense at being marginalized.

Re:to the free as in "all you can steal" crowd (1)

bluefoxlucid (723572) | about 3 months ago | (#47624891)

libpoppler works; it just only meets the requirements that libpoppler was designed for. It correctly displays most PDFs, but fails with esoteric features used only in a small subset.

In that sense, libpoppler is like a swiffer mop: it handles most normal dirt, dust, and general cleaning needs for tile and hardwood; but you will need a mop, or potentially nylon or bristle scrubbers and power tools, to clean some deep-set grime from linoleum or porcelain tile. I've had mops fail to clean traffic grime from kitchen linoleum, at all; stuck a drill brush in a 3000RPM 600W output cordless drill and blasted that shit right off.

Re:to the free as in "all you can steal" crowd (2)

PopeRatzo (965947) | about 3 months ago | (#47625437)

I use a cordless drill and brush bit to make my PDF files, too. It's slow, and it doesn't really scale well, but at least it's not from Adobe.

Prince XML (2)

Eponymous Coward (6097) | about 3 months ago | (#47624827)

The best PDF software I've ever used is Prince XML.

For years, we got by with HTMLDoc but finally dumped it because we absolutely needed unicode support.

After trying many different packages, we settled on Prince. Our main constraints were performance related which you apparently aren't worried about, so maybe it's overkill for what you need.

Re:Prince XML (1)

Elgonn (921934) | about 3 months ago | (#47628075)

I'll have to second the recommendation for PrinceXml.

After investigating and trying at least 9 other open source kits I eventually gave up and went with PrinceXml. You can try the 'trial' version easily and it just works easily. Their support is actually good as well. I wish there was a good pdf toolkit that was open source. But they all seem to just do one odd piece of the puzzle poorly.

"Slashdot Crowd Wisdom" ! (5, Funny)

RobotRunAmok (595286) | about 3 months ago | (#47624901)

See, now thanks to you I have to clean all this coffee off my monitor...

Re:"Slashdot Crowd Wisdom" ! (1)

gstoddart (321705) | about 3 months ago | (#47625215)

See, now thanks to you I have to clean all this coffee off my monitor...

You should post a question to the Ask section, maybe some of us have some tips for cleaning it off? ;-)

Re:"Slashdot Crowd Wisdom" ! (3, Funny)

jones_supa (887896) | about 3 months ago | (#47625281)

I'm sure one of the requirements would be an open source napkin.

Re:"Slashdot Crowd Wisdom" ! (2)

funwithBSD (245349) | about 3 months ago | (#47625393)

I suggest librag.

Re:"Slashdot Crowd Wisdom" ! (0)

Anonymous Coward | about 3 months ago | (#47626633)

Well, librag implements a "sanitary napkin" API. While you could use it for absorbing coffee on the monitor, it's actually intended to absorb something very different in a very different area...

- T

Re:"Slashdot Crowd Wisdom" ! (1)

PopeRatzo (965947) | about 3 months ago | (#47625469)

maybe some of us have some tips for cleaning it off? ;-)

I stick a mop head into a 600W cordless drill. Be sure to wear safety glasses.

It does a lousy job of cleaning coffee off your monitor, but your co-workers will think you're a total badass.

And like jones_supa says, make sure it's an open source mop. And post photos, please.

PDFBox (0)

Anonymous Coward | about 3 months ago | (#47624945)

It works.

Alternative to iText (0)

Anonymous Coward | about 3 months ago | (#47624961)

It seems iText is it, can anyone vouch for a good alternative?

Re:Alternative to iText (2)

Daniel Hoffmann (2902427) | about 3 months ago | (#47625173)

iText has some problems with licensing for commercial applications and government projects. I am looking for an alternative.

For serious archiving. (1, Interesting)

Anonymous Coward | about 3 months ago | (#47624989)

Make this a requirement: support for making PDF/A.

I'm convinced there is no elegant PDF library (4, Informative)

PhrostyMcByte (589271) | about 3 months ago | (#47625017)

At least on the C# side of things, the three libraries I've used (iTextSharp, PdfSharp, and Aspose.Pdf) are all a bit of an unintuitive mess with inconsistencies all over the place and very little documentation. In the case of iText, their revenue stream is putting all their documentation into a book for people to buy, so it's not uncommon to get an intentionally vague response when asking for help.

I cycle between each depending on what I need to do, because they all have their own quirks and supported features. I've even piped from one to another to get certain parts of the process working.

Good luck.

Re:I'm convinced there is no elegant PDF library (2)

codemachine (245871) | about 3 months ago | (#47625593)

It could be that iText is just what he needs though. iTextSharp is the C# port of the original iText Java library. At times, it is easier to find code examples for iText than iTextSharp. Since the iTextSharp folks did their best to use C# conventions, the Java call names aren't always the same as the C# ones.

I'm convinced there is no elegant PDF library (0)

Anonymous Coward | about 3 months ago | (#47625761)

Have you looked at EssentialObjects? We use it for one of our internal applications and it's been very easy to work with.

Re:I'm convinced there is no elegant PDF library (0)

Anonymous Coward | about 3 months ago | (#47626915)

ActivePDF was pretty good IMO. I just use iTextSharp and haven't run into any difficulties.

try... (4, Informative)

Anonymous Coward | about 3 months ago | (#47625057)

sudo apt-get install ghostscript pdftk poppler-utils

ghostscript: /usr/bin/dvipdf
ghostscript: /usr/bin/pdf2dsc
ghostscript: /usr/bin/pdf2ps
ghostscript: /usr/bin/pdfopt
ghostscript: /usr/bin/ps2pdf
pdftk: /usr/bin/pdftk
poppler-utils: /usr/bin/pdffonts
poppler-utils: /usr/bin/pdfimages
poppler-utils: /usr/bin/pdfinfo
poppler-utils: /usr/bin/pdfseparate
poppler-utils: /usr/bin/pdftocairo
poppler-utils: /usr/bin/pdftohtml
poppler-utils: /usr/bin/pdftoppm
poppler-utils: /usr/bin/pdftops
poppler-utils: /usr/bin/pdftotext
poppler-utils: /usr/bin/pdfunite

Re:try... (1)

Em Adespoton (792954) | about 3 months ago | (#47625227)

This. These three are what you need; you can then script a wrapper around them if you need to, but they'll provide you with everything you need as far as actual manipulation and display goes. Poppler keeps it simple, pdftk can handle most manipulation needs, and ghostscript is there to cover any esoteric issues that still fall under postscript/pdf/EPS.

Might want to also include imagemagick, for import/export/optimization of most image formats you might be bursting/adding in PDF.
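As one example of scripting a wrapper around these tools, the sketch below calls `pdfinfo` from poppler-utils and turns its "Key: value" output into a dictionary. The function names are invented for this example, and the parser assumes every line of output has the key before the first colon.

```python
import shutil
import subprocess
from typing import Dict


def parse_pdfinfo(output: str) -> Dict[str, str]:
    """Parse pdfinfo's 'Key:   value' lines into a dictionary of strings."""
    info = {}
    for line in output.splitlines():
        # Split at the first colon only, so values such as dates stay intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def pdf_info(path: str) -> Dict[str, str]:
    """Run pdfinfo on `path` and return its fields."""
    if shutil.which("pdfinfo") is None:
        raise RuntimeError("pdfinfo not found; install poppler-utils")
    result = subprocess.run(["pdfinfo", path], check=True,
                            capture_output=True, text=True)
    return parse_pdfinfo(result.stdout)
```

With this in place, `int(pdf_info("report.pdf")["Pages"])` would give the page count of a hypothetical `report.pdf`. The same pattern (check the binary exists, run it with an argument list, parse stdout) carries over to the other utilities listed above.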

Re:try... (1)

kriston (7886) | about 3 months ago | (#47625567)

Absolutely correct. I can't think of any good reasons not to use Poppler [wikipedia.org] .

Ghostscript,Foxit PDF or other virtual PDF printer (1)

QuickBible (1143641) | about 3 months ago | (#47625211)

My favorite solution to this problem is to leverage a 3rd party solution to do the heavy lifting that creates the actual PDF document and tie into it with printing code. When you set up the printer device and print mode correctly you can render your output in a print document as you would to the screen. The only changes are in calculating DPI because the screen and printer will have different dots per inch. I have done this on Windows using 2 techniques. The first was long ago and that was to use an installed PostScript printer driver to "print to a file" and then have Ghostscript convert PS to PDF. As far as I know this is still a free and open solution. The only restriction the makers of Ghostscript place on free usage is that the software can't be for commercial usage. The second manner is to have your software look for a virtual PDF printer driver and "print to file". When you install Foxit PDF it installs a virtual PDF printer driver that can be used by code.
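The DPI adjustment mentioned above amounts to converting a screen measurement to inches and back to device units at the printer's resolution. A minimal sketch, where the function name and the default values of 96 and 600 DPI are assumptions rather than anything a particular driver reports:

```python
def screen_to_printer(pixels: float, screen_dpi: float = 96.0,
                      printer_dpi: float = 600.0) -> int:
    """Convert a length in screen pixels to printer dots.

    The default resolutions are illustrative; real code should query the
    actual values from the screen and printer device contexts.
    """
    inches = pixels / screen_dpi
    return round(inches * printer_dpi)
```

For instance, a 96-pixel-wide box on a 96 DPI screen is one inch, so it becomes 600 dots on a 600 DPI printer. Applying the same conversion to every coordinate and font size keeps the printed layout proportional to what was drawn on screen.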

Re:Ghostscript,Foxit PDF or other virtual PDF prin (1)

NJRoadfan (1254248) | about 3 months ago | (#47627637)

I've seen commercial programs actually do this to support PDF report generation. They just leverage the existing code they have for printing reports and redirect it to a virtual printer. I think it was the Amyuni libraries, which are clearly closed source. One thing I can say is that a virtual printer that directly generates PDF files from the GDI output (we're talking Windows here) tends to create cleaner output files (smaller size, fewer rendering errors) than the PostScript printer output to PDF route.

Build a xelatex file and then compile ... (0)

Anonymous Coward | about 3 months ago | (#47625219)

Build a xelatex file and then compile ...

We tested different things here and we ended up generating xelatex files and compiling them as needed (from Java).
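The generate-then-compile approach could be sketched as follows. The poster did this from Java; the version below is Python purely for illustration, and the template, function names, and file names are all assumptions. It fills a small LaTeX template, escapes special characters in the data, and builds a `xelatex` command line.

```python
import subprocess
from pathlib import Path
from string import Template

# Minimal document template; $title and $body are filled in by render_tex().
TEX_TEMPLATE = Template(r"""\documentclass{article}
\begin{document}
\section*{$title}
$body
\end{document}
""")

# Characters that have special meaning in LaTeX and their escaped forms.
_TEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_",
    "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def escape_tex(text):
    """Escape LaTeX special characters in one pass over the input."""
    return "".join(_TEX_SPECIALS.get(ch, ch) for ch in text)


def render_tex(title, body):
    """Return the LaTeX source for a one-section document."""
    return TEX_TEMPLATE.substitute(title=escape_tex(title), body=escape_tex(body))


def xelatex_cmd(tex_file, out_dir):
    """Build the xelatex command; nonstopmode keeps it from waiting for input."""
    return ["xelatex", "-interaction=nonstopmode", "-halt-on-error",
            f"-output-directory={out_dir}", str(tex_file)]


def build_pdf(title, body, out_dir, name="document"):
    """Write the .tex file, compile it, and return the expected PDF path."""
    out = Path(out_dir)
    tex_file = out / f"{name}.tex"
    tex_file.write_text(render_tex(title, body), encoding="utf-8")
    subprocess.run(xelatex_cmd(tex_file, out), check=True)
    return out / f"{name}.pdf"
```

Escaping the data before substitution matters in practice: an unescaped `%` or `&` in customer data is a common cause of failed compiles.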

Re:Build a xelatex file and then compile ... (0)

Anonymous Coward | about 3 months ago | (#47626751)

Personally I'm still using pdflatex, but yes, basically this. Any kind of TeX to create documents beats whatever else there is.

Although it doesn't necessarily fulfil the "simple things should be simple" requirement.

WKHTMLTOPDF (0)

Anonymous Coward | about 3 months ago | (#47625243)

(Pretty much, with minimal issues) pixel perfect generation of a PDF file from HTML.

Libreoffice or Openoffice (0)

Anonymous Coward | about 3 months ago | (#47625269)

IIRC they import and export pdf.

None of them support layers (0)

Anonymous Coward | about 3 months ago | (#47625293)

I know this is about as edge case as it gets but non-native PDF handling completely ignores PDF layering.

ICEpdf (0)

Anonymous Coward | about 3 months ago | (#47625499)

Just wanted to mention the Author forgot to evaluate ICEpdf.

ICEpdf can be used as standalone open source Java PDF viewer, or can be easily embedded in any Java application to seamlessly load or capture PDF documents. Beyond PDF document rendering, ICEpdf is extremely versatile, and can be used in a multitude of innovative ways, including: image conversion, annotation editing tools, text/image extraction, search and printing.

http://www.icesoft.org/java/projects/ICEpdf/

here is one (1)

roman_mir (125474) | about 3 months ago | (#47625513)

itext [itextpdf.com] may be one of those. It comes under AGPL and under a commercial license if you buy support from the company.

One word: PDFLib (5, Informative)

Qbertino (265505) | about 3 months ago | (#47625671)

PDFLib GmbH [pdflib.com] (German LLC) builds exactly one product: PDFLib. And they've been doing that since 1997. AFAIK the company was run by one guy - the initial developer - alone for most of the time. Now it's probably a shop of 5 or so.

So it's not FOSS - yeah, that's a real shame. But the devs get to eat, you can demand service and response if you run into a bug and you can expect a good product and with PDFLib you're probably going to get it too.

I haven't come across a single project doing non-trivial PDF stuff that doesn't use PDFLib. I've used it myself a little, and the cookbook that comes with the product was very good, so it comes recommended.

My 2 cents.

Re:One word: PDFLib (1)

danheskett (178529) | about 3 months ago | (#47626047)

I second PDFLib.

I've created approximately 3 billion pages of PDF with it, since 2000. Very, very well done. The library is well thought out, and it can work even with bindings to languages that you would not think are usable. It's fast, really has a nice scope model, has a nice consistency, and rounds off the edges of PDF better than anything else with it.

If you combine it with their import library and pCOS library, it can do almost everything you want. The developers are helpful and don't mess around. If you find a bug they fix it. Plus their documentation is about ten times better than anything else out there I've seen. They have a real reference and a cookbook that covers common scenarios.

Re:One word: PDFLib (0)

Anonymous Coward | about 3 months ago | (#47626817)

The trouble I can see with PDFLib is the stupid "per machine" licensing. Per machine licensing for a software library is ridiculous - the description of the license on their website pretty much rules out using it in any situation other than some sort of central PDF processing behemoth service.

Re:One word: PDFLib (0)

Anonymous Coward | about 3 months ago | (#47626927)

Dead on, their licensing sucks balls.

Re:One word: PDFLib (1)

rsborg (111459) | about 3 months ago | (#47628347)

The trouble I can see with PDFLib is the stupid "per machine" licensing. Per machine licensing for a software library is ridiculous - the description of the license on their website pretty much rules out using it in any situation other than some sort of central PDF processing behemoth service.

Clearly you haven't dealt with Oracle's licensing. Compared to that, PDFLib is highly liberal stuff. They even give you one machine free (dev box).

BFO (2)

the_mice (163224) | about 3 months ago | (#47625797)

Java-only. Commercial. Feature-rich. Great company name. [faceless.org]

BFO (0)

Anonymous Coward | about 3 months ago | (#47626231)

Big Faceless Org
Not open source. Java only.
Great friendly support

PDF Clown (1)

gabrieltss (64078) | about 3 months ago | (#47626555)

We used PDFClown [stefanochizzolini.it] (open source - Java) for production applications that were generating PDF's on the fly with hundreds of pages using Java and it performed very well.

iText (1)

jbolden (176878) | about 3 months ago | (#47626661)

iText meets some of your criteria.

* open source, with a community following ; the kind of stuff Slashdotters would prefer = yep. Including several commercial books
* tidy software architecture; simple things should remain simple = It's big and complete.
* allow open API allowing usage across many languages (say: Python & Java) = Native Java. iText# is a port to .NET. Not a good fit for dynamic languages.
* clear licensing status, not estranging future commercial use = AGPL + commercial license. Clear but not free as in beer for commercial.
* serious multilingual & font support = yep.
* PDF-handling rich features, not limiting usage for invoicing, e-commerce, reports & data mining = yep
* digital signing should not go against other features = has several signing modules. Excellent.

WeasyPrint (2)

t4k1s (706242) | about 3 months ago | (#47626669)

I started with ReportLab (the open source parts) and found it too low level, so I considered using the commercial edition because it has a templating language. As I was not very fond of investing time in learning yet another templating language, I reconsidered and gave HTML with CSS a try for printing. I used wkhtmltopdf for a while but switched to WeasyPrint in the end: it was created for using HTML with CSS for printing, and seemed to be more actively developed than wkhtmltopdf.

MuPDF? (1)

HenryPing (3778411) | about 3 months ago | (#47626891)

Does anyone have experience with MuPDF? http://www.mupdf.com/ [mupdf.com] It's open source, but requires a license for commercial use. It appears to offer the best performance and portability. Its top-level application (MuPDF) is highly rated across most platforms: Google Play, Apple Store and Ubuntu.

HMRC's CT600 form - PDF forms (1)

Richard_J_N (631241) | about 3 months ago | (#47627003)

Is there anything that can handle the gruesome CT600 forms that the UK Tax authority require us to fill in every year? These have lots of embedded scripting and can only be read with Acrobat Reader. However, this year, Adobe have stopped releasing Acrobat for Linux.

(An added bonus, the internal logic of the CT600 is buggy: for example if a particular tax option does not apply, it is fussy about the distinction of 0 vs empty, and this leads to subsequent validation errors (naturally with confusing messages). It also has about 20 pages of irrelevant data required, in order to reach a single number, which we have already calculated.)

PrinceXML (1)

William Hertling (3778493) | about 3 months ago | (#47627869)

I needed to lay out a novel in a PDF. I've previously worked with iText and prefer not to construct a PDF one element at a time. I wanted an HTML to PDF workflow. I then tried wkhtmltopdf, but it doesn't support most of the hardcore design needs: hyphenation, widow/orphan control, alternating page margins, and page headers and footers based on the section of a document. PrinceXML supports all that. Writing only CSS, and based on HTML content, you'll be able to replicate anything a designer can do in InDesign. It's incredibly powerful and the time it saves in development effort more than pays for the high cost of the software.

FOP? (2)

sgrover (1167171) | about 3 months ago | (#47628221)

Not sure how current it is, but when I was looking for the same a few years back all that was really available for PHP was HTML->PDF libraries, which were not sufficient for anything but the most basic forms. A decent invoice form was hard to get right with these tools. Then I came across FOP. Or more specifically XML-FOP. Combine that with a little XSL and the output was amazing, and could do more than the HTML converters. The only problem is that the FOP tool was a Java-based program, so PHP would need to execute a shell command to call it. With tight control of what info was passed to that shell command, it seemed an appropriate trade-off for the job at hand. You can still get FOP in the Ubuntu repos - apt-get install fop. The learning curve for FOP is a little steep to begin with, but no more than any other XML dialect. And being XML, you have a lot of options in building the required FOP file. I opted to put my data into my own XML file, then utilize an XSL file to convert it if/when needed. More details here: http://xmlgraphics.apache.org/... [apache.org]
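The "tight control of what is passed to the shell command" part could look something like this sketch. The poster called FOP from PHP; this is Python for illustration only, and the working-directory check and function names are assumptions rather than anything from the original setup. It uses FOP's `-xml`, `-xsl` and `-pdf` command-line options and never goes through a shell.

```python
import subprocess
from pathlib import Path


def fop_cmd(xml_path, xsl_path, pdf_path, work_dir):
    """Build a FOP command, accepting only files that live inside work_dir."""
    base = Path(work_dir).resolve()
    resolved = []
    for path, suffix in ((xml_path, ".xml"), (xsl_path, ".xsl"), (pdf_path, ".pdf")):
        full = (base / path).resolve()
        # Reject anything that escapes work_dir or has an unexpected extension.
        if base not in full.parents or full.suffix != suffix:
            raise ValueError(f"refusing to pass {path!r} to fop")
        resolved.append(str(full))
    xml, xsl, pdf = resolved
    return ["fop", "-xml", xml, "-xsl", xsl, "-pdf", pdf]


def render_invoice(xml_path, xsl_path, pdf_path, work_dir):
    """Run FOP with an argument list (no shell), raising on failure."""
    subprocess.run(fop_cmd(xml_path, xsl_path, pdf_path, work_dir), check=True)
```

Because the arguments are passed as a list, user-supplied file names are never interpreted by a shell, and the path check keeps them inside one known directory.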

DynaPDF? (0)

Anonymous Coward | about 3 months ago | (#47628307)

We've used this in our application for several years to generate a huge number of PDFs for digital presses. It can also do extraction and rendering as well as generation.

https://www.dynaforms.com/
