For questions regarding the Hadoop Distributed File System (HDFS), which is part of the Apache Hadoop project.
Questions tagged [hdfs]
73 questions
13
votes
4 answers
In Hadoop, how to show the current progress of -copyFromLocal
I am still a newbie Hadoop learner, and this time I was trying to process a 106 GB file.
I used -copyFromLocal to copy that big file to my Hadoop DFS, but since the file is big I have to wait a long time without a clue about the current…
Bang Dao
- 233
- 2
- 6
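One workaround (a sketch only, not necessarily the accepted answer; the file name and destination directory are made up for illustration) is to start the copy in the background and poll the destination with hadoop fs -du to see how many bytes have landed so far:

hadoop fs -copyFromLocal bigfile.dat /user/hadoop/ &
watch -n 30 'hadoop fs -du /user/hadoop/'    # re-run every 30 s; the byte count grows as blocks arrive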
7
votes
2 answers
HBASE Space Used Started Climbing Rapidly
Update 4,215:
After looking at space usage inside of HDFS, I see that .oldlogs is using a lot of space:
1485820612766 /hbase/.oldlogs
So new questions:
What is it?
How do I clean it up?
How do I keep it from growing again?
What caused it to…
Kyle Brandt
- 81,077
- 70
- 299
- 442
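/hbase/.oldlogs holds HBase write-ahead logs that have been retired from the RegionServers; the master's log-cleaner chore normally prunes it after hbase.master.logcleaner.ttl, and replication that is enabled but stalled is one common thing that can block that cleanup. A quick way to watch the directory (a sketch using standard fs shell commands):

hadoop fs -count /hbase/.oldlogs     # directory count, file count, total bytes
hadoop fs -dus /hbase/.oldlogs       # total size in bytes (or -du -s on newer releases)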
7
votes
2 answers
Hadoop HDFS Backup & DR Strategy
We are preparing to implement our first Hadoop cluster. As such, we are starting out small with a four-node setup (1 master node and 3 worker nodes). Each node will have 6 TB of storage (6 x 1 TB disks). We went with a SuperMicro 4-node chassis so…
Matt Keller
- 221
- 4
- 7
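For the off-cluster part of a backup strategy, one common building block is distcp, which runs a MapReduce job to copy HDFS paths between clusters; a minimal sketch (cluster host names and paths are hypothetical):

hadoop distcp -update hdfs://prod-nn:8020/data hdfs://backup-nn:8020/data   # copy only new or changed files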
6
votes
2 answers
Hadoop HDFS: set file block size from the command line?
I need to set the block size of a file when I load it into HDFS, to some value lower than the cluster block size. For example, if HDFS is using 64 MB blocks, I may want a large file to be copied in with 32 MB blocks.
I've done this before within a…
BigChief
- 398
- 1
- 2
- 12
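A per-file block size can normally be passed to the fs shell as a generic -D option; the property name differs by release (dfs.block.size on Hadoop 1.x, dfs.blocksize on 2.x), and the file name and path here are illustrative:

hadoop fs -D dfs.blocksize=33554432 -put largefile.dat /user/hadoop/   # 32 MB blocks for this copy only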
5
votes
1 answer
Forward-sync to HDFS? (OR continue an incomplete hdfs upload?)
Anyone have a good suggestion for doing a forward sync to HDFS? ("forward-sync" in contrast to "bi-directional sync")
Basically I have a large number of files I want to put into HDFS. It's so large that I'll often, say, lose connectivity before…
Nate Murray
- 963
- 1
- 7
- 7
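distcp with the -update flag is one hedged way to approximate a one-way sync, since it skips files the destination already has with matching size; note that with a file:// source the path has to be readable from every node that runs a map task (the paths below are made up):

hadoop distcp -update file:///data/export hdfs://namenode:8020/data/export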
5
votes
2 answers
How to fix Hadoop HDFS cluster with missing blocks after one node was reinstalled?
I have a 5-slave Hadoop cluster (using CDH4); the slaves are where the DataNode and TaskTracker run. Each slave has 4 partitions dedicated to HDFS storage. One of the slaves needed a reinstall, and this caused one of the HDFS partitions to be lost. At this…
Dolan Antenucci
- 329
- 1
- 4
- 16
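The usual triage (a sketch, not a guaranteed recovery; the /path/with/damage placeholder stands for whatever fsck reports) is to let fsck enumerate what is actually damaged before deciding whether to restore, move, or drop the affected files:

hadoop fsck / -list-corruptfileblocks                      # which files have missing or corrupt blocks
hadoop fsck /path/with/damage -files -blocks -locations    # details on the affected blocks
hadoop fsck /path/with/damage -delete                      # last resort: remove files whose blocks are gone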
5
votes
1 answer
Ceph: Why is a greater number of "placement groups" a "bad thing"?
I have been researching distributed databases and file systems, and while I was originally mostly interested in Hadoop/HBase because I'm a Java programmer, I found this very interesting document about Ceph, which, as a major plus point, is now…
monster
- 608
- 1
- 10
- 17
4
votes
1 answer
mount.nfs: mount system call failed
I am trying to mount HDFS on my local machine running Ubuntu using the following command:
sudo mount -t nfs -o vers=3,proto=tcp,nolock 192.168.170.52:/ /mnt/hdfs_mount/
But I am getting this error:
mount.nfs: mount system call failed
Output…
Bhavya Jain
- 141
- 1
- 1
- 3
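Before retrying the mount it is worth confirming that the HDFS NFS gateway's RPC services are registered and the export is actually visible from the client; these are standard NFS diagnostics, using the gateway address from the question:

rpcinfo -p 192.168.170.52      # should list portmapper/rpcbind, mountd and nfs
showmount -e 192.168.170.52    # should show the exported / path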
4
votes
1 answer
Upload large files with curl without caching in RAM
I'm using curl to upload large files (from 5 to 20 GB) to HOOP, which is based on HDFS (Hadoop cluster), as follows:
curl -f --data-binary "@$file" "$HOOP_HOST$UPLOAD_PATH?user.name=$HOOP_USER&op=create"
But when curl uploads a large file it tries to fully…
Gening D.
- 71
- 1
- 5
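The usual culprit is that --data-binary @file makes curl read the whole file into memory before sending, whereas -T streams it from disk; a hedged variant of the same call (note that -T switches the request method to PUT, which the endpoint may or may not expect):

curl -f -T "$file" "$HOOP_HOST$UPLOAD_PATH?user.name=$HOOP_USER&op=create"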
4
votes
3 answers
Is there a way to grep gzipped content in hdfs without extracting it?
I'm looking for a way to zgrep hdfs files
something like:
hadoop fs -zcat hdfs://myfile.gz | grep "hi"
or
hadoop fs -cat hdfs://myfile.gz | zgrep "hi"
It does not really work for me. Is there any way to achieve that from the command line?
Jas
- 681
- 3
- 12
- 23
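Since the archive never needs to touch local disk, streaming the bytes through gunzip on the client side is usually enough (the path is kept exactly as written in the question):

hadoop fs -cat hdfs://myfile.gz | gunzip -c | grep "hi"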
4
votes
0 answers
java.lang.NullPointerException When Doing A Read in HDFS
I have had a 10-node HBase cluster up and running for the past 4 months. The cluster was set up on VMs in a corporate environment which I do not control, but everything has been working great... until today.
Today, every part of the system was down. I…
JasCav
- 233
- 1
- 12
4
votes
1 answer
Can't connect to HDFS in pseudo-distributed mode
I followed the instructions here for installing Hadoop in pseudo-distributed mode.
However, I'm having trouble connecting to HDFS.
When I execute this command:
./hadoop fs -ls /
I get a directory listing just like I should.
However, when I execute…
sangfroid
- 193
- 1
- 3
- 10
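A couple of first checks (a sketch for a Hadoop 1.x-style pseudo-distributed install, so file locations and daemon names may differ): make sure all the daemons are actually running, and that the client is pointed at the same NameNode URI the daemons were started with:

jps                                            # expect NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker
grep -A1 fs.default.name conf/core-site.xml    # the fs URI the shell and the daemons should agree on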
3
votes
0 answers
How can I launch hdfs on Mesos without DC/OS?
From my understanding, DC/OS is a freemium managed service. Because I'd rather just have a raw Mesos implementation, I'd rather not be dependent on DC/OS, and so I just want to know how to implement HDFS on Mesos without it.
Unfortunately, Google is…
Dr.Knowitall
- 209
- 1
- 10
3
votes
1 answer
Linux network tuning to prevent TCP rcvpruned and backlog drops?
The DataNodes in my HBase cluster are triggering some TCP rcvpruned and backlog drops from time to time:
It seems there are at least two angles from which to approach this:
Tune HBase/HDFS, etc., so that these are not triggered
Tune the Linux network…
Kyle Brandt
- 81,077
- 70
- 299
- 442
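On the kernel side, rcvpruned is usually a symptom of socket receive buffers filling up, and backlog drops of the per-device ingress queue overflowing, so a tuning sketch looks roughly like this (the values are illustrative, not recommendations):

sysctl -w net.core.netdev_max_backlog=30000           # deeper ingress queue, fewer backlog drops
sysctl -w net.core.rmem_max=16777216                  # allow larger socket receive buffers
sysctl -w net.ipv4.tcp_rmem='4096 87380 16777216'     # min/default/max TCP receive buffer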
3
votes
2 answers
Disable The Under Replicated Blocks Alert in Cloudera Manager
I have a single-server HBase cluster that I am only using as the sink end of HBase replication. Therefore I don't want to replicate any blocks within this cluster (since the source already has replicated blocks, I don't feel I need it).
I would like to…
Kyle Brandt
- 81,077
- 70
- 299
- 442
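Besides suppressing the monitor in Cloudera Manager, another hedged option on a single-node sink is to stop the blocks being under-replicated at all: set dfs.replication to 1 for new writes and lower the factor on what is already stored:

hadoop fs -setrep -R 1 /     # re-set replication to 1 on existing files (add -w to wait for completion)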